This is a Llama-3.3-8B-Instruct-128K fine-tune, produced through P-E-W's Heretic (v1.2.0) abliteration engine with Magnitude-Preserving Orthogonal Ablation enabled.
Note: The model still exhibits overt non-compliance (diverging, changing focus, reinterpreting the request, and, rarely, arguing). An effort was made to target model-unique refusals, overt non-compliance, and disclaimer/warning attachments.
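For reference, a minimal inference sketch using transformers. The repository id below is a placeholder (substitute the actual repo name), and the prompt and generation settings are illustrative, not recommended values.

```python
# Minimal inference sketch. The repo id is a placeholder; substitute the actual
# repository name. Prompt and generation settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/Llama-3.3-8B-Instruct-128K-heretic"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# No system prompt (the appendix notes an empty system prompt was used).
messages = [{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```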
Heretication Results
| Score Metric | Value | Parameter | Value |
|---|---|---|---|
| Refusals | 0/100 | direction_index | 11.17 |
| KL Divergence | 0.0448 | attn.o_proj.max_weight | 1.92 |
| Initial Refusals | 102/104 | attn.o_proj.max_weight_position | 6.82 |
| | | attn.o_proj.min_weight | 1.77 |
| | | attn.o_proj.min_weight_distance | 23.82 |
| | | mlp.down_proj.max_weight | 0.85 |
| | | mlp.down_proj.max_weight_position | 7.03 |
| | | mlp.down_proj.min_weight | 0.77 |
| | | mlp.down_proj.min_weight_distance | 28.52 |
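For context, the KL divergence score measures how far the ablated model's output distribution drifts from the original model on harmless prompts (lower means closer to the original). The sketch below illustrates a generic first-token KL computation between two checkpoints; it is not Heretic's implementation, and the model ids and prompt are placeholders.

```python
# Generic sketch of first-token KL divergence between an original and a modified
# checkpoint on a harmless prompt. This illustrates what the metric captures;
# it is not Heretic's implementation, and the model ids are placeholders.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "original/model"      # placeholder
ablit_id = "abliterated/model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
ablit = AutoModelForCausalLM.from_pretrained(ablit_id, torch_dtype="auto", device_map="auto")

prompt = "Write a short poem about autumn."
ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}], add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    p_logits = base(ids.to(base.device)).logits[:, -1, :]
    q_logits = ablit(ids.to(ablit.device)).logits[:, -1, :].to(p_logits.device)

# KL(P_original || Q_abliterated) over the distribution of the first generated token.
kl = F.kl_div(
    F.log_softmax(q_logits.float(), dim=-1),
    F.log_softmax(p_logits.float(), dim=-1),
    log_target=True,
    reduction="batchmean",
)
print(f"First-token KL divergence: {kl.item():.4f}")
```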
Appendix
Empty system prompt.
Previous attempt: Click Here
Heretic selected trial 196 as its optimal choice; trial 192 was picked instead. Additional trials can be run.
Restoring model from trial 196...
* Parameters:
* direction_index = 10.72
* attn.o_proj.max_weight = 1.87
* attn.o_proj.max_weight_position = 20.92
* attn.o_proj.min_weight = 1.76
* attn.o_proj.min_weight_distance = 16.32
* mlp.down_proj.max_weight = 0.78
* mlp.down_proj.max_weight_position = 6.49
* mlp.down_proj.min_weight = 0.54
* mlp.down_proj.min_weight_distance = 13.90
» [Trial 192] Refusals: 0/104, KL divergence: 0.0448
[Trial 199] Refusals: 2/104, KL divergence: 0.0398
[Trial 196] Refusals: 5/104, KL divergence: 0.0273
[Trial 141] Refusals: 21/104, KL divergence: 0.0207
[Trial 101] Refusals: 22/104, KL divergence: 0.0205
[Trial 205] Refusals: 37/104, KL divergence: 0.0132
[Trial 213] Refusals: 58/104, KL divergence: 0.0124
[Trial 131] Refusals: 72/104, KL divergence: 0.0088
[Trial 214] Refusals: 81/104, KL divergence: 0.0080
[Trial 52] Refusals: 83/104, KL divergence: 0.0065
[Trial 18] Refusals: 88/104, KL divergence: 0.0057
[Trial 332] Refusals: 92/104, KL divergence: 0.0057
[Trial 68] Refusals: 94/104, KL divergence: 0.0048
[Trial 37] Refusals: 98/104, KL divergence: 0.0043
[Trial 28] Refusals: 99/104, KL divergence: 0.0022
[Trial 313] Refusals: 100/104, KL divergence: 0.0020
[Trial 20] Refusals: 101/104, KL divergence: 0.0015
[Trial 178] Refusals: 102/104, KL divergence: 0.0004
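The trial list above traces the refusals / KL-divergence trade-off explored during optimization. A quick sketch to visualize that frontier, with the values copied from the list:

```python
# Visualize the refusals / KL-divergence trade-off using the trial results
# listed above (values copied verbatim from the log).
import matplotlib.pyplot as plt

trials = {
    192: (0, 0.0448), 199: (2, 0.0398), 196: (5, 0.0273), 141: (21, 0.0207),
    101: (22, 0.0205), 205: (37, 0.0132), 213: (58, 0.0124), 131: (72, 0.0088),
    214: (81, 0.0080), 52: (83, 0.0065), 18: (88, 0.0057), 332: (92, 0.0057),
    68: (94, 0.0048), 37: (98, 0.0043), 28: (99, 0.0022), 313: (100, 0.0020),
    20: (101, 0.0015), 178: (102, 0.0004),
}

refusals = [v[0] for v in trials.values()]
kl = [v[1] for v in trials.values()]

plt.scatter(kl, refusals)
for trial, (r, k) in trials.items():
    plt.annotate(str(trial), (k, r), fontsize=7)
plt.xlabel("KL divergence (vs. original model)")
plt.ylabel("Refusals (out of 104)")
plt.title("Heretic trial trade-off")
plt.show()
```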
Original model: allura-forge/Llama-3.3-8B-Instruct. Thanks!
Additional Fixes:
rope_scaling
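Below is a sketch of what patching a Llama-3-style rope_scaling entry in config.json can look like. The field names follow the standard "llama3" rope-scaling schema used by transformers; the numeric values are illustrative assumptions, not necessarily the exact settings shipped with this model.

```python
# Sketch: check / restore a Llama-3-style rope_scaling entry in config.json.
# Field names follow the standard "llama3" rope-scaling schema; the numeric
# values are illustrative assumptions, not this model's confirmed settings.
import json

with open("config.json") as f:
    config = json.load(f)

# Only add the block if it is missing; leave an existing entry untouched.
config.setdefault("rope_scaling", {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
})

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```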