Naphula committed
Commit 8c4f85f · verified · 1 Parent(s): b472142

Upload 12 files

.gitattributes CHANGED
@@ -35,3 +35,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 gemma-2-9b-it.imatrix filter=lfs diff=lfs merge=lfs -text
 imatrix_unsloth.dat filter=lfs diff=lfs merge=lfs -text
+Audits/Asmodeus_Live_Audit.png filter=lfs diff=lfs merge=lfs -text
+Audits/Slimaki_Live_Audit.png filter=lfs diff=lfs merge=lfs -text
+Audits/Unreleased_2501_Della_Live_Audit.png filter=lfs diff=lfs merge=lfs -text
Audits/Asmodeus_Audit.log ADDED
@@ -0,0 +1,56 @@
+--- DELLA AUDIT V2 START ---
+Loading config: config.yaml
+Base Model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
+Donors: 17
+
+Extracting BASE MODEL fingerprint...
+
+Extracting DONOR fingerprints...
+
+Computing Task Vector geometry...
+
+================================================================================
+ID    | Model Name
+--------------------------------------------------------------------------------
+#1    | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
+#2    | TheDrummer--Cydonia-24B-v4.3
+#3    | ReadyArt--4.2.0-Broken-Tutu-24b
+#4    | zerofata--MS3.2-PaintedFantasy-v2-24B
+#5    | TheDrummer--Magidonia-24B-v4.3
+#6    | TheDrummer--Precog-24B-v1
+#7    | zerofata--MS3.2-PaintedFantasy-v3-24B
+#8    | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
+#9    | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
+#10   | trashpanda-org--MS3.2-24B-Mullein-v2
+#11   | LatitudeGames--Hearthfire-24B
+#12   | TheDrummer--Cydonia-24B-v4.2.0
+#13   | TheDrummer--Magidonia-24B-v4.2.0
+#14   | ConicCat--Mistral-Small-3.2-AntiRep-24B
+#15   | Undi95--MistralThinker-v1.1
+#16   | CrucibleLab--M3.2-24B-Loki-V2
+#17   | Darkhn--M3.2-24B-Animus-V7.1
+================================================================================
+
+--- MAGNITUDE ANALYSIS & DATA POINTS ---
+ID    | Status     | Delta Norm   | Orig Size    | Model Name
+----------------------------------------------------------------------------------------------------
+#1    | OK         | 0.0000       | 83886080     | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
+#2    | OK         | 1.2955       | 83886080     | TheDrummer--Cydonia-24B-v4.3
+#3    | HIGH MAG   | 46.6745      | 83886080     | ReadyArt--4.2.0-Broken-Tutu-24b
+#4    | OK         | 0.0505       | 83886080     | zerofata--MS3.2-PaintedFantasy-v2-24B
+#5    | OK         | 4.5662       | 83886080     | TheDrummer--Magidonia-24B-v4.3
+#6    | OK         | 4.0883       | 83886080     | TheDrummer--Precog-24B-v1
+#7    | OK         | 4.8187       | 83886080     | zerofata--MS3.2-PaintedFantasy-v3-24B
+#8    | OK         | 1.9250       | 83886080     | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
+#9    | HIGH MAG   | 47.3140      | 83886080     | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
+#10   | OK         | 0.1586       | 83886080     | trashpanda-org--MS3.2-24B-Mullein-v2
+#11   | OK         | 1.9367       | 83886080     | LatitudeGames--Hearthfire-24B
+#12   | OK         | 1.0936       | 83886080     | TheDrummer--Cydonia-24B-v4.2.0
+#13   | OK         | 3.9147       | 83886080     | TheDrummer--Magidonia-24B-v4.2.0
+#14   | OK         | 0.0164       | 83886080     | ConicCat--Mistral-Small-3.2-AntiRep-24B
+#15   | OK         | 11.4846      | 83886080     | Undi95--MistralThinker-v1.1
+#16   | OK         | 3.1101       | 83886080     | CrucibleLab--M3.2-24B-Loki-V2
+#17   | OK         | 0.7205       | 83886080     | Darkhn--M3.2-24B-Animus-V7.1
+
+Log saved to: della_scan.log
+Displaying charts...
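The HIGH MAG rows in the table above come from a z-score rule in audit_della.py (included in this commit): a donor whose delta norm sits more than 1.5 standard deviations above the population mean is flagged. A minimal numpy restatement of that rule, using the norms copied from this log:

```python
import numpy as np

# Delta norms copied from the magnitude table above (IDs #1..#17).
norms = np.array([
    0.0000, 1.2955, 46.6745, 0.0505, 4.5662, 4.0883, 4.8187, 1.9250,
    47.3140, 0.1586, 1.9367, 1.0936, 3.9147, 0.0164, 11.4846, 3.1101, 0.7205,
])

# Same rule as the audit script: z-score against the population,
# with a small epsilon to avoid division by zero.
z_scores = (norms - norms.mean()) / (norms.std() + 1e-8)
flagged = [i + 1 for i, z in enumerate(z_scores) if z > 1.5]

print(flagged)  # IDs flagged HIGH MAG -> [3, 9]
```

Only the two Broken-Tutu variants clear the 1.5-sigma bar, which matches the HIGH MAG rows in the log.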
Audits/Asmodeus_Audit.png ADDED
Audits/Asmodeus_Live_Audit.png ADDED

Git LFS Details

  • SHA256: 3a657ab2bc2dcbedfb7c42460705989e25d03bc3e2692fb8be0d4f2596ef27c8
  • Pointer size: 131 Bytes
  • Size of remote file: 214 kB
Audits/Slimaki_Audit.log ADDED
@@ -0,0 +1,36 @@
+--- DELLA AUDIT V2 START ---
+Loading config: config.yaml
+Base Model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
+Donors: 7
+
+Extracting BASE MODEL fingerprint...
+
+Extracting DONOR fingerprints...
+
+Computing Task Vector geometry...
+
+================================================================================
+ID    | Model Name
+--------------------------------------------------------------------------------
+#1    | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
+#2    | TheDrummer--Cydonia-24B-v4.3
+#3    | ReadyArt--4.2.0-Broken-Tutu-24b
+#4    | zerofata--MS3.2-PaintedFantasy-v2-24B
+#5    | TheDrummer--Magidonia-24B-v4.3
+#6    | TheDrummer--Precog-24B-v1
+#7    | zerofata--MS3.2-PaintedFantasy-v3-24B
+================================================================================
+
+--- MAGNITUDE ANALYSIS & DATA POINTS ---
+ID    | Status     | Delta Norm   | Orig Size    | Model Name
+----------------------------------------------------------------------------------------------------
+#1    | OK         | 0.0000       | 83886080     | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
+#2    | OK         | 1.2955       | 83886080     | TheDrummer--Cydonia-24B-v4.3
+#3    | HIGH MAG   | 46.6745      | 83886080     | ReadyArt--4.2.0-Broken-Tutu-24b
+#4    | OK         | 0.0505       | 83886080     | zerofata--MS3.2-PaintedFantasy-v2-24B
+#5    | OK         | 4.5662       | 83886080     | TheDrummer--Magidonia-24B-v4.3
+#6    | OK         | 4.0883       | 83886080     | TheDrummer--Precog-24B-v1
+#7    | OK         | 4.8187       | 83886080     | zerofata--MS3.2-PaintedFantasy-v3-24B
+
+Log saved to: della_scan.log
+Displaying charts...
Audits/Slimaki_Audit.png ADDED
Audits/Slimaki_Live_Audit.png ADDED

Git LFS Details

  • SHA256: b448bff5aad46c8af9f9dcc8590b1b98bfaaf1c53d97fdf1685aef0a72d0ee51
  • Pointer size: 131 Bytes
  • Size of remote file: 120 kB
Audits/Unreleased_2501_Della_Live_Audit.png ADDED

Git LFS Details

  • SHA256: 44b089a762e7782ebb3833568c0ec77dfc4552d0d0890c4542c0ce8a13c88d5f
  • Pointer size: 131 Bytes
  • Size of remote file: 118 kB
Audits/audit_della.py ADDED
@@ -0,0 +1,267 @@
+import yaml
+import torch
+import os
+import sys
+import numpy as np
+import matplotlib.pyplot as plt
+from safetensors import safe_open
+from sklearn.decomposition import PCA
+from sklearn.metrics.pairwise import cosine_similarity
+from tqdm import tqdm
+import argparse
+
+# --- CONFIGURATION ---
+PROBE_LAYERS = [
+    "model.layers.12.mlp.down_proj.weight",  # Mid-model logic
+    "lm_head.weight"                         # Output semantics
+]
+LOG_FILENAME = "della_scan.log"
+# ---------------------
+
+class Logger:
+    def __init__(self, filename):
+        self.terminal = sys.stdout
+        self.log = open(filename, "w", encoding="utf-8")
+
+    def write(self, message):
+        self.terminal.write(message)
+        self.log.write(message)
+        self.log.flush()
+
+    def flush(self):
+        self.terminal.flush()
+        self.log.flush()
+
+    def close(self):
+        self.log.close()
+
+def load_yaml_config(config_path):
+    print(f"Loading config: {config_path}")
+    with open(config_path, 'r', encoding='utf-8') as f:
+        config = yaml.safe_load(f)
+
+    models = []
+    base_model = None
+
+    # Extract base model
+    if 'base_model' in config:
+        base_model = config['base_model']
+
+    # Extract models list
+    if 'models' in config:
+        for m in config['models']:
+            models.append(m['model'])
+
+    return base_model, models
+
+def get_model_fingerprint(model_path, probe_layers):
+    tensors = []
+    if os.path.exists(model_path):
+        files = [f for f in os.listdir(model_path) if f.endswith('.safetensors')]
+        files.sort()
+        found_layers = 0
+
+        for file in files:
+            full_path = os.path.join(model_path, file)
+            try:
+                with safe_open(full_path, framework="pt", device="cpu") as f:
+                    keys = f.keys()
+                    for layer in probe_layers:
+                        if layer in keys:
+                            t = f.get_tensor(layer).float().view(-1)
+                            t = t[::10]  # Downsample
+                            tensors.append(t)
+                            found_layers += 1
+            except Exception as e:
+                print(f"Error reading {file}: {e}")
+
+        if found_layers == 0:
+            return None
+    else:
+        return None
+
+    if not tensors:
+        return None
+
+    return torch.cat(tensors)
+
+def analyze_task_vectors(base_fp, donor_fps):
+    # 0. Handle size mismatches (Manifold Alignment)
+    base_size = base_fp.numel()
+    donor_sizes = [f.numel() for f in donor_fps]
+
+    min_size = min([base_size] + donor_sizes)
+
+    if any(s != min_size for s in donor_sizes) or base_size != min_size:
+        print(f"\n[!] SIZE MISMATCH DETECTED")
+        print(f"    Base Size: {base_size}")
+        print(f"    Min Donor: {min(donor_sizes)}")
+        print(f"    Action: Truncating all models to {min_size} for audit.")
+
+    # Align fingerprints
+    aligned_base = base_fp[:min_size]
+    aligned_donors = [f[:min_size] for f in donor_fps]
+
+    # 1. Calculate Task Vectors (Delta = Donor - Base)
+    task_vectors = []
+    for d_fp in aligned_donors:
+        task_vectors.append(d_fp - aligned_base)
+
+    # Stack into matrix [N_donors, N_features]
+    data_matrix = torch.stack(task_vectors).numpy()
+
+    # 2. Norm Analysis (Magnitude of the Delta)
+    norms = np.linalg.norm(data_matrix, axis=1)
+
+    # 3. Cosine Similarity Matrix (Directional Alignment)
+    cos_sim = cosine_similarity(data_matrix)
+
+    # 4. PCA Projection (2D)
+    # Center the task vectors
+    centered_data = data_matrix - np.mean(data_matrix, axis=0)
+
+    if len(donor_fps) > 1:
+        pca = PCA(n_components=2)
+        coords = pca.fit_transform(centered_data)
+        var_ratio = pca.explained_variance_ratio_
+    else:
+        coords = np.zeros((1, 2))
+        var_ratio = [1.0, 0.0]
+
+    return norms, cos_sim, coords, var_ratio, donor_sizes
+
+def plot_results(model_ids, norms, cos_sim, coords, var_ratio):
+    labels = [str(mid) for mid in model_ids]
+
+    fig = plt.figure(figsize=(20, 12))
+    fig.suptitle(f"DELLA/Task Arithmetic Compatibility Audit ({len(model_ids)} Donors)\nRefer to della_scan.log for ID Key", fontsize=16)
+
+    # --- Plot 1: Task Vector Manifold (PCA) ---
+    ax1 = fig.add_subplot(2, 2, 1)
+    ax1.scatter(coords[:, 0], coords[:, 1], c='purple', s=80, alpha=0.6)
+
+    for i, txt in enumerate(labels):
+        ax1.annotate(txt, (coords[i, 0], coords[i, 1]), xytext=(3, 3), textcoords='offset points', fontsize=8, fontweight='bold')
+
+    ax1.set_title(f"Task Vector Map (PCA of Deltas)\nClusters = Redundant Skills")
+    ax1.set_xlabel(f"PC1 ({var_ratio[0]:.1%} variance)")
+    ax1.set_ylabel(f"PC2 ({var_ratio[1]:.1%} variance)")
+    ax1.grid(True, alpha=0.3)
+
+    # Plot Origin (Base Model reference relative to centered data)
+    center_offset = -np.mean(coords, axis=0)
+    ax1.scatter(center_offset[0], center_offset[1], c='red', marker='x', s=100, label='Base Model (Ref)')
+    ax1.legend()
+
+    # --- Plot 2: Cosine Similarity Heatmap ---
+    ax2 = fig.add_subplot(2, 2, 2)
+    # For Task Vectors, negative similarity is common (conflicting directions)
+    im = ax2.imshow(cos_sim, cmap='coolwarm', vmin=-1.0, vmax=1.0)
+
+    ax2.set_xticks(np.arange(len(labels)))
+    ax2.set_yticks(np.arange(len(labels)))
+    ax2.set_xticklabels(labels, rotation=90, fontsize=6)
+    ax2.set_yticklabels(labels, fontsize=6)
+
+    ax2.set_title("Task Vector Alignment (Blue=Opposed, Red=Aligned)")
+    plt.colorbar(im, ax=ax2)
+
+    # --- Plot 3: Delta Magnitude (L2 Norm) ---
+    ax3 = fig.add_subplot(2, 1, 2)
+    bars = ax3.bar(labels, norms, color='orange', alpha=0.6)
+    ax3.set_title("Task Vector Magnitude (L2 Norm)\nHigh bars = Drastic deviation from Base Model")
+    ax3.set_ylabel("Delta L2 Norm")
+    ax3.set_xlabel("Donor ID")
+    ax3.grid(axis='y', alpha=0.3)
+
+    for bar in bars:
+        height = bar.get_height()
+        ax3.text(bar.get_x() + bar.get_width()/2., height,
+                 f'{height:.1f}', ha='center', va='bottom', fontsize=6, rotation=90)
+
+    plt.tight_layout()
+    plt.show()
+
+def main():
+    # Hook stdout to log file
+    sys.stdout = Logger(LOG_FILENAME)
+
+    parser = argparse.ArgumentParser(description="Audit MergeKit models for DELLA/Task Arithmetic compatibility.")
+    parser.add_argument("config", help="Path to the mergekit yaml config file")
+    args = parser.parse_args()
+
+    print(f"--- DELLA AUDIT V2 START ---")
+    base_model_path, donor_paths = load_yaml_config(args.config)
+
+    if not base_model_path:
+        print("Error: No 'base_model' found in config. DELLA requires a base model.")
+        return
+
+    print(f"Base Model: {base_model_path}")
+    print(f"Donors: {len(donor_paths)}")
+
+    print("\nExtracting BASE MODEL fingerprint...")
+    base_fp = get_model_fingerprint(base_model_path, PROBE_LAYERS)
+    if base_fp is None:
+        print("Failed to load base model. Exiting.")
+        return
+
+    donor_fps = []
+    valid_donors = []
+    valid_ids = []
+
+    print("\nExtracting DONOR fingerprints...")
+    for i, path in enumerate(tqdm(donor_paths)):
+        fp = get_model_fingerprint(path, PROBE_LAYERS)
+        if fp is not None:
+            donor_fps.append(fp)
+            valid_donors.append(path)
+            valid_ids.append(i + 1)
+        else:
+            print(f"Skipping {path} (failed to load)")
+
+    if len(valid_donors) < 1:
+        print("Need at least 1 valid donor.")
+        return
+
+    print("\nComputing Task Vector geometry...")
+    norms, cos_sim, coords, var_ratio, sizes = analyze_task_vectors(base_fp, donor_fps)
+
+    # --- LOGGING THE KEY ---
+    print("\n" + "="*80)
+    print(f"{'ID':<5} | {'Model Name'}")
+    print("-" * 80)
+    for i, path in enumerate(valid_donors):
+        name = os.path.basename(path).replace("!models--", "")
+        print(f"#{valid_ids[i]:<4} | {name}")
+    print("="*80 + "\n")
+
+    # --- MAGNITUDE ANALYSIS ---
+    print("--- MAGNITUDE ANALYSIS & DATA POINTS ---")
+    print(f"{'ID':<5} | {'Status':<10} | {'Delta Norm':<12} | {'Orig Size':<12} | {'Model Name'}")
+    print("-" * 100)
+
+    mean_norm = np.mean(norms)
+    std_norm = np.std(norms)
+
+    for i, model in enumerate(valid_donors):
+        name = os.path.basename(model).replace("!models--", "")
+        # Check if norm is significantly higher than average (potential destroyer of weights)
+        z_score = (norms[i] - mean_norm) / (std_norm + 1e-8)
+        status = "HIGH MAG" if z_score > 1.5 else "OK"
+
+        print(f"#{valid_ids[i]:<4} | {status:<10} | {norms[i]:<12.4f} | {sizes[i]:<12} | {name}")
+
+    print("\nLog saved to: " + LOG_FILENAME)
+    print("Displaying charts...")
+
+    # Reset stdout
+    sys.stdout.terminal.flush()
+
+    plot_results(valid_ids, norms, cos_sim, coords, var_ratio)
+
+    # Close log
+    sys.stdout.close()
+
+if __name__ == "__main__":
+    main()
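The script above captures its console output by swapping `sys.stdout` for a small tee object (its `Logger` class), so everything printed also lands in `della_scan.log`. A standalone sketch of that pattern; the filename `demo_scan.log` here is only for illustration:

```python
import sys

class Tee:
    """Duplicate everything written to stdout into a log file as well
    (same idea as the audit script's Logger class)."""
    def __init__(self, filename):
        self.terminal = sys.stdout          # keep the real stdout
        self.log = open(filename, "w", encoding="utf-8")

    def write(self, message):
        self.terminal.write(message)        # echo to console
        self.log.write(message)             # and to the log file
        self.log.flush()

    def flush(self):
        self.terminal.flush()
        self.log.flush()

    def close(self):
        self.log.close()

sys.stdout = Tee("demo_scan.log")           # hypothetical demo filename
print("hello audit")                        # goes to console AND file
sys.stdout.close()
sys.stdout = sys.stdout.terminal            # restore the real stdout
```

One caveat of this approach, visible in the script too: anything else that grabs a reference to `sys.stdout` while the tee is installed will keep writing through it, which is why the script flushes `sys.stdout.terminal` before handing control to matplotlib.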
Audits/audit_karcher.py ADDED
@@ -0,0 +1,241 @@
+import yaml
+import torch
+import os
+import sys
+import numpy as np
+import matplotlib.pyplot as plt
+from safetensors import safe_open
+from sklearn.decomposition import PCA
+from sklearn.metrics.pairwise import cosine_similarity
+from tqdm import tqdm
+import argparse
+
+# --- CONFIGURATION ---
+PROBE_LAYERS = [
+    "model.layers.12.mlp.down_proj.weight",  # Mid-model logic
+    "lm_head.weight"                         # Output semantics
+]
+LOG_FILENAME = "karcher_scan.log"
+# ---------------------
+
+class Logger:
+    def __init__(self, filename):
+        self.terminal = sys.stdout
+        self.log = open(filename, "w", encoding="utf-8")
+
+    def write(self, message):
+        self.terminal.write(message)
+        self.log.write(message)
+        self.log.flush()
+
+    def flush(self):
+        self.terminal.flush()
+        self.log.flush()
+
+    def close(self):
+        self.log.close()
+
+def load_yaml_config(config_path):
+    print(f"Loading config: {config_path}")
+    with open(config_path, 'r', encoding='utf-8') as f:
+        config = yaml.safe_load(f)
+
+    models = []
+    if 'models' in config:
+        for m in config['models']:
+            models.append(m['model'])
+    return models
+
+def get_model_fingerprint(model_path, probe_layers):
+    tensors = []
+    if os.path.exists(model_path):
+        files = [f for f in os.listdir(model_path) if f.endswith('.safetensors')]
+        files.sort()
+        found_layers = 0
+
+        for file in files:
+            full_path = os.path.join(model_path, file)
+            try:
+                with safe_open(full_path, framework="pt", device="cpu") as f:
+                    keys = f.keys()
+                    for layer in probe_layers:
+                        if layer in keys:
+                            t = f.get_tensor(layer).float().view(-1)
+                            t = t[::10]  # Downsample
+                            tensors.append(t)
+                            found_layers += 1
+            except Exception as e:
+                print(f"Error reading {file}: {e}")
+
+        if found_layers == 0:
+            return None
+    else:
+        return None
+
+    if not tensors:
+        return None
+
+    return torch.cat(tensors)
+
+def analyze_compatibility(fingerprints):
+    # 0. Handle size mismatches (Manifold Alignment)
+    # We keep 'sizes' as the ORIGINAL sizes for logging
+    sizes = [f.numel() for f in fingerprints]
+    min_size = min(sizes)
+    max_size = max(sizes)
+
+    if min_size != max_size:
+        print(f"\n[!] SIZE MISMATCH DETECTED")
+        print(f"    Smallest Fingerprint: {min_size}")
+        print(f"    Largest Fingerprint: {max_size}")
+        print(f"    Action: Truncating all models to {min_size} for alignment.")
+
+    # Align all fingerprints to the smallest common denominator
+    aligned_fingerprints = [f[:min_size] for f in fingerprints]
+
+    # Stack into matrix [N_models, N_features]
+    data_matrix = torch.stack(aligned_fingerprints).numpy()
+
+    # 1. Norm Analysis (Magnitude)
+    norms = np.linalg.norm(data_matrix, axis=1)
+
+    # 2. Cosine Similarity Matrix
+    cos_sim = cosine_similarity(data_matrix)
+
+    # 3. PCA Projection (2D)
+    centered_data = data_matrix - np.mean(data_matrix, axis=0)
+    pca = PCA(n_components=2)
+    coords = pca.fit_transform(centered_data)
+
+    return norms, cos_sim, coords, pca.explained_variance_ratio_, sizes
+
+def plot_results(model_ids, norms, cos_sim, coords, var_ratio):
+    # Use IDs for plotting
+    labels = [str(mid) for mid in model_ids]
+
+    fig = plt.figure(figsize=(20, 12))
+    fig.suptitle(f"Karcher Merge Compatibility Audit ({len(model_ids)} Models)\nRefer to karcher_scan.log for ID Key", fontsize=16)
+
+    # --- Plot 1: PCA Manifold Map ---
+    ax1 = fig.add_subplot(2, 2, 1)
+    ax1.scatter(coords[:, 0], coords[:, 1], c='blue', s=80, alpha=0.6)
+
+    # Annotate points with IDs
+    for i, txt in enumerate(labels):
+        ax1.annotate(txt, (coords[i, 0], coords[i, 1]), xytext=(3, 3), textcoords='offset points', fontsize=8, fontweight='bold')
+
+    ax1.set_title(f"Manifold Map (PCA)\nOutliers here will break the merge")
+    ax1.set_xlabel(f"PC1 ({var_ratio[0]:.1%} variance)")
+    ax1.set_ylabel(f"PC2 ({var_ratio[1]:.1%} variance)")
+    ax1.grid(True, alpha=0.3)
+
+    # Draw center
+    center = np.mean(coords, axis=0)
+    ax1.scatter(center[0], center[1], c='red', marker='x', s=100, label='Center')
+
+    # --- Plot 2: Cosine Similarity Heatmap ---
+    ax2 = fig.add_subplot(2, 2, 2)
+    im = ax2.imshow(cos_sim, cmap='viridis', vmin=0.8, vmax=1.0)
+
+    # Set ticks to IDs
+    ax2.set_xticks(np.arange(len(labels)))
+    ax2.set_yticks(np.arange(len(labels)))
+    ax2.set_xticklabels(labels, rotation=90, fontsize=6)
+    ax2.set_yticklabels(labels, fontsize=6)
+
+    ax2.set_title("Cosine Similarity (Red/Yellow = Compatible)")
+    plt.colorbar(im, ax=ax2)
+
+    # --- Plot 3: Weight Magnitude (Norms) ---
+    ax3 = fig.add_subplot(2, 1, 2)
+    bars = ax3.bar(labels, norms, color='green', alpha=0.6)
+    ax3.set_title("Weight Magnitude (L2 Norm)\nKarcher is sensitive to large differences here")
+    ax3.set_ylabel("L2 Norm")
+    ax3.set_xlabel("Model ID")
+    ax3.grid(axis='y', alpha=0.3)
+
+    # Add value labels (rotated if many models)
+    for bar in bars:
+        height = bar.get_height()
+        ax3.text(bar.get_x() + bar.get_width()/2., height,
+                 f'{height:.1f}', ha='center', va='bottom', fontsize=6, rotation=90)
+
+    plt.tight_layout()
+    plt.show()
+
+def main():
+    # Hook stdout to log file
+    sys.stdout = Logger(LOG_FILENAME)
+
+    parser = argparse.ArgumentParser(description="Audit MergeKit models for Karcher compatibility.")
+    parser.add_argument("config", help="Path to the mergekit yaml config file")
+    args = parser.parse_args()
+
+    print(f"--- KARCHER AUDIT V4 START ---")
+    model_paths = load_yaml_config(args.config)
+    print(f"Found {len(model_paths)} models.")
+
+    fingerprints = []
+    valid_models = []
+    valid_ids = []
+
+    print("Extracting model fingerprints...")
+    # We use a manual counter for IDs to keep them sequential based on config order
+    for i, path in enumerate(tqdm(model_paths)):
+        fp = get_model_fingerprint(path, PROBE_LAYERS)
+        if fp is not None:
+            fingerprints.append(fp)
+            valid_models.append(path)
+            valid_ids.append(i + 1)  # 1-based indexing
+        else:
+            print(f"Skipping {path} (failed to load)")
+
+    if len(valid_models) < 2:
+        print("Need at least 2 valid models to compare.")
+        return
+
+    print("Computing manifold geometry...")
+    norms, cos_sim, coords, var_ratio, sizes = analyze_compatibility(fingerprints)
+
+    # --- LOGGING THE KEY ---
+    print("\n" + "="*80)
+    print(f"{'ID':<5} | {'Model Name'}")
+    print("-" * 80)
+    for i, path in enumerate(valid_models):
+        name = os.path.basename(path).replace("!models--", "")
+        print(f"#{valid_ids[i]:<4} | {name}")
+    print("="*80 + "\n")
+
+    # --- OUTLIER ANALYSIS ---
+    print("--- OUTLIER ANALYSIS & DATA POINTS ---")
+    print(f"{'ID':<5} | {'Status':<10} | {'Dist':<10} | {'Norm':<10} | {'Orig Size':<12} | {'Model Name'}")
+    print("-" * 100)
+
+    centroid = np.mean(coords, axis=0)
+    distances = np.linalg.norm(coords - centroid, axis=1)
+
+    mean_dist = np.mean(distances)
+    std_dist = np.std(distances)
+    z_scores = (distances - mean_dist) / (std_dist + 1e-8)
+
+    for i, model in enumerate(valid_models):
+        name = os.path.basename(model).replace("!models--", "")
+        is_outlier = z_scores[i] > 1.5
+        status = "OUTLIER" if is_outlier else "OK"
+
+        # Log format: ID | Status | Dist | Norm | Size | Name
+        print(f"#{valid_ids[i]:<4} | {status:<10} | {distances[i]:<10.4f} | {norms[i]:<10.4f} | {sizes[i]:<12} | {name}")
+
+    print("\nLog saved to: " + LOG_FILENAME)
+    print("Displaying charts...")
+
+    # Reset stdout so matplotlib doesn't try to write binary image data to our text logger if it crashes
+    sys.stdout.terminal.flush()
+
+    plot_results(valid_ids, norms, cos_sim, coords, var_ratio)
+
+    # Close log
+    sys.stdout.close()
+
+if __name__ == "__main__":
+    main()
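The compatibility heatmap in audit_karcher.py reduces to pairwise cosine similarity over the flattened weight fingerprints. A toy numpy re-expression of that one step, using synthetic vectors rather than real model weights:

```python
import numpy as np

def cosine_matrix(rows):
    """Pairwise cosine similarity, equivalent to what
    sklearn's cosine_similarity computes for these inputs."""
    normed = rows / np.linalg.norm(rows, axis=1, keepdims=True)
    return normed @ normed.T

# Three toy "fingerprints": two near-identical models and one outlier.
fps = np.array([
    [ 1.0, 2.0,  3.0],
    [ 1.0, 2.0,  3.1],
    [-3.0, 0.5, -1.0],
])
sim = cosine_matrix(fps)
print(np.round(sim, 3))
```

The first two rows land near 1.0 against each other while the outlier goes negative, which is exactly the pattern the script's `vmin=0.8` heatmap is tuned to surface (anything below 0.8 saturates as incompatible).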
Audits/generalized_task_arithmetic.py ADDED
@@ -0,0 +1,339 @@
+# Copyright (C) 2025 Arcee AI
+# SPDX-License-Identifier: LGPL-3.0-only
+# della + live audit report by Naphula
+
+import logging
+from enum import Enum
+from typing import Any, Dict, List, Optional, Tuple
+
+import torch
+from pydantic import BaseModel
+from typing_extensions import Literal, override
+
+from mergekit.architecture import WeightInfo
+from mergekit.common import ImmutableMap, ModelReference
+from mergekit.graph import Task
+from mergekit.merge_methods.base import (
+    ConfigParameterDef,
+    MergeMethod,
+    MergeTensorInput,
+)
+from mergekit.sparsify import RescaleNorm, SparsificationMethod, sparsify
+
+
+class ConsensusMethod(str, Enum):
+    count = "count"
+    sum = "sum"
+
+
+class GeneralizedTaskArithmeticMerge(MergeMethod, BaseModel, frozen=True):
+    consensus_method: Optional[ConsensusMethod]
+    sparsification_method: Optional[SparsificationMethod]
+    default_normalize: bool
+    default_rescale: bool
+    method_name: str
+    method_pretty_name: Optional[str]
+    method_reference_url: Optional[str]
+
+    def name(self) -> str:
+        return self.method_name
+
+    @override
+    def pretty_name(self) -> Optional[str]:
+        return self.method_pretty_name
+
+    @override
+    def reference_url(self) -> Optional[str]:
+        return self.method_reference_url
+
+    def parameters(self) -> List[ConfigParameterDef]:
+        return [
+            ConfigParameterDef(name="int8_mask", required=False, default_value=False),
+            ConfigParameterDef(
+                name="normalize", required=False, default_value=self.default_normalize
+            ),
+            ConfigParameterDef(
+                name="rescale", required=False, default_value=self.default_rescale
+            ),
+            ConfigParameterDef(name="lambda", required=False, default_value=1.0),
+        ]
+
+    def tensor_parameters(self) -> List[ConfigParameterDef]:
+        res = [
+            ConfigParameterDef(name="weight", required=True),
+            ConfigParameterDef(name="density", required=False, default_value=1.0),
+        ]
+        if self.sparsification_method == SparsificationMethod.magnitude_outliers:
+            res.append(
+                ConfigParameterDef(
+                    name="gamma",
+                    default_value=0.01,
+                )
+            )
+        if self.sparsification_method == SparsificationMethod.della_magprune:
+            res.append(
+                ConfigParameterDef(
+                    name="epsilon",
+                    default_value=0.15,
+                )
+            )
+        return res
+
+    def make_task(
+        self,
+        output_weight: WeightInfo,
+        tensors: MergeTensorInput,
+        base_model: Optional[ModelReference],
+        parameters: ImmutableMap[str, Any],
+        tensor_parameters: ImmutableMap[ModelReference, ImmutableMap[str, Any]],
+    ) -> Task:
+        return GTATask(
+            method=self,
+            tensors=tensors,
+            base_model=base_model,
+            tensor_parameters=tensor_parameters,
+            int8_mask=parameters["int8_mask"],
+            normalize=parameters["normalize"],
+            lambda_=parameters["lambda"],
+            rescale_norm=RescaleNorm.l1 if parameters["rescale"] else None,
+            weight_info=output_weight,
+        )
+
+
+class GTATask(Task[torch.Tensor]):
+    method: GeneralizedTaskArithmeticMerge
+    tensors: MergeTensorInput
+    base_model: ModelReference
+    weight_info: WeightInfo
+    tensor_parameters: ImmutableMap[ModelReference, Any]
+    int8_mask: bool
+    normalize: bool
+    lambda_: float
+    rescale_norm: Optional[RescaleNorm]
+
+    def uses_accelerator(self) -> bool:
+        return True
+
+    def arguments(self) -> Dict[str, Task]:
+        return {"tensors": self.tensors}
+
+    def execute(
+        self,
+        tensors: Dict[ModelReference, torch.Tensor],
+        **_kwargs,
+    ) -> torch.Tensor:
+        # collect task vectors
+        tvs, base = get_task_vectors(
+            self.weight_info,
+            self.base_model,
+            tensors,
+            tensor_parameters=self.tensor_parameters.data,
+        )
+
+        # --- LIVE AUDIT CHART ---
+        if tvs:
+            log_della_audit(
+                self.weight_info.name,
+                self.base_model,
+                tvs,
+                self.lambda_,
+                self.method.method_pretty_name
+            )
+        # ------------------------
+
+        if not tvs:
+            return base
+
+        # sparsify
+        if self.method.sparsification_method:
+            for tv_info in tvs:
+                kwargs = {}
+                if "gamma" in tv_info:
+                    kwargs["gamma"] = tv_info["gamma"]
+
+                if "epsilon" in tv_info:
+                    kwargs["epsilon"] = tv_info["epsilon"]
+
+                tv_info["delta"] = sparsify(
+                    tv_info["delta"],
+                    density=tv_info["density"],
+                    method=self.method.sparsification_method,
+                    rescale_norm=self.rescale_norm,
+                    **kwargs,
+                )
+
+        deltas = torch.stack([tv["delta"] for tv in tvs], dim=0)
+
+        weights = torch.tensor(
+            [tv["weight"] for tv in tvs], dtype=deltas.dtype, device=deltas.device
+        )
+        while len(deltas.shape) > len(weights.shape):
+            weights.unsqueeze_(-1)
+
+        weighted_deltas = deltas * weights
+
+        # get sign consensus and mix deltas
+        if self.method.consensus_method:
+            mask_dtype = torch.int8 if self.int8_mask else base.dtype
+            mask = get_mask(
+                weighted_deltas,
+                method=self.method.consensus_method,
+                mask_dtype=mask_dtype,
+            )
+            mixed_delta = (weighted_deltas * mask).sum(dim=0)
+            divisor = (weights * mask).sum(dim=0)
+            divisor[divisor == 0] = 1
+        else:
+            mixed_delta = weighted_deltas.sum(dim=0)
+            divisor = weights.sum(dim=0)
+            divisor[divisor.abs() < 1e-8] = 1
+
+        if self.normalize:
+            mixed_delta /= divisor
+
+        if self.lambda_ != 1:
+            mixed_delta *= self.lambda_
+
+        return (base + mixed_delta).to(base.dtype)
+
+    def group_label(self) -> Optional[str]:
+        return self.tensors.group_label()
+
+
+def get_task_vectors(
+    weight_info: WeightInfo,
+    base_model: ModelReference,
+    tensors: ImmutableMap[ModelReference, torch.Tensor],
+    tensor_parameters: ImmutableMap[ModelReference, ImmutableMap[str, Any]],
+) -> Tuple[List[Dict[str, Any]], torch.Tensor]:
+    keys = list(tensors.keys())
+    base = tensors[base_model]
+
+    parameter_name = weight_info.name
+
+    res = []
+    for model in keys:
+        if model == base_model:
+            continue
+
+        x = tensors[model].to(base.dtype)
+        if x.shape != base.shape:
+            if weight_info.is_embed:
+                x = x[: base.shape[0], : base.shape[1]]
+                logging.warning(f"Using submatrix of {model}:{parameter_name}")
+            else:
+                logging.warning(
+                    f"skipping {model}:{parameter_name} due to size mismatch"
+                )
+                continue
+
+        delta = x - base
+        del x
+        del tensors[model]
+
+        d = {}
+        d["model"] = model
+        d["delta"] = delta
+        for p in tensor_parameters[model]:
+            d[p] = tensor_parameters[model][p]
+        res.append(d)
+    return res, base
+
+
+def get_mask(
+    delta: torch.Tensor,
+    method: Literal["sum", "count"] = "sum",
+    mask_dtype: Optional[torch.dtype] = None,
+):
+    """Returns a mask determining which delta vectors should be merged
+    into the final model.
+
+    For the methodology described in the TIES paper use 'sum'. For a
+    simpler naive count of signs, use 'count'."""
+    if mask_dtype is None:
+        mask_dtype = delta.dtype
+
+    sign = delta.sign().to(mask_dtype)
+
+    if method == "sum":
+        sign_weight = delta.sum(dim=0)
+        majority_sign = (sign_weight >= 0).to(mask_dtype) * 2 - 1
+        del sign_weight
+    elif method == "count":
+        majority_sign = (sign.sum(dim=0) >= 0).to(mask_dtype) * 2 - 1
+    else:
+        raise RuntimeError(f'Unimplemented mask method "{method}"')
+
+    return sign == majority_sign
+
+
+def log_della_audit(
+    layer_name: str,
+    base_model: ModelReference,
+    tvs: List[Dict[str, Any]],
+    global_lambda: float,
+    method_name: str
+):
+    """Prints and saves a bar chart of DELLA/Task Arithmetic distribution based on actual Delta Norms."""
+
+    base_name = str(base_model.model.path).split("\\")[-1].split("/")[-1][:50]
+
+    bar_char = "█"
+    lines = [f"\n[{method_name} Audit] Layer: {layer_name} | Lambda={global_lambda:.2f}"]
+    lines.append(f"  [BASE] {base_name:<50}")
+
+    # 1. Calculate stats
+    stats = []
+    total_impact = 0.0
+
+    for tv in tvs:
+        model_name = str(tv['model'].model.path).split("\\")[-1].split("/")[-1][:50]
+        weight = tv.get('weight', 0.0)
+        density = tv.get('density', 1.0)
293
+ epsilon = tv.get('epsilon', None)
294
+ delta = tv.get('delta', None)
295
+
296
+ norm = 0.0
297
+ if delta is not None:
298
+ # Use float32 for norm calculation to be safe
299
+ norm = torch.norm(delta.float()).item()
300
+
301
+ # Effective contribution magnitude = Weight * Norm
302
+ # This shows how much this model is actually moving the weights
303
+ impact = weight * norm
304
+ total_impact += impact
305
+
306
+ stats.append({
307
+ 'name': model_name,
308
+ 'weight': weight,
309
+ 'density': density,
310
+ 'epsilon': epsilon,
311
+ 'norm': norm,
312
+ 'impact': impact
313
+ })
314
+
315
+ # Sort by name for consistent logs
316
+ stats.sort(key=lambda x: x['name'])
317
+
318
+ # 2. Generate bars
319
+ for s in stats:
320
+ # Calculate percentage relative to the sum of all impacts (Share of Voice)
321
+ pct = (s['impact'] / total_impact * 100) if total_impact > 0 else 0.0
322
+
323
+ # Bar length (max 50 chars for 100%)
324
+ bar_len = int(max(0, min(50, pct / 2)))
325
+ bar = bar_char * bar_len
326
+
327
+ # Format info string
328
+ # W=Weight, D=Density, N=DeltaNorm
329
+ info = f"W:{s['weight']:.2f} D:{s['density']:.2f} N:{s['norm']:.2f}"
330
+ if s['epsilon'] is not None:
331
+ info += f" E:{s['epsilon']:.2f}"
332
+
333
+ lines.append(f" {s['name']:<50}: {bar:<50} {pct:5.1f}% ({info})")
334
+
335
+ log_entry = "\n".join(lines)
336
+ print(log_entry)
337
+
338
+ with open("della_audit.log", "a", encoding="utf-8") as f:
339
+ f.write(log_entry + "\n")
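The sign-consensus logic in `get_mask` can be exercised in isolation. Below is a minimal sketch that mirrors it with plain Python lists instead of tensors (the name `toy_get_mask` and the sample deltas are illustrative, not part of the patch): entries whose sign agrees with the per-parameter majority survive, everything else is masked out.

```python
from typing import List

def toy_get_mask(deltas: List[List[float]], method: str = "sum") -> List[List[bool]]:
    """Sketch of TIES-style sign consensus: deltas is [n_models][n_params]."""
    def sign(v: float) -> int:
        return (v > 0) - (v < 0)

    n_params = len(deltas[0])
    majority = []
    for p in range(n_params):
        if method == "sum":
            # majority sign weighted by magnitude (TIES paper methodology)
            total = sum(d[p] for d in deltas)
        elif method == "count":
            # naive vote: count of positive vs negative signs
            total = sum(sign(d[p]) for d in deltas)
        else:
            raise RuntimeError(f'Unimplemented mask method "{method}"')
        majority.append(1 if total >= 0 else -1)

    # keep only entries agreeing with the majority sign
    return [[sign(d[p]) == majority[p] for p in range(n_params)] for d in deltas]

deltas = [[0.5, -0.1], [0.2, 0.3], [-0.1, 0.4]]
mask = toy_get_mask(deltas)
# column sums are [0.6, 0.6], so the majority sign is positive for both
# parameters; the -0.1 entries are masked out
```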
Audits/model_stock.py ADDED
@@ -0,0 +1,183 @@
+ # Copyright (C) 2025 Arcee AI
+ # SPDX-License-Identifier: LGPL-3.0-only
+ # model_stock + live audit report by Naphula
+
+ import logging
+ import os
+ from typing import Any, Dict, List, Optional
+
+ import torch
+ from typing_extensions import override
+
+ from mergekit.architecture import WeightInfo
+ from mergekit.common import ImmutableMap, ModelReference
+ from mergekit.graph import Task
+ from mergekit.merge_methods.base import (
+     ConfigParameterDef,
+     MergeMethod,
+     MergeTensorInput,
+ )
+ from mergekit.merge_methods.rectify_embed import rectify_embed_sizes
+
+
+ class ModelStockMergeTask(Task[torch.Tensor]):
+     gather_tensors: MergeTensorInput
+     base_model: ModelReference
+     weight_info: WeightInfo
+     filter_wise: bool = False
+
+     def uses_accelerator(self) -> bool:
+         return True
+
+     def arguments(self) -> Dict[str, Task]:
+         return {"tensors": self.gather_tensors}
+
+     def execute(self, tensors: Dict[ModelReference, torch.Tensor]) -> torch.Tensor:
+         if len(tensors) == 1 and self.base_model in tensors:
+             return tensors[self.base_model]
+         if len(tensors) < 3:
+             if self.weight_info.optional:
+                 logging.warning(
+                     f"Optional weight {self.weight_info.name} not present in enough models, discarding"
+                 )
+                 return None
+
+             raise ValueError(
+                 "ModelStockMerge requires at least 3 models (base plus two+ others)"
+             )
+
+         w_0, ws = self.get_rectified_weights(tensors)
+         out_shape = w_0.shape
+
+         if self.filter_wise:
+             if w_0.dim() == 1:
+                 # bias (or other single-vector) parameters should be treated as row vectors
+                 w_0 = w_0.unsqueeze(0)
+                 ws = [w.unsqueeze(0) for w in ws]
+         else:
+             w_0 = w_0.view(-1)
+             ws = [w.view(-1) for w in ws]
+
+         offsets = [w - w_0 for w in ws]
+
+         # now there is a question of how to come up with a value for theta.
+         # in the two-vector case, we can get an exact angle between the two vectors
+         # but the paper doesn't explicitly say what to do in the multi-vector case -
+         # they keep using a singular theta value and don't elaborate on how to
+         # calculate it. i'm going to assume an average of pairwise angles for now? i guess?
+
+         cos_thetas = []
+         for i, w_0_offset in enumerate(offsets):
+             for j in range(i + 1, len(offsets)):
+                 w_1_offset = offsets[j]
+
+                 norm_product = torch.norm(w_0_offset, dim=-1) * torch.norm(
+                     w_1_offset, dim=-1
+                 )
+                 cos_theta = (
+                     (w_0_offset * w_1_offset).sum(dim=-1) / norm_product.clamp(min=1e-6)
+                 ).clamp(-1, 1)
+                 cos_thetas.append(cos_theta)
+
+         cos_theta = torch.stack(cos_thetas).mean(dim=0).unsqueeze(-1)
+         N = len(ws)
+         t = (N * cos_theta) / (1 + (N - 1) * cos_theta)
+
+         # --- LIVE AUDIT CHART ---
+         t_scalar = t.mean().item()
+         base_name = str(self.base_model.model.path)
+         donor_names = [str(k.model.path) for k in tensors.keys() if k != self.base_model]
+         donor_names.sort()  # deterministic order
+
+         log_model_stock_audit(self.weight_info.name, t_scalar, base_name, donor_names)
+         # ------------------------
+
+         w_avg = sum(ws) / len(ws)
+         w_h = t * w_avg + (1 - t) * w_0
+
+         return w_h.reshape(out_shape)
+
+     def get_rectified_weights(self, tensors: Dict[ModelReference, torch.Tensor]):
+         if self.base_model not in tensors:
+             raise ValueError("Base model tensor not found")
+
+         all_weights = [tensors[self.base_model]] + [
+             tensors[k] for k in tensors if k != self.base_model
+         ]
+         rectify_embed_sizes(self.weight_info, all_weights)
+         w_0 = all_weights[0]
+         ws = all_weights[1:]
+         return w_0, ws
+
+     def group_label(self) -> Optional[str]:
+         return self.gather_tensors.group_label()
+
+
+ class ModelStockMerge(MergeMethod):
+     def name(self) -> str:
+         return "model_stock"
+
+     @override
+     def pretty_name(self) -> Optional[str]:
+         return "Model Stock"
+
+     @override
+     def reference_url(self):
+         return "https://arxiv.org/abs/2403.19522"
+
+     def parameters(self) -> List[ConfigParameterDef]:
+         return [
+             ConfigParameterDef(name="filter_wise", required=False, default_value=False)
+         ]
+
+     def make_task(
+         self,
+         *,
+         output_weight: WeightInfo,
+         tensors: MergeTensorInput,
+         base_model: Optional[ModelReference],
+         parameters: ImmutableMap[str, Any],
+         **_kwargs,
+     ) -> Task:
+         return ModelStockMergeTask(
+             gather_tensors=tensors,
+             base_model=base_model,
+             weight_info=output_weight,
+             filter_wise=parameters["filter_wise"],
+         )
+
+
+ def log_model_stock_audit(layer_name: str, t_value: float, base_name: str, donor_names: List[str]):
+     """Prints and saves a bar chart of Model Stock interpolation.
+
+     t is the weight of the average of donors, (1 - t) is the weight of
+     the base, and each donor gets t / len(donor_names)."""
+     n_donors = len(donor_names)
+     base_weight = 1.0 - t_value
+     donor_weight = t_value / n_donors if n_donors > 0 else 0.0
+
+     bar_char = "█"
+     lines = [f"\n[Model Stock Audit] Layer: {layer_name} | t={t_value:.4f}"]
+
+     # Base
+     pct = base_weight * 100
+     # Clamp bar length for visualization safety
+     bar_len = int(max(0, min(100, pct)) / 2)
+     bar = bar_char * bar_len
+     clean_base = base_name.split("\\")[-1].split("/")[-1][:60]
+     lines.append(f" {clean_base:<60}: {bar:<50} ({pct:6.2f}%)")
+
+     # Donors
+     for name in donor_names:
+         pct = donor_weight * 100
+         bar_len = int(max(0, min(100, pct)) / 2)
+         bar = bar_char * bar_len
+         clean_name = name.split("\\")[-1].split("/")[-1][:60]
+         lines.append(f" {clean_name:<60}: {bar:<50} ({pct:6.2f}%)")
+
+     log_entry = "\n".join(lines)
+     print(log_entry)
+
+     with open("model_stock_audit.log", "a", encoding="utf-8") as f:
+         f.write(log_entry + "\n")
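The arithmetic that `log_model_stock_audit` reports can be checked in isolation. Below is a minimal sketch of the interpolation weight t = N·cos(θ) / (1 + (N−1)·cos(θ)) and the base/donor percentage split; the helper names are illustrative, not part of the patch.

```python
def model_stock_t(cos_theta: float, n_donors: int) -> float:
    """Model Stock interpolation weight: t = N*cos(theta) / (1 + (N-1)*cos(theta))."""
    return (n_donors * cos_theta) / (1 + (n_donors - 1) * cos_theta)

def audit_split(t: float, n_donors: int):
    """Percentages as logged above: base gets (1 - t), each donor gets t / N."""
    return (1.0 - t) * 100, (t / n_donors) * 100

# Perfectly aligned donors (cos=1) give t=1 (pure donor average);
# orthogonal donors (cos=0) give t=0 (keep the base untouched).
t = model_stock_t(0.5, 2)            # two donors at cos_theta=0.5 -> t = 1/1.5
base_pct, donor_pct = audit_split(t, 2)
# base and each of the two donors end up near an even three-way split
```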
model_tools.md CHANGED
@@ -17,6 +17,18 @@ Tools to enhance LLM quantizations and merging
  # config.py
  - Simply replace line 13 | BEFORE `ScalarOrGradient: TypeAlias = Union[float, List[float]]` → AFTER `ScalarOrGradient: TypeAlias = Union[float, List[float], str, bool]` | to allow for custom filepath strings within parameter settings.
 
+ # [audit_della.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/audit_della.py)
+ - Audits the compatibility of donor models for `Della` merges before merging. See: [example chart Asmodeus](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Asmodeus_Audit.png), [example log Asmodeus](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Asmodeus_Audit.log), [example chart Slimaki](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Slimaki_Audit.png), [example log Slimaki](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Slimaki_Audit.log)
+
+ # [audit_karcher.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/audit_karcher.py)
+ - Audits the compatibility of donor models for `Karcher` merges before merging. See: [example chart Goetia](https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/nSuSM6v_BQBP4tAWK9rGQ.png)
+
+ # [generalized_task_arithmetic.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/generalized_task_arithmetic.py)
+ - Live audit reports of **actual contribution magnitude** on a per-layer basis for `Della` merges. See: [example audit Asmodeus](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Asmodeus_Live_Audit.png), [example audit Slimaki](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Slimaki_Live_Audit.png)
+
+ # [model_stock.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/model_stock.py)
+ - Live audit reports of **actual contribution magnitude** on a per-layer basis for `Model_Stock` merges.
+
  # [metadata_audit.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/metadata_audit.py)
  - Checks multiple models within subdirectories for vocab or rope mismatch (useful for large merges). Calibrated for Mistral Nemo 12B by default.