Upload 12 files
Files changed:
- .gitattributes +3 -0
- Audits/Asmodeus_Audit.log +56 -0
- Audits/Asmodeus_Audit.png +0 -0
- Audits/Asmodeus_Live_Audit.png +3 -0
- Audits/Slimaki_Audit.log +36 -0
- Audits/Slimaki_Audit.png +0 -0
- Audits/Slimaki_Live_Audit.png +3 -0
- Audits/Unreleased_2501_Della_Live_Audit.png +3 -0
- Audits/audit_della.py +267 -0
- Audits/audit_karcher.py +241 -0
- Audits/generalized_task_arithmetic.py +339 -0
- Audits/model_stock.py +183 -0
- model_tools.md +12 -0
.gitattributes
CHANGED
@@ -35,3 +35,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 gemma-2-9b-it.imatrix filter=lfs diff=lfs merge=lfs -text
 imatrix_unsloth.dat filter=lfs diff=lfs merge=lfs -text
+Audits/Asmodeus_Live_Audit.png filter=lfs diff=lfs merge=lfs -text
+Audits/Slimaki_Live_Audit.png filter=lfs diff=lfs merge=lfs -text
+Audits/Unreleased_2501_Della_Live_Audit.png filter=lfs diff=lfs merge=lfs -text
Audits/Asmodeus_Audit.log
ADDED
@@ -0,0 +1,56 @@
--- DELLA AUDIT V2 START ---
Loading config: config.yaml
Base Model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
Donors: 17

Extracting BASE MODEL fingerprint...

Extracting DONOR fingerprints...

Computing Task Vector geometry...

================================================================================
ID    | Model Name
--------------------------------------------------------------------------------
#1    | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2    | TheDrummer--Cydonia-24B-v4.3
#3    | ReadyArt--4.2.0-Broken-Tutu-24b
#4    | zerofata--MS3.2-PaintedFantasy-v2-24B
#5    | TheDrummer--Magidonia-24B-v4.3
#6    | TheDrummer--Precog-24B-v1
#7    | zerofata--MS3.2-PaintedFantasy-v3-24B
#8    | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
#9    | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
#10   | trashpanda-org--MS3.2-24B-Mullein-v2
#11   | LatitudeGames--Hearthfire-24B
#12   | TheDrummer--Cydonia-24B-v4.2.0
#13   | TheDrummer--Magidonia-24B-v4.2.0
#14   | ConicCat--Mistral-Small-3.2-AntiRep-24B
#15   | Undi95--MistralThinker-v1.1
#16   | CrucibleLab--M3.2-24B-Loki-V2
#17   | Darkhn--M3.2-24B-Animus-V7.1
================================================================================

--- MAGNITUDE ANALYSIS & DATA POINTS ---
ID    | Status     | Delta Norm   | Orig Size    | Model Name
----------------------------------------------------------------------------------------------------
#1    | OK         | 0.0000       | 83886080     | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2    | OK         | 1.2955       | 83886080     | TheDrummer--Cydonia-24B-v4.3
#3    | HIGH MAG   | 46.6745      | 83886080     | ReadyArt--4.2.0-Broken-Tutu-24b
#4    | OK         | 0.0505       | 83886080     | zerofata--MS3.2-PaintedFantasy-v2-24B
#5    | OK         | 4.5662       | 83886080     | TheDrummer--Magidonia-24B-v4.3
#6    | OK         | 4.0883       | 83886080     | TheDrummer--Precog-24B-v1
#7    | OK         | 4.8187       | 83886080     | zerofata--MS3.2-PaintedFantasy-v3-24B
#8    | OK         | 1.9250       | 83886080     | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
#9    | HIGH MAG   | 47.3140      | 83886080     | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
#10   | OK         | 0.1586       | 83886080     | trashpanda-org--MS3.2-24B-Mullein-v2
#11   | OK         | 1.9367       | 83886080     | LatitudeGames--Hearthfire-24B
#12   | OK         | 1.0936       | 83886080     | TheDrummer--Cydonia-24B-v4.2.0
#13   | OK         | 3.9147       | 83886080     | TheDrummer--Magidonia-24B-v4.2.0
#14   | OK         | 0.0164       | 83886080     | ConicCat--Mistral-Small-3.2-AntiRep-24B
#15   | OK         | 11.4846      | 83886080     | Undi95--MistralThinker-v1.1
#16   | OK         | 3.1101       | 83886080     | CrucibleLab--M3.2-24B-Loki-V2
#17   | OK         | 0.7205       | 83886080     | Darkhn--M3.2-24B-Animus-V7.1

Log saved to: della_scan.log
Displaying charts...
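The HIGH MAG flags in the log above come from a z-score test on each donor's delta norm (the rule `audit_della.py` applies further down: flag anything more than 1.5 standard deviations above the mean). A minimal numpy sketch of that rule, fed with the norms from this log:

```python
import numpy as np

def flag_high_magnitude(norms, threshold=1.5):
    # z-score each delta norm against the population of all donors;
    # the small epsilon avoids division by zero when all norms are equal.
    norms = np.asarray(norms, dtype=float)
    z = (norms - norms.mean()) / (norms.std() + 1e-8)
    return ["HIGH MAG" if s > threshold else "OK" for s in z]

# Delta norms for donors #1..#17 from Asmodeus_Audit.log.
norms = [0.0, 1.2955, 46.6745, 0.0505, 4.5662, 4.0883, 4.8187,
         1.9250, 47.3140, 0.1586, 1.9367, 1.0936, 3.9147, 0.0164,
         11.4846, 3.1101, 0.7205]
statuses = flag_high_magnitude(norms)
# Only the two Broken-Tutu donors (#3 and #9) exceed the threshold.
print(statuses)
```

The two outliers sit at roughly 2.7 sigma, while the next largest delta (MistralThinker at 11.48) is well under the 1.5-sigma cutoff, which matches the OK/HIGH MAG split in the log.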
Audits/Asmodeus_Audit.png
ADDED
Audits/Asmodeus_Live_Audit.png
ADDED (Git LFS)
Audits/Slimaki_Audit.log
ADDED
@@ -0,0 +1,36 @@
--- DELLA AUDIT V2 START ---
Loading config: config.yaml
Base Model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
Donors: 7

Extracting BASE MODEL fingerprint...

Extracting DONOR fingerprints...

Computing Task Vector geometry...

================================================================================
ID    | Model Name
--------------------------------------------------------------------------------
#1    | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2    | TheDrummer--Cydonia-24B-v4.3
#3    | ReadyArt--4.2.0-Broken-Tutu-24b
#4    | zerofata--MS3.2-PaintedFantasy-v2-24B
#5    | TheDrummer--Magidonia-24B-v4.3
#6    | TheDrummer--Precog-24B-v1
#7    | zerofata--MS3.2-PaintedFantasy-v3-24B
================================================================================

--- MAGNITUDE ANALYSIS & DATA POINTS ---
ID    | Status     | Delta Norm   | Orig Size    | Model Name
----------------------------------------------------------------------------------------------------
#1    | OK         | 0.0000       | 83886080     | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2    | OK         | 1.2955       | 83886080     | TheDrummer--Cydonia-24B-v4.3
#3    | HIGH MAG   | 46.6745      | 83886080     | ReadyArt--4.2.0-Broken-Tutu-24b
#4    | OK         | 0.0505       | 83886080     | zerofata--MS3.2-PaintedFantasy-v2-24B
#5    | OK         | 4.5662       | 83886080     | TheDrummer--Magidonia-24B-v4.3
#6    | OK         | 4.0883       | 83886080     | TheDrummer--Precog-24B-v1
#7    | OK         | 4.8187       | 83886080     | zerofata--MS3.2-PaintedFantasy-v3-24B

Log saved to: della_scan.log
Displaying charts...
Audits/Slimaki_Audit.png
ADDED
Audits/Slimaki_Live_Audit.png
ADDED (Git LFS)
Audits/Unreleased_2501_Della_Live_Audit.png
ADDED (Git LFS)
Audits/audit_della.py
ADDED
@@ -0,0 +1,267 @@
import yaml
import torch
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
from safetensors import safe_open
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity
from tqdm import tqdm
import argparse

# --- CONFIGURATION ---
PROBE_LAYERS = [
    "model.layers.12.mlp.down_proj.weight",  # Mid-model logic
    "lm_head.weight"  # Output semantics
]
LOG_FILENAME = "della_scan.log"
# ---------------------

class Logger:
    def __init__(self, filename):
        self.terminal = sys.stdout
        self.log = open(filename, "w", encoding="utf-8")

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)
        self.log.flush()

    def flush(self):
        self.terminal.flush()
        self.log.flush()

    def close(self):
        self.log.close()

def load_yaml_config(config_path):
    print(f"Loading config: {config_path}")
    with open(config_path, 'r', encoding='utf-8') as f:
        config = yaml.safe_load(f)

    models = []
    base_model = None

    # Extract base model
    if 'base_model' in config:
        base_model = config['base_model']

    # Extract models list
    if 'models' in config:
        for m in config['models']:
            models.append(m['model'])

    return base_model, models

def get_model_fingerprint(model_path, probe_layers):
    tensors = []
    if os.path.exists(model_path):
        files = [f for f in os.listdir(model_path) if f.endswith('.safetensors')]
        files.sort()
        found_layers = 0

        for file in files:
            full_path = os.path.join(model_path, file)
            try:
                with safe_open(full_path, framework="pt", device="cpu") as f:
                    keys = f.keys()
                    for layer in probe_layers:
                        if layer in keys:
                            t = f.get_tensor(layer).float().view(-1)
                            t = t[::10]  # Downsample
                            tensors.append(t)
                            found_layers += 1
            except Exception as e:
                print(f"Error reading {file}: {e}")

        if found_layers == 0:
            return None
    else:
        return None

    if not tensors:
        return None

    return torch.cat(tensors)

def analyze_task_vectors(base_fp, donor_fps):
    # 0. Handle size mismatches (Manifold Alignment)
    base_size = base_fp.numel()
    donor_sizes = [f.numel() for f in donor_fps]

    min_size = min([base_size] + donor_sizes)

    if any(s != min_size for s in donor_sizes) or base_size != min_size:
        print(f"\n[!] SIZE MISMATCH DETECTED")
        print(f"    Base Size: {base_size}")
        print(f"    Min Donor: {min(donor_sizes)}")
        print(f"    Action: Truncating all models to {min_size} for audit.")

    # Align fingerprints
    aligned_base = base_fp[:min_size]
    aligned_donors = [f[:min_size] for f in donor_fps]

    # 1. Calculate Task Vectors (Delta = Donor - Base)
    task_vectors = []
    for d_fp in aligned_donors:
        task_vectors.append(d_fp - aligned_base)

    # Stack into matrix [N_donors, N_features]
    data_matrix = torch.stack(task_vectors).numpy()

    # 2. Norm Analysis (Magnitude of the Delta)
    norms = np.linalg.norm(data_matrix, axis=1)

    # 3. Cosine Similarity Matrix (Directional Alignment)
    cos_sim = cosine_similarity(data_matrix)

    # 4. PCA Projection (2D)
    # Center the task vectors
    centered_data = data_matrix - np.mean(data_matrix, axis=0)

    if len(donor_fps) > 1:
        pca = PCA(n_components=2)
        coords = pca.fit_transform(centered_data)
        var_ratio = pca.explained_variance_ratio_
    else:
        coords = np.zeros((1, 2))
        var_ratio = [1.0, 0.0]

    return norms, cos_sim, coords, var_ratio, donor_sizes

def plot_results(model_ids, norms, cos_sim, coords, var_ratio):
    labels = [str(mid) for mid in model_ids]

    fig = plt.figure(figsize=(20, 12))
    fig.suptitle(f"DELLA/Task Arithmetic Compatibility Audit ({len(model_ids)} Donors)\nRefer to della_scan.log for ID Key", fontsize=16)

    # --- Plot 1: Task Vector Manifold (PCA) ---
    ax1 = fig.add_subplot(2, 2, 1)
    ax1.scatter(coords[:, 0], coords[:, 1], c='purple', s=80, alpha=0.6)

    for i, txt in enumerate(labels):
        ax1.annotate(txt, (coords[i, 0], coords[i, 1]), xytext=(3, 3), textcoords='offset points', fontsize=8, fontweight='bold')

    ax1.set_title(f"Task Vector Map (PCA of Deltas)\nClusters = Redundant Skills")
    ax1.set_xlabel(f"PC1 ({var_ratio[0]:.1%} variance)")
    ax1.set_ylabel(f"PC2 ({var_ratio[1]:.1%} variance)")
    ax1.grid(True, alpha=0.3)

    # Plot Origin (Base Model reference relative to centered data)
    center_offset = -np.mean(coords, axis=0)
    ax1.scatter(center_offset[0], center_offset[1], c='red', marker='x', s=100, label='Base Model (Ref)')
    ax1.legend()

    # --- Plot 2: Cosine Similarity Heatmap ---
    ax2 = fig.add_subplot(2, 2, 2)
    # For Task Vectors, negative similarity is common (conflicting directions)
    im = ax2.imshow(cos_sim, cmap='coolwarm', vmin=-1.0, vmax=1.0)

    ax2.set_xticks(np.arange(len(labels)))
    ax2.set_yticks(np.arange(len(labels)))
    ax2.set_xticklabels(labels, rotation=90, fontsize=6)
    ax2.set_yticklabels(labels, fontsize=6)

    ax2.set_title("Task Vector Alignment (Blue=Opposed, Red=Aligned)")
    plt.colorbar(im, ax=ax2)

    # --- Plot 3: Delta Magnitude (L2 Norm) ---
    ax3 = fig.add_subplot(2, 1, 2)
    bars = ax3.bar(labels, norms, color='orange', alpha=0.6)
    ax3.set_title("Task Vector Magnitude (L2 Norm)\nHigh bars = Drastic deviation from Base Model")
    ax3.set_ylabel("Delta L2 Norm")
    ax3.set_xlabel("Donor ID")
    ax3.grid(axis='y', alpha=0.3)

    for bar in bars:
        height = bar.get_height()
        ax3.text(bar.get_x() + bar.get_width()/2., height,
                 f'{height:.1f}', ha='center', va='bottom', fontsize=6, rotation=90)

    plt.tight_layout()
    plt.show()

def main():
    # Hook stdout to log file
    sys.stdout = Logger(LOG_FILENAME)

    parser = argparse.ArgumentParser(description="Audit MergeKit models for DELLA/Task Arithmetic compatibility.")
    parser.add_argument("config", help="Path to the mergekit yaml config file")
    args = parser.parse_args()

    print(f"--- DELLA AUDIT V2 START ---")
    base_model_path, donor_paths = load_yaml_config(args.config)

    if not base_model_path:
        print("Error: No 'base_model' found in config. DELLA requires a base model.")
        return

    print(f"Base Model: {base_model_path}")
    print(f"Donors: {len(donor_paths)}")

    print("\nExtracting BASE MODEL fingerprint...")
    base_fp = get_model_fingerprint(base_model_path, PROBE_LAYERS)
    if base_fp is None:
        print("Failed to load base model. Exiting.")
        return

    donor_fps = []
    valid_donors = []
    valid_ids = []

    print("\nExtracting DONOR fingerprints...")
    for i, path in enumerate(tqdm(donor_paths)):
        fp = get_model_fingerprint(path, PROBE_LAYERS)
        if fp is not None:
            donor_fps.append(fp)
            valid_donors.append(path)
            valid_ids.append(i + 1)
        else:
            print(f"Skipping {path} (failed to load)")

    if len(valid_donors) < 1:
        print("Need at least 1 valid donor.")
        return

    print("\nComputing Task Vector geometry...")
    norms, cos_sim, coords, var_ratio, sizes = analyze_task_vectors(base_fp, donor_fps)

    # --- LOGGING THE KEY ---
    print("\n" + "="*80)
    print(f"{'ID':<5} | {'Model Name'}")
    print("-" * 80)
    for i, path in enumerate(valid_donors):
        name = os.path.basename(path).replace("!models--", "")
        print(f"#{valid_ids[i]:<4} | {name}")
    print("="*80 + "\n")

    # --- MAGNITUDE ANALYSIS ---
    print("--- MAGNITUDE ANALYSIS & DATA POINTS ---")
    print(f"{'ID':<5} | {'Status':<10} | {'Delta Norm':<12} | {'Orig Size':<12} | {'Model Name'}")
    print("-" * 100)

    mean_norm = np.mean(norms)
    std_norm = np.std(norms)

    for i, model in enumerate(valid_donors):
        name = os.path.basename(model).replace("!models--", "")
        # Check if norm is significantly higher than average (potential destroyer of weights)
        z_score = (norms[i] - mean_norm) / (std_norm + 1e-8)
        status = "HIGH MAG" if z_score > 1.5 else "OK"

        print(f"#{valid_ids[i]:<4} | {status:<10} | {norms[i]:<12.4f} | {sizes[i]:<12} | {name}")

    print("\nLog saved to: " + LOG_FILENAME)
    print("Displaying charts...")

    # Flush the terminal side of the logger before blocking on the plot window
    sys.stdout.terminal.flush()

    plot_results(valid_ids, norms, cos_sim, coords, var_ratio)

    # Close log
    sys.stdout.close()

if __name__ == "__main__":
    main()
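`load_yaml_config` above pulls `base_model` and the `models` list out of a standard mergekit YAML config. A minimal sketch of the structure it expects, using a plain dict in place of the parsed YAML (the paths here are hypothetical):

```python
# Hypothetical stand-in for yaml.safe_load(...) output; the keys mirror a
# mergekit DELLA config: one 'base_model' plus a 'models' list of donors.
config = {
    "base_model": "/models/base",
    "models": [
        {"model": "/models/donor_a"},
        {"model": "/models/donor_b"},
    ],
}

# Same extraction logic as load_yaml_config in audit_della.py.
base_model = config.get("base_model")
donors = [m["model"] for m in config.get("models", [])]
print(base_model)  # /models/base
print(donors)      # ['/models/donor_a', '/models/donor_b']
```

The base model is read separately because the script diffs every donor against it; donors listed without a `base_model` key cause the audit to abort.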
Audits/audit_karcher.py
ADDED
@@ -0,0 +1,241 @@
import yaml
import torch
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
from safetensors import safe_open
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity
from tqdm import tqdm
import argparse

# --- CONFIGURATION ---
PROBE_LAYERS = [
    "model.layers.12.mlp.down_proj.weight",  # Mid-model logic
    "lm_head.weight"  # Output semantics
]
LOG_FILENAME = "karcher_scan.log"
# ---------------------

class Logger:
    def __init__(self, filename):
        self.terminal = sys.stdout
        self.log = open(filename, "w", encoding="utf-8")

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)
        self.log.flush()

    def flush(self):
        self.terminal.flush()
        self.log.flush()

    def close(self):
        self.log.close()

def load_yaml_config(config_path):
    print(f"Loading config: {config_path}")
    with open(config_path, 'r', encoding='utf-8') as f:
        config = yaml.safe_load(f)

    models = []
    if 'models' in config:
        for m in config['models']:
            models.append(m['model'])
    return models

def get_model_fingerprint(model_path, probe_layers):
    tensors = []
    if os.path.exists(model_path):
        files = [f for f in os.listdir(model_path) if f.endswith('.safetensors')]
        files.sort()
        found_layers = 0

        for file in files:
            full_path = os.path.join(model_path, file)
            try:
                with safe_open(full_path, framework="pt", device="cpu") as f:
                    keys = f.keys()
                    for layer in probe_layers:
                        if layer in keys:
                            t = f.get_tensor(layer).float().view(-1)
                            t = t[::10]  # Downsample
                            tensors.append(t)
                            found_layers += 1
            except Exception as e:
                print(f"Error reading {file}: {e}")

        if found_layers == 0:
            return None
    else:
        return None

    if not tensors:
        return None

    return torch.cat(tensors)

def analyze_compatibility(fingerprints):
    # 0. Handle size mismatches (Manifold Alignment)
    # We keep 'sizes' as the ORIGINAL sizes for logging
    sizes = [f.numel() for f in fingerprints]
    min_size = min(sizes)
    max_size = max(sizes)

    if min_size != max_size:
        print(f"\n[!] SIZE MISMATCH DETECTED")
        print(f"    Smallest Fingerprint: {min_size}")
        print(f"    Largest Fingerprint: {max_size}")
        print(f"    Action: Truncating all models to {min_size} for alignment.")

    # Align all fingerprints to the smallest common denominator
    aligned_fingerprints = [f[:min_size] for f in fingerprints]

    # Stack into matrix [N_models, N_features]
    data_matrix = torch.stack(aligned_fingerprints).numpy()

    # 1. Norm Analysis (Magnitude)
    norms = np.linalg.norm(data_matrix, axis=1)

    # 2. Cosine Similarity Matrix
    cos_sim = cosine_similarity(data_matrix)

    # 3. PCA Projection (2D)
    centered_data = data_matrix - np.mean(data_matrix, axis=0)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(centered_data)

    return norms, cos_sim, coords, pca.explained_variance_ratio_, sizes

def plot_results(model_ids, norms, cos_sim, coords, var_ratio):
    # Use IDs for plotting
    labels = [str(mid) for mid in model_ids]

    fig = plt.figure(figsize=(20, 12))
    fig.suptitle(f"Karcher Merge Compatibility Audit ({len(model_ids)} Models)\nRefer to karcher_scan.log for ID Key", fontsize=16)

    # --- Plot 1: PCA Manifold Map ---
    ax1 = fig.add_subplot(2, 2, 1)
    ax1.scatter(coords[:, 0], coords[:, 1], c='blue', s=80, alpha=0.6)

    # Annotate points with IDs
    for i, txt in enumerate(labels):
        ax1.annotate(txt, (coords[i, 0], coords[i, 1]), xytext=(3, 3), textcoords='offset points', fontsize=8, fontweight='bold')

    ax1.set_title(f"Manifold Map (PCA)\nOutliers here will break the merge")
    ax1.set_xlabel(f"PC1 ({var_ratio[0]:.1%} variance)")
    ax1.set_ylabel(f"PC2 ({var_ratio[1]:.1%} variance)")
    ax1.grid(True, alpha=0.3)

    # Draw center
    center = np.mean(coords, axis=0)
    ax1.scatter(center[0], center[1], c='red', marker='x', s=100, label='Center')

    # --- Plot 2: Cosine Similarity Heatmap ---
    ax2 = fig.add_subplot(2, 2, 2)
    im = ax2.imshow(cos_sim, cmap='viridis', vmin=0.8, vmax=1.0)

    # Set ticks to IDs
    ax2.set_xticks(np.arange(len(labels)))
    ax2.set_yticks(np.arange(len(labels)))
    ax2.set_xticklabels(labels, rotation=90, fontsize=6)
    ax2.set_yticklabels(labels, fontsize=6)

    ax2.set_title("Cosine Similarity (Brighter = Compatible)")
    plt.colorbar(im, ax=ax2)

    # --- Plot 3: Weight Magnitude (Norms) ---
    ax3 = fig.add_subplot(2, 1, 2)
    bars = ax3.bar(labels, norms, color='green', alpha=0.6)
    ax3.set_title("Weight Magnitude (L2 Norm)\nKarcher is sensitive to large differences here")
    ax3.set_ylabel("L2 Norm")
    ax3.set_xlabel("Model ID")
    ax3.grid(axis='y', alpha=0.3)

    # Add value labels (rotated if many models)
    for bar in bars:
        height = bar.get_height()
        ax3.text(bar.get_x() + bar.get_width()/2., height,
                 f'{height:.1f}', ha='center', va='bottom', fontsize=6, rotation=90)

    plt.tight_layout()
    plt.show()

def main():
    # Hook stdout to log file
    sys.stdout = Logger(LOG_FILENAME)

    parser = argparse.ArgumentParser(description="Audit MergeKit models for Karcher compatibility.")
    parser.add_argument("config", help="Path to the mergekit yaml config file")
    args = parser.parse_args()

    print(f"--- KARCHER AUDIT V4 START ---")
    model_paths = load_yaml_config(args.config)
    print(f"Found {len(model_paths)} models.")

    fingerprints = []
    valid_models = []
    valid_ids = []

    print("Extracting model fingerprints...")
    # We use a manual counter for IDs to keep them sequential based on config order
    for i, path in enumerate(tqdm(model_paths)):
        fp = get_model_fingerprint(path, PROBE_LAYERS)
        if fp is not None:
            fingerprints.append(fp)
            valid_models.append(path)
            valid_ids.append(i + 1)  # 1-based indexing
        else:
            print(f"Skipping {path} (failed to load)")

    if len(valid_models) < 2:
        print("Need at least 2 valid models to compare.")
        return

    print("Computing manifold geometry...")
    norms, cos_sim, coords, var_ratio, sizes = analyze_compatibility(fingerprints)

    # --- LOGGING THE KEY ---
    print("\n" + "="*80)
    print(f"{'ID':<5} | {'Model Name'}")
    print("-" * 80)
    for i, path in enumerate(valid_models):
        name = os.path.basename(path).replace("!models--", "")
        print(f"#{valid_ids[i]:<4} | {name}")
    print("="*80 + "\n")

    # --- OUTLIER ANALYSIS ---
    print("--- OUTLIER ANALYSIS & DATA POINTS ---")
    print(f"{'ID':<5} | {'Status':<10} | {'Dist':<10} | {'Norm':<10} | {'Orig Size':<12} | {'Model Name'}")
    print("-" * 100)

    centroid = np.mean(coords, axis=0)
    distances = np.linalg.norm(coords - centroid, axis=1)

    mean_dist = np.mean(distances)
    std_dist = np.std(distances)
    z_scores = (distances - mean_dist) / (std_dist + 1e-8)

    for i, model in enumerate(valid_models):
        name = os.path.basename(model).replace("!models--", "")
        is_outlier = z_scores[i] > 1.5
        status = "OUTLIER" if is_outlier else "OK"

        # Log format: ID | Status | Dist | Norm | Size | Name
        print(f"#{valid_ids[i]:<4} | {status:<10} | {distances[i]:<10.4f} | {norms[i]:<10.4f} | {sizes[i]:<12} | {name}")

    print("\nLog saved to: " + LOG_FILENAME)
    print("Displaying charts...")

    # Flush the terminal side of the logger so matplotlib doesn't try to write binary image data to our text logger if it crashes
    sys.stdout.terminal.flush()

    plot_results(valid_ids, norms, cos_sim, coords, var_ratio)

    # Close log
    sys.stdout.close()

if __name__ == "__main__":
    main()
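The heatmap in Plot 2 of `audit_karcher.py` is pairwise cosine similarity between the flattened weight fingerprints. A numpy-only sketch of that computation, on toy vectors rather than real model weights:

```python
import numpy as np

def cosine_matrix(X):
    # Normalize each row to unit length; the Gram matrix of unit rows is
    # then exactly the pairwise cosine-similarity matrix the heatmap shows.
    X = np.asarray(X, dtype=float)
    U = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return U @ U.T

# Toy fingerprints: rows 0 and 1 point in nearly the same direction
# (highly "compatible"), row 2 is orthogonal to row 0.
fps = [[1.0, 0.0, 0.0],
       [0.9, 0.1, 0.0],
       [0.0, 1.0, 0.0]]
S = cosine_matrix(fps)
```

The script's `vmin=0.8` on the heatmap reflects that finetunes of one base model tend to land in a narrow similarity band near 1.0; anything far below that band is a merge-breaking outlier.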
Audits/generalized_task_arithmetic.py
ADDED
|
@@ -0,0 +1,339 @@
# Copyright (C) 2025 Arcee AI
# SPDX-License-Identifier: LGPL-3.0-only
# della + live audit report by Naphula

import logging
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple

import torch
from pydantic import BaseModel
from typing_extensions import Literal, override

from mergekit.architecture import WeightInfo
from mergekit.common import ImmutableMap, ModelReference
from mergekit.graph import Task
from mergekit.merge_methods.base import (
    ConfigParameterDef,
    MergeMethod,
    MergeTensorInput,
)
from mergekit.sparsify import RescaleNorm, SparsificationMethod, sparsify


class ConsensusMethod(str, Enum):
    count = "count"
    sum = "sum"


class GeneralizedTaskArithmeticMerge(MergeMethod, BaseModel, frozen=True):
    consensus_method: Optional[ConsensusMethod]
    sparsification_method: Optional[SparsificationMethod]
    default_normalize: bool
    default_rescale: bool
    method_name: str
    method_pretty_name: Optional[str]
    method_reference_url: Optional[str]

    def name(self) -> str:
        return self.method_name

    @override
    def pretty_name(self) -> Optional[str]:
        return self.method_pretty_name

    @override
    def reference_url(self) -> Optional[str]:
        return self.method_reference_url

    def parameters(self) -> List[ConfigParameterDef]:
        return [
            ConfigParameterDef(name="int8_mask", required=False, default_value=False),
            ConfigParameterDef(
                name="normalize", required=False, default_value=self.default_normalize
            ),
            ConfigParameterDef(
                name="rescale", required=False, default_value=self.default_rescale
            ),
            ConfigParameterDef(name="lambda", required=False, default_value=1.0),
        ]

    def tensor_parameters(self) -> List[ConfigParameterDef]:
        res = [
            ConfigParameterDef(name="weight", required=True),
            ConfigParameterDef(name="density", required=False, default_value=1.0),
        ]
        if self.sparsification_method == SparsificationMethod.magnitude_outliers:
            res.append(
                ConfigParameterDef(
                    name="gamma",
                    default_value=0.01,
                )
            )
        if self.sparsification_method == SparsificationMethod.della_magprune:
            res.append(
                ConfigParameterDef(
                    name="epsilon",
                    default_value=0.15,
                )
            )
        return res

    def make_task(
        self,
        output_weight: WeightInfo,
        tensors: MergeTensorInput,
        base_model: Optional[ModelReference],
        parameters: ImmutableMap[str, Any],
        tensor_parameters: ImmutableMap[ModelReference, ImmutableMap[str, Any]],
    ) -> Task:
        return GTATask(
            method=self,
            tensors=tensors,
            base_model=base_model,
            tensor_parameters=tensor_parameters,
            int8_mask=parameters["int8_mask"],
            normalize=parameters["normalize"],
            lambda_=parameters["lambda"],
            rescale_norm=RescaleNorm.l1 if parameters["rescale"] else None,
            weight_info=output_weight,
        )


class GTATask(Task[torch.Tensor]):
    method: GeneralizedTaskArithmeticMerge
    tensors: MergeTensorInput
    base_model: ModelReference
    weight_info: WeightInfo
    tensor_parameters: ImmutableMap[ModelReference, Any]
    int8_mask: bool
    normalize: bool
    lambda_: float
    rescale_norm: Optional[RescaleNorm]

    def uses_accelerator(self) -> bool:
        return True

    def arguments(self) -> Dict[str, Task]:
        return {"tensors": self.tensors}

    def execute(
        self,
        tensors: Dict[ModelReference, torch.Tensor],
        **_kwargs,
    ) -> torch.Tensor:
        # collect task vectors
        tvs, base = get_task_vectors(
            self.weight_info,
            self.base_model,
            tensors,
            tensor_parameters=self.tensor_parameters.data,
        )

        # --- LIVE AUDIT CHART ---
        if tvs:
            log_della_audit(
                self.weight_info.name,
                self.base_model,
                tvs,
                self.lambda_,
                self.method.method_pretty_name
            )
        # ------------------------

        if not tvs:
            return base

        # sparsify
        if self.method.sparsification_method:
            for tv_info in tvs:
                kwargs = {}
                if "gamma" in tv_info:
                    kwargs["gamma"] = tv_info["gamma"]

                if "epsilon" in tv_info:
                    kwargs["epsilon"] = tv_info["epsilon"]

                tv_info["delta"] = sparsify(
                    tv_info["delta"],
                    density=tv_info["density"],
                    method=self.method.sparsification_method,
                    rescale_norm=self.rescale_norm,
                    **kwargs,
                )

        deltas = torch.stack([tv["delta"] for tv in tvs], dim=0)

        weights = torch.tensor(
            [tv["weight"] for tv in tvs], dtype=deltas.dtype, device=deltas.device
        )
        while len(deltas.shape) > len(weights.shape):
            weights.unsqueeze_(-1)

        weighted_deltas = deltas * weights

        # get sign consensus and mix deltas
        if self.method.consensus_method:
            mask_dtype = torch.int8 if self.int8_mask else base.dtype
            mask = get_mask(
                weighted_deltas,
                method=self.method.consensus_method,
                mask_dtype=mask_dtype,
            )
            mixed_delta = (weighted_deltas * mask).sum(dim=0)
            divisor = (weights * mask).sum(dim=0)
            divisor[divisor == 0] = 1
        else:
            mixed_delta = weighted_deltas.sum(dim=0)
            divisor = weights.sum(dim=0)
            divisor[divisor.abs() < 1e-8] = 1

        if self.normalize:
            mixed_delta /= divisor

        if self.lambda_ != 1:
            mixed_delta *= self.lambda_

        return (base + mixed_delta).to(base.dtype)

    def group_label(self) -> Optional[str]:
        return self.tensors.group_label()


def get_task_vectors(
    weight_info: WeightInfo,
    base_model: ModelReference,
    tensors: ImmutableMap[ModelReference, torch.Tensor],
    tensor_parameters: ImmutableMap[ModelReference, ImmutableMap[str, Any]],
) -> Tuple[List[Dict[str, Any]], torch.Tensor]:
    keys = list(tensors.keys())
    base = tensors[base_model]

    parameter_name = weight_info.name

    res = []
    for model in keys:
        if model == base_model:
            continue

        x = tensors[model].to(base.dtype)
        if x.shape != base.shape:
            if weight_info.is_embed:
                x = x[: base.shape[0], : base.shape[1]]
                logging.warning(f"Using submatrix of {model}:{parameter_name}")
            else:
                logging.warning(
                    f"skipping {model}:{parameter_name} due to size mismatch"
                )
                continue

        delta = x - base
        del x
        del tensors[model]

        d = {}
        d["model"] = model
        d["delta"] = delta
        for p in tensor_parameters[model]:
            d[p] = tensor_parameters[model][p]
        res.append(d)
    return res, base


def get_mask(
    delta: torch.Tensor,
    method: Literal["sum", "count"] = "sum",
    mask_dtype: Optional[torch.dtype] = None,
):
    """Returns a mask determining which delta vectors should be merged
    into the final model.

    For the methodology described in the TIES paper use 'sum'. For a
    simpler naive count of signs, use 'count'."""
    if mask_dtype is None:
        mask_dtype = delta.dtype

    sign = delta.sign().to(mask_dtype)

    if method == "sum":
        sign_weight = delta.sum(dim=0)
        majority_sign = (sign_weight >= 0).to(mask_dtype) * 2 - 1
        del sign_weight
    elif method == "count":
        majority_sign = (sign.sum(dim=0) >= 0).to(mask_dtype) * 2 - 1
    else:
        raise RuntimeError(f'Unimplemented mask method "{method}"')

    return sign == majority_sign

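The sign-consensus mask above can be exercised on a toy stack of deltas. This is a minimal standalone sketch of the same logic (not the mergekit entry point): each element keeps only the donors agreeing with the majority sign.

```python
import torch

def sign_consensus_mask(deltas: torch.Tensor, method: str = "sum") -> torch.Tensor:
    # deltas: stacked task vectors, shape (num_models, ...)
    sign = deltas.sign()
    if method == "sum":
        # majority sign weighted by magnitude, as in the TIES paper
        majority = (deltas.sum(dim=0) >= 0).to(deltas.dtype) * 2 - 1
    elif method == "count":
        # naive per-element vote of signs
        majority = (sign.sum(dim=0) >= 0).to(deltas.dtype) * 2 - 1
    else:
        raise RuntimeError(f'Unimplemented mask method "{method}"')
    return sign == majority

deltas = torch.tensor([[ 0.5, -1.0],
                       [ 0.2,  0.3],
                       [-0.1,  0.4]])
# column 0 sums to +0.6 -> keep positive deltas; column 1 sums to -0.3 -> keep negative ones
print(sign_consensus_mask(deltas))
```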
def log_della_audit(
    layer_name: str,
    base_model: ModelReference,
    tvs: List[Dict[str, Any]],
    global_lambda: float,
    method_name: str
):
    """Prints and saves a bar chart of DELLA/Task Arithmetic distribution based on actual Delta Norms."""

    base_name = str(base_model.model.path).split("\\")[-1].split("/")[-1][:50]

    bar_char = "█"
    lines = [f"\n[{method_name} Audit] Layer: {layer_name} | Lambda={global_lambda:.2f}"]
    lines.append(f" [BASE] {base_name:<50}")

    # 1. Calculate stats
    stats = []
    total_impact = 0.0

    for tv in tvs:
        model_name = str(tv['model'].model.path).split("\\")[-1].split("/")[-1][:50]
        weight = tv.get('weight', 0.0)
        density = tv.get('density', 1.0)
        epsilon = tv.get('epsilon', None)
        delta = tv.get('delta', None)

        norm = 0.0
        if delta is not None:
            # Use float32 for norm calculation to be safe
            norm = torch.norm(delta.float()).item()

        # Effective contribution magnitude = Weight * Norm
        # This shows how much this model is actually moving the weights
        impact = weight * norm
        total_impact += impact

        stats.append({
            'name': model_name,
            'weight': weight,
            'density': density,
            'epsilon': epsilon,
            'norm': norm,
            'impact': impact
        })

    # Sort by name for consistent logs
    stats.sort(key=lambda x: x['name'])

    # 2. Generate bars
    for s in stats:
        # Calculate percentage relative to the sum of all impacts (Share of Voice)
        pct = (s['impact'] / total_impact * 100) if total_impact > 0 else 0.0

        # Bar length (max 50 chars for 100%)
        bar_len = int(max(0, min(50, pct / 2)))
        bar = bar_char * bar_len

        # Format info string
        # W=Weight, D=Density, N=DeltaNorm
        info = f"W:{s['weight']:.2f} D:{s['density']:.2f} N:{s['norm']:.2f}"
        if s['epsilon'] is not None:
            info += f" E:{s['epsilon']:.2f}"

        lines.append(f" {s['name']:<50}: {bar:<50} {pct:5.1f}% ({info})")

    log_entry = "\n".join(lines)
    print(log_entry)

    with open("della_audit.log", "a", encoding="utf-8") as f:
        f.write(log_entry + "\n")
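The "share of voice" percentages in the audit log come from impact = weight × ‖delta‖, normalized over all donors. A small illustration with hypothetical task-vector entries (torch only, independent of mergekit):

```python
import torch

def contribution_shares(tvs):
    # impact of each donor = merge weight x L2 norm of its delta, as in log_della_audit
    impacts = [tv["weight"] * torch.norm(tv["delta"].float()).item() for tv in tvs]
    total = sum(impacts)
    return [100.0 * i / total if total > 0 else 0.0 for i in impacts]

tvs = [
    {"weight": 1.0, "delta": torch.tensor([3.0, 4.0])},  # norm 5  -> impact 5
    {"weight": 0.5, "delta": torch.tensor([6.0, 8.0])},  # norm 10 -> impact 5
]
print(contribution_shares(tvs))  # equal split despite different weights
```

This is why a donor with a small configured weight can still dominate a layer: a large delta norm inflates its effective contribution.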
Audits/model_stock.py
ADDED
@@ -0,0 +1,183 @@
# Copyright (C) 2025 Arcee AI
# SPDX-License-Identifier: LGPL-3.0-only
# model_stock + live audit report by Naphula

import logging
import os
from typing import Any, Dict, List, Optional

import torch
from typing_extensions import override

from mergekit.architecture import WeightInfo
from mergekit.common import ImmutableMap, ModelReference
from mergekit.graph import Task
from mergekit.merge_methods.base import (
    ConfigParameterDef,
    MergeMethod,
    MergeTensorInput,
)
from mergekit.merge_methods.rectify_embed import rectify_embed_sizes


class ModelStockMergeTask(Task[torch.Tensor]):
    gather_tensors: MergeTensorInput
    base_model: ModelReference
    weight_info: WeightInfo
    filter_wise: bool = False

    def uses_accelerator(self) -> bool:
        return True

    def arguments(self) -> Dict[str, Task]:
        return {"tensors": self.gather_tensors}

    def execute(self, tensors: Dict[ModelReference, torch.Tensor]) -> torch.Tensor:
        if len(tensors) == 1 and self.base_model in tensors:
            return tensors[self.base_model]
        if len(tensors) < 3:
            if self.weight_info.optional:
                logging.warning(
                    f"Optional weight {self.weight_info.name} not present in enough models, discarding"
                )
                return None

            raise ValueError(
                "ModelStockMerge requires at least 3 models (base plus two+ others)"
            )

        w_0, ws = self.get_rectified_weights(tensors)
        out_shape = w_0.shape

        if self.filter_wise:
            if w_0.dim() == 1:
                # bias (or other single-vector) parameters should be treated as row vectors
                w_0 = w_0.unsqueeze(0)
                ws = [w.unsqueeze(0) for w in ws]
        else:
            w_0 = w_0.view(-1)
            ws = [w.view(-1) for w in ws]

        offsets = [w - w_0 for w in ws]

        # now there is a question of how to come up with a value for theta.
        # in the two-vector case, we can get an exact angle between the two vectors
        # but the paper doesn't explicitly say what to do in the multi-vector case -
        # they keep using a singular theta value and don't elaborate on how to
        # calculate it. i'm going to assume an average of pairwise angles for now? i guess?

        cos_thetas = []
        for i, w_0_offset in enumerate(offsets):
            for j in range(i + 1, len(offsets)):
                w_1_offset = offsets[j]

                norm_product = torch.norm(w_0_offset, dim=-1) * torch.norm(
                    w_1_offset, dim=-1
                )
                cos_theta = (
                    (w_0_offset * w_1_offset).sum(dim=-1) / norm_product.clamp(min=1e-6)
                ).clamp(-1, 1)
                cos_thetas.append(cos_theta)

        cos_theta = torch.stack(cos_thetas).mean(dim=0).unsqueeze(-1)
        N = len(ws)
        t = (N * cos_theta) / (1 + (N - 1) * cos_theta)

        # --- LIVE AUDIT CHART ---
        t_scalar = t.mean().item()
        base_name = str(self.base_model.model.path)
        donor_names = [str(k.model.path) for k in tensors.keys() if k != self.base_model]
        donor_names.sort()  # Deterministic order

        log_model_stock_audit(self.weight_info.name, t_scalar, base_name, donor_names)
        # ------------------------

        w_avg = sum(ws) / len(ws)
        w_h = t * w_avg + (1 - t) * w_0

        return w_h.reshape(out_shape)

    def get_rectified_weights(self, tensors: Dict[ModelReference, torch.Tensor]):
        if self.base_model not in tensors:
            raise ValueError("Base model tensor not found")

        all_weights = [tensors[self.base_model]] + [
            tensors[k] for k in tensors if k != self.base_model
        ]
        rectify_embed_sizes(self.weight_info, all_weights)
        w_0 = all_weights[0]
        ws = all_weights[1:]
        return w_0, ws

    def group_label(self) -> Optional[str]:
        return self.gather_tensors.group_label()


class ModelStockMerge(MergeMethod):
    def name(self) -> str:
        return "model_stock"

    @override
    def pretty_name(self) -> Optional[str]:
        return "Model Stock"

    @override
    def reference_url(self):
        return "https://arxiv.org/abs/2403.19522"

    def parameters(self) -> List[ConfigParameterDef]:
        return [
            ConfigParameterDef(name="filter_wise", required=False, default_value=False)
        ]

    def make_task(
        self,
        *,
        output_weight: WeightInfo,
        tensors: MergeTensorInput,
        base_model: Optional[ModelReference],
        parameters: ImmutableMap[str, Any],
        **_kwargs,
    ) -> Task:
        return ModelStockMergeTask(
            gather_tensors=tensors,
            base_model=base_model,
            weight_info=output_weight,
            filter_wise=parameters["filter_wise"],
        )


def log_model_stock_audit(layer_name: str, t_value: float, base_name: str, donor_names: List[str]):
    """Prints and saves a bar chart of Model Stock interpolation."""
    # t is the weight of the average of donors.
    # (1-t) is the weight of the base.
    # Each donor gets t / len(donors).

    n_donors = len(donor_names)
    base_weight = 1.0 - t_value
    donor_weight = t_value / n_donors if n_donors > 0 else 0.0

    bar_char = "█"
    lines = [f"\n[Model Stock Audit] Layer: {layer_name} | t={t_value:.4f}"]

    # Base
    pct = base_weight * 100
    # Clamp bar length for visualization safety
    bar_len = int(max(0, min(100, pct)) / 2)
    bar = bar_char * bar_len
    clean_base = base_name.split("\\")[-1].split("/")[-1][:60]
    lines.append(f" {clean_base:<60}: {bar:<50} ({pct:6.2f}%)")

    # Donors
    for name in donor_names:
        pct = donor_weight * 100
        bar_len = int(max(0, min(100, pct)) / 2)
        bar = bar_char * bar_len
        clean_name = name.split("\\")[-1].split("/")[-1][:60]
        lines.append(f" {clean_name:<60}: {bar:<50} ({pct:6.2f}%)")

    log_entry = "\n".join(lines)
    print(log_entry)

    with open("model_stock_audit.log", "a", encoding="utf-8") as f:
        f.write(log_entry + "\n")
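The interpolation factor logged above follows the Model Stock closed form t = N·cosθ / (1 + (N − 1)·cosθ), with cosθ averaged over donor pairs. A standalone sketch of the scalar case (toy vectors, independent of mergekit):

```python
import torch

def model_stock_t(offsets):
    # offsets: list of (donor - base) task vectors, flattened
    cos_vals = []
    for i in range(len(offsets)):
        for j in range(i + 1, len(offsets)):
            a, b = offsets[i], offsets[j]
            denom = (a.norm() * b.norm()).clamp(min=1e-6)
            cos_vals.append(((a * b).sum() / denom).clamp(-1, 1))
    cos_theta = torch.stack(cos_vals).mean()
    n = len(offsets)
    return (n * cos_theta) / (1 + (n - 1) * cos_theta)

# orthogonal donors pull t toward 0 (stay near base);
# aligned donors push t toward 1 (take the donor average)
print(model_stock_t([torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])]).item())  # 0.0
print(model_stock_t([torch.tensor([1.0, 0.0]), torch.tensor([2.0, 0.0])]).item())  # 1.0
```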
model_tools.md
CHANGED
@@ -17,6 +17,18 @@ Tools to enhance LLM quantizations and merging

 # config.py
 - Simply replace line 13 | BEFORE `ScalarOrGradient: TypeAlias = Union[float, List[float]]` → AFTER `ScalarOrGradient: TypeAlias = Union[float, List[float], str, bool]` | to allow for custom filepath strings within parameter settings.

+# [audit_della.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/audit_della.py)
+- Audits the compatibility of donor models for `Della` merges before merging. See: [example chart Asmodeus](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Asmodeus_Audit.png), [example log Asmodeus](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Asmodeus_Audit.log), [example chart Slimaki](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Slimaki_Audit.png), [example log Slimaki](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Slimaki_Audit.log)
+
+# [audit_karcher.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/audit_karcher.py)
+- Audits the compatibility of donor models for `Karcher` merges before merging. See: [example chart Goetia](https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/nSuSM6v_BQBP4tAWK9rGQ.png)
+
+# [generalized_task_arithmetic.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/generalized_task_arithmetic.py)
+- Live audit reports of **actual contribution magnitude** on a per-layer basis for `Della` merges. See: [example audit Asmodeus](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Asmodeus_Live_Audit.png), [example audit Slimaki](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Slimaki_Live_Audit.png)
+
+# [model_stock.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/model_stock.py)
+- Live audit reports of **actual contribution magnitude** on a per-layer basis for `Model_Stock` merges.
+
 # [metadata_audit.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/metadata_audit.py)
 - Checks multiple models within subdirectories for vocab or rope mismatch (useful for large merges). Calibrated for Mistral Nemo 12B by default.
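The per-donor `weight`, `density`, and `epsilon` values that the live audits report are set in the mergekit YAML config. A minimal hypothetical `della` config (model paths are placeholders, values are illustrative only):

```yaml
merge_method: della
base_model: path/to/base-model
models:
  - model: path/to/donor-a
    parameters:
      weight: 0.6
      density: 0.7
      epsilon: 0.15
  - model: path/to/donor-b
    parameters:
      weight: 0.4
      density: 0.7
      epsilon: 0.15
parameters:
  lambda: 1.0
  normalize: true
dtype: bfloat16
```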