Upload 12 files
Files changed:
- .gitattributes +3 -0
- Audits/Asmodeus_Audit.log +56 -0
- Audits/Asmodeus_Audit.png +0 -0
- Audits/Asmodeus_Live_Audit.png +3 -0
- Audits/Slimaki_Audit.log +36 -0
- Audits/Slimaki_Audit.png +0 -0
- Audits/Slimaki_Live_Audit.png +3 -0
- Audits/Unreleased_2501_Della_Live_Audit.png +3 -0
- Audits/audit_della.py +267 -0
- Audits/audit_karcher.py +241 -0
- Audits/generalized_task_arithmetic.py +339 -0
- Audits/model_stock.py +183 -0
- model_tools.md +12 -0
.gitattributes
CHANGED
@@ -35,3 +35,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 gemma-2-9b-it.imatrix filter=lfs diff=lfs merge=lfs -text
 imatrix_unsloth.dat filter=lfs diff=lfs merge=lfs -text
+Audits/Asmodeus_Live_Audit.png filter=lfs diff=lfs merge=lfs -text
+Audits/Slimaki_Live_Audit.png filter=lfs diff=lfs merge=lfs -text
+Audits/Unreleased_2501_Della_Live_Audit.png filter=lfs diff=lfs merge=lfs -text
Audits/Asmodeus_Audit.log
ADDED
@@ -0,0 +1,56 @@
--- DELLA AUDIT V2 START ---
Loading config: config.yaml
Base Model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
Donors: 17

Extracting BASE MODEL fingerprint...

Extracting DONOR fingerprints...

Computing Task Vector geometry...

================================================================================
ID    | Model Name
--------------------------------------------------------------------------------
#1    | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2    | TheDrummer--Cydonia-24B-v4.3
#3    | ReadyArt--4.2.0-Broken-Tutu-24b
#4    | zerofata--MS3.2-PaintedFantasy-v2-24B
#5    | TheDrummer--Magidonia-24B-v4.3
#6    | TheDrummer--Precog-24B-v1
#7    | zerofata--MS3.2-PaintedFantasy-v3-24B
#8    | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
#9    | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
#10   | trashpanda-org--MS3.2-24B-Mullein-v2
#11   | LatitudeGames--Hearthfire-24B
#12   | TheDrummer--Cydonia-24B-v4.2.0
#13   | TheDrummer--Magidonia-24B-v4.2.0
#14   | ConicCat--Mistral-Small-3.2-AntiRep-24B
#15   | Undi95--MistralThinker-v1.1
#16   | CrucibleLab--M3.2-24B-Loki-V2
#17   | Darkhn--M3.2-24B-Animus-V7.1
================================================================================

--- MAGNITUDE ANALYSIS & DATA POINTS ---
ID    | Status     | Delta Norm   | Orig Size    | Model Name
----------------------------------------------------------------------------------------------------
#1    | OK         | 0.0000       | 83886080     | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2    | OK         | 1.2955       | 83886080     | TheDrummer--Cydonia-24B-v4.3
#3    | HIGH MAG   | 46.6745      | 83886080     | ReadyArt--4.2.0-Broken-Tutu-24b
#4    | OK         | 0.0505       | 83886080     | zerofata--MS3.2-PaintedFantasy-v2-24B
#5    | OK         | 4.5662       | 83886080     | TheDrummer--Magidonia-24B-v4.3
#6    | OK         | 4.0883       | 83886080     | TheDrummer--Precog-24B-v1
#7    | OK         | 4.8187       | 83886080     | zerofata--MS3.2-PaintedFantasy-v3-24B
#8    | OK         | 1.9250       | 83886080     | !BeaverAI_Fallen-Mistral-Small-3.1-24B-v1e_textonly
#9    | HIGH MAG   | 47.3140      | 83886080     | ReadyArt--Broken-Tutu-24B-Transgression-v2.0
#10   | OK         | 0.1586       | 83886080     | trashpanda-org--MS3.2-24B-Mullein-v2
#11   | OK         | 1.9367       | 83886080     | LatitudeGames--Hearthfire-24B
#12   | OK         | 1.0936       | 83886080     | TheDrummer--Cydonia-24B-v4.2.0
#13   | OK         | 3.9147       | 83886080     | TheDrummer--Magidonia-24B-v4.2.0
#14   | OK         | 0.0164       | 83886080     | ConicCat--Mistral-Small-3.2-AntiRep-24B
#15   | OK         | 11.4846      | 83886080     | Undi95--MistralThinker-v1.1
#16   | OK         | 3.1101       | 83886080     | CrucibleLab--M3.2-24B-Loki-V2
#17   | OK         | 0.7205       | 83886080     | Darkhn--M3.2-24B-Animus-V7.1

Log saved to: della_scan.log
Displaying charts...
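The HIGH MAG flags in the log above come from a z-score test on each donor's delta norm (the rule `audit_della.py` applies further down: flag anything more than 1.5 standard deviations above the mean). A minimal numpy sketch of that rule, fed with the norms from this log:

```python
import numpy as np

def flag_high_magnitude(norms, threshold=1.5):
    # z-score each delta norm against the population of all donors;
    # the small epsilon avoids division by zero when all norms are equal.
    norms = np.asarray(norms, dtype=float)
    z = (norms - norms.mean()) / (norms.std() + 1e-8)
    return ["HIGH MAG" if s > threshold else "OK" for s in z]

# Delta norms for donors #1..#17 from Asmodeus_Audit.log.
norms = [0.0, 1.2955, 46.6745, 0.0505, 4.5662, 4.0883, 4.8187,
         1.9250, 47.3140, 0.1586, 1.9367, 1.0936, 3.9147, 0.0164,
         11.4846, 3.1101, 0.7205]
statuses = flag_high_magnitude(norms)
# Only the two Broken-Tutu donors (#3 and #9) exceed the threshold.
print(statuses)
```

The two outliers sit at roughly 2.7 sigma, while the next largest delta (MistralThinker at 11.48) is well under the 1.5-sigma cutoff, which matches the OK/HIGH MAG split in the log.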
Audits/Asmodeus_Audit.png
ADDED
Audits/Asmodeus_Live_Audit.png
ADDED (Git LFS)
Audits/Slimaki_Audit.log
ADDED
@@ -0,0 +1,36 @@
--- DELLA AUDIT V2 START ---
Loading config: config.yaml
Base Model: B:\24B\!models--anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
Donors: 7

Extracting BASE MODEL fingerprint...

Extracting DONOR fingerprints...

Computing Task Vector geometry...

================================================================================
ID    | Model Name
--------------------------------------------------------------------------------
#1    | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2    | TheDrummer--Cydonia-24B-v4.3
#3    | ReadyArt--4.2.0-Broken-Tutu-24b
#4    | zerofata--MS3.2-PaintedFantasy-v2-24B
#5    | TheDrummer--Magidonia-24B-v4.3
#6    | TheDrummer--Precog-24B-v1
#7    | zerofata--MS3.2-PaintedFantasy-v3-24B
================================================================================

--- MAGNITUDE ANALYSIS & DATA POINTS ---
ID    | Status     | Delta Norm   | Orig Size    | Model Name
----------------------------------------------------------------------------------------------------
#1    | OK         | 0.0000       | 83886080     | anthracite-core--Mistral-Small-3.2-24B-Instruct-2506-Text-Only
#2    | OK         | 1.2955       | 83886080     | TheDrummer--Cydonia-24B-v4.3
#3    | HIGH MAG   | 46.6745      | 83886080     | ReadyArt--4.2.0-Broken-Tutu-24b
#4    | OK         | 0.0505       | 83886080     | zerofata--MS3.2-PaintedFantasy-v2-24B
#5    | OK         | 4.5662       | 83886080     | TheDrummer--Magidonia-24B-v4.3
#6    | OK         | 4.0883       | 83886080     | TheDrummer--Precog-24B-v1
#7    | OK         | 4.8187       | 83886080     | zerofata--MS3.2-PaintedFantasy-v3-24B

Log saved to: della_scan.log
Displaying charts...
Audits/Slimaki_Audit.png
ADDED
Audits/Slimaki_Live_Audit.png
ADDED (Git LFS)
Audits/Unreleased_2501_Della_Live_Audit.png
ADDED (Git LFS)
Audits/audit_della.py
ADDED
@@ -0,0 +1,267 @@
import yaml
import torch
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
from safetensors import safe_open
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity
from tqdm import tqdm
import argparse

# --- CONFIGURATION ---
PROBE_LAYERS = [
    "model.layers.12.mlp.down_proj.weight",  # Mid-model logic
    "lm_head.weight"  # Output semantics
]
LOG_FILENAME = "della_scan.log"
# ---------------------

class Logger:
    def __init__(self, filename):
        self.terminal = sys.stdout
        self.log = open(filename, "w", encoding="utf-8")

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)
        self.log.flush()

    def flush(self):
        self.terminal.flush()
        self.log.flush()

    def close(self):
        self.log.close()

def load_yaml_config(config_path):
    print(f"Loading config: {config_path}")
    with open(config_path, 'r', encoding='utf-8') as f:
        config = yaml.safe_load(f)

    models = []
    base_model = None

    # Extract base model
    if 'base_model' in config:
        base_model = config['base_model']

    # Extract models list
    if 'models' in config:
        for m in config['models']:
            models.append(m['model'])

    return base_model, models

def get_model_fingerprint(model_path, probe_layers):
    tensors = []
    if os.path.exists(model_path):
        files = [f for f in os.listdir(model_path) if f.endswith('.safetensors')]
        files.sort()
        found_layers = 0

        for file in files:
            full_path = os.path.join(model_path, file)
            try:
                with safe_open(full_path, framework="pt", device="cpu") as f:
                    keys = f.keys()
                    for layer in probe_layers:
                        if layer in keys:
                            t = f.get_tensor(layer).float().view(-1)
                            t = t[::10]  # Downsample
                            tensors.append(t)
                            found_layers += 1
            except Exception as e:
                print(f"Error reading {file}: {e}")

        if found_layers == 0:
            return None
    else:
        return None

    if not tensors:
        return None

    return torch.cat(tensors)

def analyze_task_vectors(base_fp, donor_fps):
    # 0. Handle size mismatches (Manifold Alignment)
    base_size = base_fp.numel()
    donor_sizes = [f.numel() for f in donor_fps]

    min_size = min([base_size] + donor_sizes)

    if any(s != min_size for s in donor_sizes) or base_size != min_size:
        print(f"\n[!] SIZE MISMATCH DETECTED")
        print(f"    Base Size: {base_size}")
        print(f"    Min Donor: {min(donor_sizes)}")
        print(f"    Action: Truncating all models to {min_size} for audit.")

    # Align fingerprints
    aligned_base = base_fp[:min_size]
    aligned_donors = [f[:min_size] for f in donor_fps]

    # 1. Calculate Task Vectors (Delta = Donor - Base)
    task_vectors = []
    for d_fp in aligned_donors:
        task_vectors.append(d_fp - aligned_base)

    # Stack into matrix [N_donors, N_features]
    data_matrix = torch.stack(task_vectors).numpy()

    # 2. Norm Analysis (Magnitude of the Delta)
    norms = np.linalg.norm(data_matrix, axis=1)

    # 3. Cosine Similarity Matrix (Directional Alignment)
    cos_sim = cosine_similarity(data_matrix)

    # 4. PCA Projection (2D)
    # Center the task vectors
    centered_data = data_matrix - np.mean(data_matrix, axis=0)

    if len(donor_fps) > 1:
        pca = PCA(n_components=2)
        coords = pca.fit_transform(centered_data)
        var_ratio = pca.explained_variance_ratio_
    else:
        coords = np.zeros((1, 2))
        var_ratio = [1.0, 0.0]

    return norms, cos_sim, coords, var_ratio, donor_sizes

def plot_results(model_ids, norms, cos_sim, coords, var_ratio):
    labels = [str(mid) for mid in model_ids]

    fig = plt.figure(figsize=(20, 12))
    fig.suptitle(f"DELLA/Task Arithmetic Compatibility Audit ({len(model_ids)} Donors)\nRefer to della_scan.log for ID Key", fontsize=16)

    # --- Plot 1: Task Vector Manifold (PCA) ---
    ax1 = fig.add_subplot(2, 2, 1)
    ax1.scatter(coords[:, 0], coords[:, 1], c='purple', s=80, alpha=0.6)

    for i, txt in enumerate(labels):
        ax1.annotate(txt, (coords[i, 0], coords[i, 1]), xytext=(3, 3), textcoords='offset points', fontsize=8, fontweight='bold')

    ax1.set_title(f"Task Vector Map (PCA of Deltas)\nClusters = Redundant Skills")
    ax1.set_xlabel(f"PC1 ({var_ratio[0]:.1%} variance)")
    ax1.set_ylabel(f"PC2 ({var_ratio[1]:.1%} variance)")
    ax1.grid(True, alpha=0.3)

    # Plot Origin (Base Model reference relative to centered data)
    center_offset = -np.mean(coords, axis=0)
    ax1.scatter(center_offset[0], center_offset[1], c='red', marker='x', s=100, label='Base Model (Ref)')
    ax1.legend()

    # --- Plot 2: Cosine Similarity Heatmap ---
    ax2 = fig.add_subplot(2, 2, 2)
    # For Task Vectors, negative similarity is common (conflicting directions)
    im = ax2.imshow(cos_sim, cmap='coolwarm', vmin=-1.0, vmax=1.0)

    ax2.set_xticks(np.arange(len(labels)))
    ax2.set_yticks(np.arange(len(labels)))
    ax2.set_xticklabels(labels, rotation=90, fontsize=6)
    ax2.set_yticklabels(labels, fontsize=6)

    ax2.set_title("Task Vector Alignment (Blue=Opposed, Red=Aligned)")
    plt.colorbar(im, ax=ax2)

    # --- Plot 3: Delta Magnitude (L2 Norm) ---
    ax3 = fig.add_subplot(2, 1, 2)
    bars = ax3.bar(labels, norms, color='orange', alpha=0.6)
    ax3.set_title("Task Vector Magnitude (L2 Norm)\nHigh bars = Drastic deviation from Base Model")
    ax3.set_ylabel("Delta L2 Norm")
    ax3.set_xlabel("Donor ID")
    ax3.grid(axis='y', alpha=0.3)

    for bar in bars:
        height = bar.get_height()
        ax3.text(bar.get_x() + bar.get_width()/2., height,
                 f'{height:.1f}', ha='center', va='bottom', fontsize=6, rotation=90)

    plt.tight_layout()
    plt.show()

def main():
    # Hook stdout to log file
    sys.stdout = Logger(LOG_FILENAME)

    parser = argparse.ArgumentParser(description="Audit MergeKit models for DELLA/Task Arithmetic compatibility.")
    parser.add_argument("config", help="Path to the mergekit yaml config file")
    args = parser.parse_args()

    print(f"--- DELLA AUDIT V2 START ---")
    base_model_path, donor_paths = load_yaml_config(args.config)

    if not base_model_path:
        print("Error: No 'base_model' found in config. DELLA requires a base model.")
        return

    print(f"Base Model: {base_model_path}")
    print(f"Donors: {len(donor_paths)}")

    print("\nExtracting BASE MODEL fingerprint...")
    base_fp = get_model_fingerprint(base_model_path, PROBE_LAYERS)
    if base_fp is None:
        print("Failed to load base model. Exiting.")
        return

    donor_fps = []
    valid_donors = []
    valid_ids = []

    print("\nExtracting DONOR fingerprints...")
    for i, path in enumerate(tqdm(donor_paths)):
        fp = get_model_fingerprint(path, PROBE_LAYERS)
        if fp is not None:
            donor_fps.append(fp)
            valid_donors.append(path)
            valid_ids.append(i + 1)
        else:
            print(f"Skipping {path} (failed to load)")

    if len(valid_donors) < 1:
        print("Need at least 1 valid donor.")
        return

    print("\nComputing Task Vector geometry...")
    norms, cos_sim, coords, var_ratio, sizes = analyze_task_vectors(base_fp, donor_fps)

    # --- LOGGING THE KEY ---
    print("\n" + "="*80)
    print(f"{'ID':<5} | {'Model Name'}")
    print("-" * 80)
    for i, path in enumerate(valid_donors):
        name = os.path.basename(path).replace("!models--", "")
        print(f"#{valid_ids[i]:<4} | {name}")
    print("="*80 + "\n")

    # --- MAGNITUDE ANALYSIS ---
    print("--- MAGNITUDE ANALYSIS & DATA POINTS ---")
    print(f"{'ID':<5} | {'Status':<10} | {'Delta Norm':<12} | {'Orig Size':<12} | {'Model Name'}")
    print("-" * 100)

    mean_norm = np.mean(norms)
    std_norm = np.std(norms)

    for i, model in enumerate(valid_donors):
        name = os.path.basename(model).replace("!models--", "")
        # Check if norm is significantly higher than average (potential destroyer of weights)
        z_score = (norms[i] - mean_norm) / (std_norm + 1e-8)
        status = "HIGH MAG" if z_score > 1.5 else "OK"

        print(f"#{valid_ids[i]:<4} | {status:<10} | {norms[i]:<12.4f} | {sizes[i]:<12} | {name}")

    print("\nLog saved to: " + LOG_FILENAME)
    print("Displaying charts...")

    # Flush the terminal side of the logger before blocking on the plot window
    sys.stdout.terminal.flush()

    plot_results(valid_ids, norms, cos_sim, coords, var_ratio)

    # Close log
    sys.stdout.close()

if __name__ == "__main__":
    main()
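`load_yaml_config` above pulls `base_model` and the `models` list out of a standard mergekit YAML config. A minimal sketch of the structure it expects, using a plain dict in place of the parsed YAML (the paths here are hypothetical):

```python
# Hypothetical stand-in for yaml.safe_load(...) output; the keys mirror a
# mergekit DELLA config: one 'base_model' plus a 'models' list of donors.
config = {
    "base_model": "/models/base",
    "models": [
        {"model": "/models/donor_a"},
        {"model": "/models/donor_b"},
    ],
}

# Same extraction logic as load_yaml_config in audit_della.py.
base_model = config.get("base_model")
donors = [m["model"] for m in config.get("models", [])]
print(base_model)  # /models/base
print(donors)      # ['/models/donor_a', '/models/donor_b']
```

The base model is read separately because the script diffs every donor against it; donors listed without a `base_model` key cause the audit to abort.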
Audits/audit_karcher.py
ADDED
@@ -0,0 +1,241 @@
import yaml
import torch
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
from safetensors import safe_open
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity
from tqdm import tqdm
import argparse

# --- CONFIGURATION ---
PROBE_LAYERS = [
    "model.layers.12.mlp.down_proj.weight",  # Mid-model logic
    "lm_head.weight"  # Output semantics
]
LOG_FILENAME = "karcher_scan.log"
# ---------------------

class Logger:
    def __init__(self, filename):
        self.terminal = sys.stdout
        self.log = open(filename, "w", encoding="utf-8")

    def write(self, message):
        self.terminal.write(message)
        self.log.write(message)
        self.log.flush()

    def flush(self):
        self.terminal.flush()
        self.log.flush()

    def close(self):
        self.log.close()

def load_yaml_config(config_path):
    print(f"Loading config: {config_path}")
    with open(config_path, 'r', encoding='utf-8') as f:
        config = yaml.safe_load(f)

    models = []
    if 'models' in config:
        for m in config['models']:
            models.append(m['model'])
    return models

def get_model_fingerprint(model_path, probe_layers):
    tensors = []
    if os.path.exists(model_path):
        files = [f for f in os.listdir(model_path) if f.endswith('.safetensors')]
        files.sort()
        found_layers = 0

        for file in files:
            full_path = os.path.join(model_path, file)
            try:
                with safe_open(full_path, framework="pt", device="cpu") as f:
                    keys = f.keys()
                    for layer in probe_layers:
                        if layer in keys:
                            t = f.get_tensor(layer).float().view(-1)
                            t = t[::10]  # Downsample
                            tensors.append(t)
                            found_layers += 1
            except Exception as e:
                print(f"Error reading {file}: {e}")

        if found_layers == 0:
            return None
    else:
        return None

    if not tensors:
        return None

    return torch.cat(tensors)

def analyze_compatibility(fingerprints):
    # 0. Handle size mismatches (Manifold Alignment)
    # We keep 'sizes' as the ORIGINAL sizes for logging
    sizes = [f.numel() for f in fingerprints]
    min_size = min(sizes)
    max_size = max(sizes)

    if min_size != max_size:
        print(f"\n[!] SIZE MISMATCH DETECTED")
        print(f"    Smallest Fingerprint: {min_size}")
        print(f"    Largest Fingerprint: {max_size}")
        print(f"    Action: Truncating all models to {min_size} for alignment.")

    # Align all fingerprints to the smallest common denominator
    aligned_fingerprints = [f[:min_size] for f in fingerprints]

    # Stack into matrix [N_models, N_features]
    data_matrix = torch.stack(aligned_fingerprints).numpy()

    # 1. Norm Analysis (Magnitude)
    norms = np.linalg.norm(data_matrix, axis=1)

    # 2. Cosine Similarity Matrix
    cos_sim = cosine_similarity(data_matrix)

    # 3. PCA Projection (2D)
    centered_data = data_matrix - np.mean(data_matrix, axis=0)
    pca = PCA(n_components=2)
    coords = pca.fit_transform(centered_data)

    return norms, cos_sim, coords, pca.explained_variance_ratio_, sizes

def plot_results(model_ids, norms, cos_sim, coords, var_ratio):
    # Use IDs for plotting
    labels = [str(mid) for mid in model_ids]

    fig = plt.figure(figsize=(20, 12))
    fig.suptitle(f"Karcher Merge Compatibility Audit ({len(model_ids)} Models)\nRefer to karcher_scan.log for ID Key", fontsize=16)

    # --- Plot 1: PCA Manifold Map ---
    ax1 = fig.add_subplot(2, 2, 1)
    ax1.scatter(coords[:, 0], coords[:, 1], c='blue', s=80, alpha=0.6)

    # Annotate points with IDs
    for i, txt in enumerate(labels):
        ax1.annotate(txt, (coords[i, 0], coords[i, 1]), xytext=(3, 3), textcoords='offset points', fontsize=8, fontweight='bold')

    ax1.set_title(f"Manifold Map (PCA)\nOutliers here will break the merge")
    ax1.set_xlabel(f"PC1 ({var_ratio[0]:.1%} variance)")
    ax1.set_ylabel(f"PC2 ({var_ratio[1]:.1%} variance)")
    ax1.grid(True, alpha=0.3)

    # Draw center
    center = np.mean(coords, axis=0)
    ax1.scatter(center[0], center[1], c='red', marker='x', s=100, label='Center')

    # --- Plot 2: Cosine Similarity Heatmap ---
    ax2 = fig.add_subplot(2, 2, 2)
    im = ax2.imshow(cos_sim, cmap='viridis', vmin=0.8, vmax=1.0)

    # Set ticks to IDs
    ax2.set_xticks(np.arange(len(labels)))
    ax2.set_yticks(np.arange(len(labels)))
    ax2.set_xticklabels(labels, rotation=90, fontsize=6)
    ax2.set_yticklabels(labels, fontsize=6)

    ax2.set_title("Cosine Similarity (Brighter = Compatible)")
    plt.colorbar(im, ax=ax2)

    # --- Plot 3: Weight Magnitude (Norms) ---
    ax3 = fig.add_subplot(2, 1, 2)
    bars = ax3.bar(labels, norms, color='green', alpha=0.6)
    ax3.set_title("Weight Magnitude (L2 Norm)\nKarcher is sensitive to large differences here")
    ax3.set_ylabel("L2 Norm")
    ax3.set_xlabel("Model ID")
    ax3.grid(axis='y', alpha=0.3)

    # Add value labels (rotated if many models)
    for bar in bars:
        height = bar.get_height()
        ax3.text(bar.get_x() + bar.get_width()/2., height,
                 f'{height:.1f}', ha='center', va='bottom', fontsize=6, rotation=90)

    plt.tight_layout()
    plt.show()

def main():
    # Hook stdout to log file
    sys.stdout = Logger(LOG_FILENAME)

    parser = argparse.ArgumentParser(description="Audit MergeKit models for Karcher compatibility.")
    parser.add_argument("config", help="Path to the mergekit yaml config file")
    args = parser.parse_args()

    print(f"--- KARCHER AUDIT V4 START ---")
    model_paths = load_yaml_config(args.config)
    print(f"Found {len(model_paths)} models.")

    fingerprints = []
    valid_models = []
    valid_ids = []

    print("Extracting model fingerprints...")
    # We use a manual counter for IDs to keep them sequential based on config order
    for i, path in enumerate(tqdm(model_paths)):
        fp = get_model_fingerprint(path, PROBE_LAYERS)
        if fp is not None:
            fingerprints.append(fp)
            valid_models.append(path)
            valid_ids.append(i + 1)  # 1-based indexing
        else:
            print(f"Skipping {path} (failed to load)")

    if len(valid_models) < 2:
        print("Need at least 2 valid models to compare.")
        return

    print("Computing manifold geometry...")
    norms, cos_sim, coords, var_ratio, sizes = analyze_compatibility(fingerprints)

    # --- LOGGING THE KEY ---
    print("\n" + "="*80)
    print(f"{'ID':<5} | {'Model Name'}")
    print("-" * 80)
    for i, path in enumerate(valid_models):
        name = os.path.basename(path).replace("!models--", "")
        print(f"#{valid_ids[i]:<4} | {name}")
    print("="*80 + "\n")

    # --- OUTLIER ANALYSIS ---
    print("--- OUTLIER ANALYSIS & DATA POINTS ---")
    print(f"{'ID':<5} | {'Status':<10} | {'Dist':<10} | {'Norm':<10} | {'Orig Size':<12} | {'Model Name'}")
    print("-" * 100)

    centroid = np.mean(coords, axis=0)
    distances = np.linalg.norm(coords - centroid, axis=1)

    mean_dist = np.mean(distances)
    std_dist = np.std(distances)
    z_scores = (distances - mean_dist) / (std_dist + 1e-8)

    for i, model in enumerate(valid_models):
        name = os.path.basename(model).replace("!models--", "")
        is_outlier = z_scores[i] > 1.5
        status = "OUTLIER" if is_outlier else "OK"

        # Log format: ID | Status | Dist | Norm | Size | Name
        print(f"#{valid_ids[i]:<4} | {status:<10} | {distances[i]:<10.4f} | {norms[i]:<10.4f} | {sizes[i]:<12} | {name}")

    print("\nLog saved to: " + LOG_FILENAME)
    print("Displaying charts...")

    # Flush the terminal side of the logger so matplotlib doesn't try to write binary image data to our text logger if it crashes
    sys.stdout.terminal.flush()

    plot_results(valid_ids, norms, cos_sim, coords, var_ratio)

    # Close log
    sys.stdout.close()

if __name__ == "__main__":
    main()
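The heatmap in Plot 2 of `audit_karcher.py` is pairwise cosine similarity between the flattened weight fingerprints. A numpy-only sketch of that computation, on toy vectors rather than real model weights:

```python
import numpy as np

def cosine_matrix(X):
    # Normalize each row to unit length; the Gram matrix of unit rows is
    # then exactly the pairwise cosine-similarity matrix the heatmap shows.
    X = np.asarray(X, dtype=float)
    U = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    return U @ U.T

# Toy fingerprints: rows 0 and 1 point in nearly the same direction
# (highly "compatible"), row 2 is orthogonal to row 0.
fps = [[1.0, 0.0, 0.0],
       [0.9, 0.1, 0.0],
       [0.0, 1.0, 0.0]]
S = cosine_matrix(fps)
```

The script's `vmin=0.8` on the heatmap reflects that finetunes of one base model tend to land in a narrow similarity band near 1.0; anything far below that band is a merge-breaking outlier.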
Audits/generalized_task_arithmetic.py
ADDED
|
@@ -0,0 +1,339 @@
# Copyright (C) 2025 Arcee AI
# SPDX-License-Identifier: LGPL-3.0-only
# della + live audit report by Naphula

import logging
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple

import torch
from pydantic import BaseModel
from typing_extensions import Literal, override

from mergekit.architecture import WeightInfo
from mergekit.common import ImmutableMap, ModelReference
from mergekit.graph import Task
from mergekit.merge_methods.base import (
    ConfigParameterDef,
    MergeMethod,
    MergeTensorInput,
)
from mergekit.sparsify import RescaleNorm, SparsificationMethod, sparsify


class ConsensusMethod(str, Enum):
    count = "count"
    sum = "sum"


class GeneralizedTaskArithmeticMerge(MergeMethod, BaseModel, frozen=True):
    consensus_method: Optional[ConsensusMethod]
    sparsification_method: Optional[SparsificationMethod]
    default_normalize: bool
    default_rescale: bool
    method_name: str
    method_pretty_name: Optional[str]
    method_reference_url: Optional[str]

    def name(self) -> str:
        return self.method_name

    @override
    def pretty_name(self) -> Optional[str]:
        return self.method_pretty_name

    @override
    def reference_url(self) -> Optional[str]:
        return self.method_reference_url

    def parameters(self) -> List[ConfigParameterDef]:
        return [
            ConfigParameterDef(name="int8_mask", required=False, default_value=False),
            ConfigParameterDef(
                name="normalize", required=False, default_value=self.default_normalize
            ),
            ConfigParameterDef(
                name="rescale", required=False, default_value=self.default_rescale
            ),
            ConfigParameterDef(name="lambda", required=False, default_value=1.0),
        ]

    def tensor_parameters(self) -> List[ConfigParameterDef]:
        res = [
            ConfigParameterDef(name="weight", required=True),
            ConfigParameterDef(name="density", required=False, default_value=1.0),
        ]
        if self.sparsification_method == SparsificationMethod.magnitude_outliers:
            res.append(
                ConfigParameterDef(
                    name="gamma",
                    default_value=0.01,
                )
            )
        if self.sparsification_method == SparsificationMethod.della_magprune:
            res.append(
                ConfigParameterDef(
                    name="epsilon",
                    default_value=0.15,
                )
            )
        return res

    def make_task(
        self,
        output_weight: WeightInfo,
        tensors: MergeTensorInput,
        base_model: Optional[ModelReference],
        parameters: ImmutableMap[str, Any],
        tensor_parameters: ImmutableMap[ModelReference, ImmutableMap[str, Any]],
    ) -> Task:
        return GTATask(
            method=self,
            tensors=tensors,
            base_model=base_model,
            tensor_parameters=tensor_parameters,
            int8_mask=parameters["int8_mask"],
            normalize=parameters["normalize"],
            lambda_=parameters["lambda"],
            rescale_norm=RescaleNorm.l1 if parameters["rescale"] else None,
            weight_info=output_weight,
        )


class GTATask(Task[torch.Tensor]):
    method: GeneralizedTaskArithmeticMerge
    tensors: MergeTensorInput
    base_model: ModelReference
    weight_info: WeightInfo
    tensor_parameters: ImmutableMap[ModelReference, Any]
    int8_mask: bool
    normalize: bool
    lambda_: float
    rescale_norm: Optional[RescaleNorm]

    def uses_accelerator(self) -> bool:
        return True

    def arguments(self) -> Dict[str, Task]:
        return {"tensors": self.tensors}

    def execute(
        self,
        tensors: Dict[ModelReference, torch.Tensor],
        **_kwargs,
    ) -> torch.Tensor:
        # collect task vectors
        tvs, base = get_task_vectors(
            self.weight_info,
            self.base_model,
            tensors,
            tensor_parameters=self.tensor_parameters.data,
        )

        # --- LIVE AUDIT CHART ---
        if tvs:
            log_della_audit(
                self.weight_info.name,
                self.base_model,
                tvs,
                self.lambda_,
                self.method.method_pretty_name
            )
        # ------------------------

        if not tvs:
            return base

        # sparsify
        if self.method.sparsification_method:
            for tv_info in tvs:
                kwargs = {}
                if "gamma" in tv_info:
                    kwargs["gamma"] = tv_info["gamma"]

                if "epsilon" in tv_info:
                    kwargs["epsilon"] = tv_info["epsilon"]

                tv_info["delta"] = sparsify(
                    tv_info["delta"],
                    density=tv_info["density"],
                    method=self.method.sparsification_method,
                    rescale_norm=self.rescale_norm,
                    **kwargs,
                )

        deltas = torch.stack([tv["delta"] for tv in tvs], dim=0)

        weights = torch.tensor(
            [tv["weight"] for tv in tvs], dtype=deltas.dtype, device=deltas.device
        )
        while len(deltas.shape) > len(weights.shape):
            weights.unsqueeze_(-1)

        weighted_deltas = deltas * weights

        # get sign consensus and mix deltas
        if self.method.consensus_method:
            mask_dtype = torch.int8 if self.int8_mask else base.dtype
            mask = get_mask(
                weighted_deltas,
                method=self.method.consensus_method,
                mask_dtype=mask_dtype,
            )
            mixed_delta = (weighted_deltas * mask).sum(dim=0)
            divisor = (weights * mask).sum(dim=0)
            divisor[divisor == 0] = 1
        else:
            mixed_delta = weighted_deltas.sum(dim=0)
            divisor = weights.sum(dim=0)
            divisor[divisor.abs() < 1e-8] = 1

        if self.normalize:
            mixed_delta /= divisor

        if self.lambda_ != 1:
            mixed_delta *= self.lambda_

        return (base + mixed_delta).to(base.dtype)

    def group_label(self) -> Optional[str]:
        return self.tensors.group_label()


def get_task_vectors(
    weight_info: WeightInfo,
    base_model: ModelReference,
    tensors: ImmutableMap[ModelReference, torch.Tensor],
    tensor_parameters: ImmutableMap[ModelReference, ImmutableMap[str, Any]],
) -> Tuple[List[Dict[str, Any]], torch.Tensor]:
    keys = list(tensors.keys())
    base = tensors[base_model]

    parameter_name = weight_info.name

    res = []
    for model in keys:
        if model == base_model:
            continue

        x = tensors[model].to(base.dtype)
        if x.shape != base.shape:
            if weight_info.is_embed:
                x = x[: base.shape[0], : base.shape[1]]
                logging.warning(f"Using submatrix of {model}:{parameter_name}")
            else:
                logging.warning(
                    f"skipping {model}:{parameter_name} due to size mismatch"
                )
                continue

        delta = x - base
        del x
        del tensors[model]

        d = {}
        d["model"] = model
        d["delta"] = delta
        for p in tensor_parameters[model]:
            d[p] = tensor_parameters[model][p]
        res.append(d)
    return res, base


def get_mask(
    delta: torch.Tensor,
    method: Literal["sum", "count"] = "sum",
    mask_dtype: Optional[torch.dtype] = None,
):
    """Returns a mask determining which delta vectors should be merged
    into the final model.

    For the methodology described in the TIES paper use 'sum'. For a
    simpler naive count of signs, use 'count'."""
    if mask_dtype is None:
        mask_dtype = delta.dtype

    sign = delta.sign().to(mask_dtype)

    if method == "sum":
        sign_weight = delta.sum(dim=0)
        majority_sign = (sign_weight >= 0).to(mask_dtype) * 2 - 1
        del sign_weight
    elif method == "count":
        majority_sign = (sign.sum(dim=0) >= 0).to(mask_dtype) * 2 - 1
    else:
        raise RuntimeError(f'Unimplemented mask method "{method}"')

    return sign == majority_sign

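The sign-consensus mask above can be exercised on a toy stack of deltas. This is a minimal standalone sketch of the same logic (not the mergekit entry point): each element keeps only the donors agreeing with the majority sign.

```python
import torch

def sign_consensus_mask(deltas: torch.Tensor, method: str = "sum") -> torch.Tensor:
    # deltas: stacked task vectors, shape (num_models, ...)
    sign = deltas.sign()
    if method == "sum":
        # majority sign weighted by magnitude, as in the TIES paper
        majority = (deltas.sum(dim=0) >= 0).to(deltas.dtype) * 2 - 1
    elif method == "count":
        # naive per-element vote of signs
        majority = (sign.sum(dim=0) >= 0).to(deltas.dtype) * 2 - 1
    else:
        raise RuntimeError(f'Unimplemented mask method "{method}"')
    return sign == majority

deltas = torch.tensor([[ 0.5, -1.0],
                       [ 0.2,  0.3],
                       [-0.1,  0.4]])
# column 0 sums to +0.6 -> keep positive deltas; column 1 sums to -0.3 -> keep negative ones
print(sign_consensus_mask(deltas))
```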
def log_della_audit(
    layer_name: str,
    base_model: ModelReference,
    tvs: List[Dict[str, Any]],
    global_lambda: float,
    method_name: str
):
    """Prints and saves a bar chart of DELLA/Task Arithmetic distribution based on actual Delta Norms."""

    base_name = str(base_model.model.path).split("\\")[-1].split("/")[-1][:50]

    bar_char = "█"
    lines = [f"\n[{method_name} Audit] Layer: {layer_name} | Lambda={global_lambda:.2f}"]
    lines.append(f" [BASE] {base_name:<50}")

    # 1. Calculate stats
    stats = []
    total_impact = 0.0

    for tv in tvs:
        model_name = str(tv['model'].model.path).split("\\")[-1].split("/")[-1][:50]
        weight = tv.get('weight', 0.0)
        density = tv.get('density', 1.0)
        epsilon = tv.get('epsilon', None)
        delta = tv.get('delta', None)

        norm = 0.0
        if delta is not None:
            # Use float32 for norm calculation to be safe
            norm = torch.norm(delta.float()).item()

        # Effective contribution magnitude = Weight * Norm
        # This shows how much this model is actually moving the weights
        impact = weight * norm
        total_impact += impact

        stats.append({
            'name': model_name,
            'weight': weight,
            'density': density,
            'epsilon': epsilon,
            'norm': norm,
            'impact': impact
        })

    # Sort by name for consistent logs
    stats.sort(key=lambda x: x['name'])

    # 2. Generate bars
    for s in stats:
        # Calculate percentage relative to the sum of all impacts (Share of Voice)
        pct = (s['impact'] / total_impact * 100) if total_impact > 0 else 0.0

        # Bar length (max 50 chars for 100%)
        bar_len = int(max(0, min(50, pct / 2)))
        bar = bar_char * bar_len

        # Format info string
        # W=Weight, D=Density, N=DeltaNorm
        info = f"W:{s['weight']:.2f} D:{s['density']:.2f} N:{s['norm']:.2f}"
        if s['epsilon'] is not None:
            info += f" E:{s['epsilon']:.2f}"

        lines.append(f" {s['name']:<50}: {bar:<50} {pct:5.1f}% ({info})")

    log_entry = "\n".join(lines)
    print(log_entry)

    with open("della_audit.log", "a", encoding="utf-8") as f:
        f.write(log_entry + "\n")
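The "share of voice" percentages in the audit log come from impact = weight × ‖delta‖, normalized over all donors. A small illustration with hypothetical task-vector entries (torch only, independent of mergekit):

```python
import torch

def contribution_shares(tvs):
    # impact of each donor = merge weight x L2 norm of its delta, as in log_della_audit
    impacts = [tv["weight"] * torch.norm(tv["delta"].float()).item() for tv in tvs]
    total = sum(impacts)
    return [100.0 * i / total if total > 0 else 0.0 for i in impacts]

tvs = [
    {"weight": 1.0, "delta": torch.tensor([3.0, 4.0])},  # norm 5  -> impact 5
    {"weight": 0.5, "delta": torch.tensor([6.0, 8.0])},  # norm 10 -> impact 5
]
print(contribution_shares(tvs))  # equal split despite different weights
```

This is why a donor with a small configured weight can still dominate a layer: a large delta norm inflates its effective contribution.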
Audits/model_stock.py
ADDED
@@ -0,0 +1,183 @@
# Copyright (C) 2025 Arcee AI
# SPDX-License-Identifier: LGPL-3.0-only
# model_stock + live audit report by Naphula

import logging
import os
from typing import Any, Dict, List, Optional

import torch
from typing_extensions import override

from mergekit.architecture import WeightInfo
from mergekit.common import ImmutableMap, ModelReference
from mergekit.graph import Task
from mergekit.merge_methods.base import (
    ConfigParameterDef,
    MergeMethod,
    MergeTensorInput,
)
from mergekit.merge_methods.rectify_embed import rectify_embed_sizes


class ModelStockMergeTask(Task[torch.Tensor]):
    gather_tensors: MergeTensorInput
    base_model: ModelReference
    weight_info: WeightInfo
    filter_wise: bool = False

    def uses_accelerator(self) -> bool:
        return True

    def arguments(self) -> Dict[str, Task]:
        return {"tensors": self.gather_tensors}

    def execute(self, tensors: Dict[ModelReference, torch.Tensor]) -> torch.Tensor:
        if len(tensors) == 1 and self.base_model in tensors:
            return tensors[self.base_model]
        if len(tensors) < 3:
            if self.weight_info.optional:
                logging.warning(
                    f"Optional weight {self.weight_info.name} not present in enough models, discarding"
                )
                return None

            raise ValueError(
                "ModelStockMerge requires at least 3 models (base plus two+ others)"
            )

        w_0, ws = self.get_rectified_weights(tensors)
        out_shape = w_0.shape

        if self.filter_wise:
            if w_0.dim() == 1:
                # bias (or other single-vector) parameters should be treated as row vectors
                w_0 = w_0.unsqueeze(0)
                ws = [w.unsqueeze(0) for w in ws]
        else:
            w_0 = w_0.view(-1)
            ws = [w.view(-1) for w in ws]

        offsets = [w - w_0 for w in ws]

        # now there is a question of how to come up with a value for theta.
        # in the two-vector case, we can get an exact angle between the two vectors
        # but the paper doesn't explicitly say what to do in the multi-vector case -
        # they keep using a singular theta value and don't elaborate on how to
        # calculate it. i'm going to assume an average of pairwise angles for now? i guess?

        cos_thetas = []
        for i, w_0_offset in enumerate(offsets):
            for j in range(i + 1, len(offsets)):
                w_1_offset = offsets[j]

                norm_product = torch.norm(w_0_offset, dim=-1) * torch.norm(
                    w_1_offset, dim=-1
                )
                cos_theta = (
                    (w_0_offset * w_1_offset).sum(dim=-1) / norm_product.clamp(min=1e-6)
                ).clamp(-1, 1)
                cos_thetas.append(cos_theta)

        cos_theta = torch.stack(cos_thetas).mean(dim=0).unsqueeze(-1)
        N = len(ws)
        t = (N * cos_theta) / (1 + (N - 1) * cos_theta)

        # --- LIVE AUDIT CHART ---
        t_scalar = t.mean().item()
        base_name = str(self.base_model.model.path)
        donor_names = [str(k.model.path) for k in tensors.keys() if k != self.base_model]
        donor_names.sort()  # Deterministic order

        log_model_stock_audit(self.weight_info.name, t_scalar, base_name, donor_names)
        # ------------------------

        w_avg = sum(ws) / len(ws)
        w_h = t * w_avg + (1 - t) * w_0

        return w_h.reshape(out_shape)

    def get_rectified_weights(self, tensors: Dict[ModelReference, torch.Tensor]):
        if self.base_model not in tensors:
            raise ValueError("Base model tensor not found")

        all_weights = [tensors[self.base_model]] + [
            tensors[k] for k in tensors if k != self.base_model
        ]
        rectify_embed_sizes(self.weight_info, all_weights)
        w_0 = all_weights[0]
        ws = all_weights[1:]
        return w_0, ws

    def group_label(self) -> Optional[str]:
        return self.gather_tensors.group_label()


class ModelStockMerge(MergeMethod):
    def name(self) -> str:
        return "model_stock"

    @override
    def pretty_name(self) -> Optional[str]:
        return "Model Stock"

    @override
    def reference_url(self):
        return "https://arxiv.org/abs/2403.19522"

    def parameters(self) -> List[ConfigParameterDef]:
        return [
            ConfigParameterDef(name="filter_wise", required=False, default_value=False)
        ]

    def make_task(
        self,
        *,
        output_weight: WeightInfo,
        tensors: MergeTensorInput,
        base_model: Optional[ModelReference],
        parameters: ImmutableMap[str, Any],
        **_kwargs,
    ) -> Task:
        return ModelStockMergeTask(
            gather_tensors=tensors,
            base_model=base_model,
            weight_info=output_weight,
            filter_wise=parameters["filter_wise"],
        )


def log_model_stock_audit(layer_name: str, t_value: float, base_name: str, donor_names: List[str]):
    """Prints and saves a bar chart of Model Stock interpolation."""
    # t is the weight of the average of donors.
    # (1-t) is the weight of the base.
    # Each donor gets t / len(donors).

    n_donors = len(donor_names)
    base_weight = 1.0 - t_value
    donor_weight = t_value / n_donors if n_donors > 0 else 0.0

    bar_char = "█"
    lines = [f"\n[Model Stock Audit] Layer: {layer_name} | t={t_value:.4f}"]

    # Base
    pct = base_weight * 100
    # Clamp bar length for visualization safety
    bar_len = int(max(0, min(100, pct)) / 2)
    bar = bar_char * bar_len
    clean_base = base_name.split("\\")[-1].split("/")[-1][:60]
    lines.append(f" {clean_base:<60}: {bar:<50} ({pct:6.2f}%)")

    # Donors
    for name in donor_names:
        pct = donor_weight * 100
        bar_len = int(max(0, min(100, pct)) / 2)
        bar = bar_char * bar_len
        clean_name = name.split("\\")[-1].split("/")[-1][:60]
        lines.append(f" {clean_name:<60}: {bar:<50} ({pct:6.2f}%)")

    log_entry = "\n".join(lines)
    print(log_entry)

    with open("model_stock_audit.log", "a", encoding="utf-8") as f:
        f.write(log_entry + "\n")
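The interpolation factor logged above follows the Model Stock closed form t = N·cosθ / (1 + (N − 1)·cosθ), with cosθ averaged over donor pairs. A standalone sketch of the scalar case (toy vectors, independent of mergekit):

```python
import torch

def model_stock_t(offsets):
    # offsets: list of (donor - base) task vectors, flattened
    cos_vals = []
    for i in range(len(offsets)):
        for j in range(i + 1, len(offsets)):
            a, b = offsets[i], offsets[j]
            denom = (a.norm() * b.norm()).clamp(min=1e-6)
            cos_vals.append(((a * b).sum() / denom).clamp(-1, 1))
    cos_theta = torch.stack(cos_vals).mean()
    n = len(offsets)
    return (n * cos_theta) / (1 + (n - 1) * cos_theta)

# orthogonal donors pull t toward 0 (stay near base);
# aligned donors push t toward 1 (take the donor average)
print(model_stock_t([torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])]).item())  # 0.0
print(model_stock_t([torch.tensor([1.0, 0.0]), torch.tensor([2.0, 0.0])]).item())  # 1.0
```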
model_tools.md
CHANGED
@@ -17,6 +17,18 @@ Tools to enhance LLM quantizations and merging

 # config.py
 - Simply replace line 13 | BEFORE `ScalarOrGradient: TypeAlias = Union[float, List[float]]` → AFTER `ScalarOrGradient: TypeAlias = Union[float, List[float], str, bool]` | to allow for custom filepath strings within parameter settings.

+# [audit_della.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/audit_della.py)
+- Audits the compatibility of donor models for `Della` merges before merging. See: [example chart Asmodeus](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Asmodeus_Audit.png), [example log Asmodeus](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Asmodeus_Audit.log), [example chart Slimaki](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Slimaki_Audit.png), [example log Slimaki](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Slimaki_Audit.log)
+
+# [audit_karcher.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/audit_karcher.py)
+- Audits the compatibility of donor models for `Karcher` merges before merging. See: [example chart Goetia](https://cdn-uploads.huggingface.co/production/uploads/68e840caa318194c44ec2a04/nSuSM6v_BQBP4tAWK9rGQ.png)
+
+# [generalized_task_arithmetic.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/generalized_task_arithmetic.py)
+- Live audit reports of **actual contribution magnitude** on a per-layer basis for `Della` merges. See: [example audit Asmodeus](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Asmodeus_Live_Audit.png), [example audit Slimaki](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/Slimaki_Live_Audit.png)
+
+# [model_stock.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/Audits/model_stock.py)
+- Live audit reports of **actual contribution magnitude** on a per-layer basis for `Model_Stock` merges.
+
 # [metadata_audit.py](https://huggingface.co/spaces/Naphula/model_tools/blob/main/metadata_audit.py)
 - Checks multiple models within subdirectories for vocab or rope mismatch (useful for large merges). Calibrated for Mistral Nemo 12B by default.
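The per-donor `weight`, `density`, and `epsilon` values that the live audits report are set in the mergekit YAML config. A minimal hypothetical `della` config (model paths are placeholders, values are illustrative only):

```yaml
merge_method: della
base_model: path/to/base-model
models:
  - model: path/to/donor-a
    parameters:
      weight: 0.6
      density: 0.7
      epsilon: 0.15
  - model: path/to/donor-b
    parameters:
      weight: 0.4
      density: 0.7
      epsilon: 0.15
parameters:
  lambda: 1.0
  normalize: true
dtype: bfloat16
```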