Add 2bpw and metrics

README.md
The base quants use the new "MCG" multiplier from https://github.com/turboderp-o

| Quant | Size | KL-div (quant, FP16) | KL-div (FP16, quant) | Perplexity | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
| ---------------------------------------------------------------- | ------- | -------------------- | -------------------- | ---------- | ------ | ------ | ------ | ------ | ------ |
| [2bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/2bpw_H6) | 83 GiB | 0.65096196 | 0.75914080 | 9.36106675 | 0.7315 | 0.3852 | 0.1653 | 0.0628 | 0.0221 |
| [3bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3bpw_H6) | 124 GiB | 0.27578034 | 0.28499938 | 6.95262863 | 0.8388 | 0.5717 | 0.3306 | 0.1713 | 0.0805 |
| [4bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/4bpw_H6) | 165 GiB | 0.13722391 | 0.13577676 | 6.60474035 | 0.8947 | 0.6948 | 0.4810 | 0.3007 | 0.1754 |
| 5bpw (pending) | 206 GiB | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |
| [6bpw-H8](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/6bpw_H8) | 247 GiB | 0.08202591 | 0.0784423 | 6.32611481 | 0.9334 | 0.7951 | 0.6274 | 0.4597 | 0.3190 |
| [8bpw-H8](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/8bpw_H8) | 328 GiB | 0.07552261 | 0.07230427 | 6.38240525 | 0.9396 | 0.8172 | 0.6598 | 0.5048 | 0.3666 |
| FP16 | 656 GiB | | | 6.49784813 | | | | | |

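The two KL-div columns are directional: KL divergence is not symmetric, so averaging it with the quantized model's distribution as P gives a different number than with FP16 as P, and the table reports both. As a minimal NumPy sketch of the idea (the table's actual numbers come from exllamav3's evaluation tooling; `softmax` and `kl_div` here are my own illustrative helpers, assuming the first argument plays the role of P in KL(P‖Q)):

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the vocabulary axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p_logits, q_logits):
    # mean KL(P || Q) in nats, averaged over token positions
    p = softmax(p_logits)
    q = softmax(q_logits)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

# toy logits with shape [positions, vocab]; a quant model is simulated
# as the FP16 logits plus small noise
rng = np.random.default_rng(0)
fp16_logits = rng.normal(size=(256, 32))
quant_logits = fp16_logits + rng.normal(scale=0.1, size=(256, 32))

print(kl_div(quant_logits, fp16_logits))  # analogue of "KL-div (quant, FP16)"
print(kl_div(fp16_logits, quant_logits))  # analogue of "KL-div (FP16, quant)"
```

Both directions shrink toward zero as the quantized logits approach FP16, which is why the columns track each other closely at higher bit-widths.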
### Optimized Quants
| Quant | Size | Context / VRAM | KL-div (quant, FP16) | KL-div (FP16, quant) | Perplexity | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
| -------------------------------------------------------------------------------- | ---------- | ----------------------------------------- | -------------------- | -------------------- | ---------- | ------ | ------ | ------ | ------ | ------ |
| [3.84bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3.84bpw-tuned) | 158 GiB | 202752 tokens (max), k6v5 for 192 GiB VRAM | 0.15823333 | 0.15401253 | 6.41935951 | 0.8854 | 0.6743 | 0.4587 | 0.2832 | 0.1638 |
| 4.16bpw-tuned🂱 (WIP) | 171 GiB | 107520 tokens, k5v4 for 192 GiB VRAM | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |

- "opt🂡" for automatically optimized quants
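One plausible reading of the Top-1 … Top-5 columns, consistent with the values falling as k grows, is the fraction of token positions where the quantized model and FP16 agree on the same set of k highest-scoring tokens (the exact metric comes from exllamav3's eval scripts; `topk_agreement` is my own hypothetical helper):

```python
import numpy as np

def topk_agreement(a_logits, b_logits, k):
    # fraction of positions where both models rank the same k tokens
    # (compared as an unordered set) in their top-k
    a_top = np.argsort(a_logits, axis=-1)[:, -k:]
    b_top = np.argsort(b_logits, axis=-1)[:, -k:]
    hits = [set(x) == set(y) for x, y in zip(a_top, b_top)]
    return sum(hits) / len(hits)

# toy logits with shape [positions, vocab]
rng = np.random.default_rng(0)
fp16_logits = rng.normal(size=(256, 32))
quant_logits = fp16_logits + rng.normal(scale=0.05, size=(256, 32))

for k in range(1, 6):
    print(f"Top-{k} agreement: {topk_agreement(quant_logits, fp16_logits, k):.4f}")
```

Matching a larger set exactly is strictly harder, so under this reading the columns are expected to decrease from Top-1 to Top-5, as they do in both tables.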