Add 2bpw and metrics

README.md
The base quants use the new "MCG" multiplier from https://github.com/turboderp-o

| Quant | Size | KL-div (quant, FP16) | KL-div (FP16, quant) | Perplexity | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
| ---------------------------------------------------------------- | ------- | -------------------- | -------------------- | ---------- | ------ | ------ | ------ | ------ | ------ |
| [2bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/2bpw_H6) | 83 GiB | 0.65096196 | 0.75914080 | 9.36106675 | 0.7315 | 0.3852 | 0.1653 | 0.0628 | 0.0221 |
| [3bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3bpw_H6) | 124 GiB | 0.27578034 | 0.28499938 | 6.95262863 | 0.8388 | 0.5717 | 0.3306 | 0.1713 | 0.0805 |
| [4bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/4bpw_H6) | 165 GiB | 0.13722391 | 0.13577676 | 6.60474035 | 0.8947 | 0.6948 | 0.4810 | 0.3007 | 0.1754 |
| 5bpw (pending) | 206 GiB | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |
| [6bpw-H8](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/6bpw_H8) | 247 GiB | 0.08202591 | 0.0784423 | 6.32611481 | 0.9334 | 0.7951 | 0.6274 | 0.4597 | 0.3190 |
| [8bpw-H8](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/8bpw_H8) | 328 GiB | 0.07552261 | 0.07230427 | 6.38240525 | 0.9396 | 0.8172 | 0.6598 | 0.5048 | 0.3666 |
| FP16 | 656 GiB | | | 6.49784813 | | | | | |

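The two KL-div columns are directional: KL divergence is not symmetric, so averaging it with the quantized model's distribution as P gives a different number than with FP16 as P, and the table reports both. As a minimal NumPy sketch of the idea (the table's actual numbers come from exllamav3's evaluation tooling; `softmax` and `kl_div` here are my own illustrative helpers, assuming the first argument plays the role of P in KL(P‖Q)):

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the vocabulary axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p_logits, q_logits):
    # mean KL(P || Q) in nats, averaged over token positions
    p = softmax(p_logits)
    q = softmax(q_logits)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))

# toy logits with shape [positions, vocab]; a quant model is simulated
# as the FP16 logits plus small noise
rng = np.random.default_rng(0)
fp16_logits = rng.normal(size=(256, 32))
quant_logits = fp16_logits + rng.normal(scale=0.1, size=(256, 32))

print(kl_div(quant_logits, fp16_logits))  # analogue of "KL-div (quant, FP16)"
print(kl_div(fp16_logits, quant_logits))  # analogue of "KL-div (FP16, quant)"
```

Both directions shrink toward zero as the quantized logits approach FP16, which is why the columns track each other closely at higher bit-widths.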
### Optimized Quants
| Quant | Size | Context / VRAM | KL-div (quant, FP16) | KL-div (FP16, quant) | Perplexity | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
| -------------------------------------------------------------------------------- | ---------- | ----------------------------------------- | -------------------- | -------------------- | ---------- | ------ | ------ | ------ | ------ | ------ |
| [3.84bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3.84bpw-tuned) | 158 GiB | 202752 tokens (max), k6v5 for 192 GiB VRAM | 0.15823333 | 0.15401253 | 6.41935951 | 0.8854 | 0.6743 | 0.4587 | 0.2832 | 0.1638 |
| 4.16bpw-tuned🂱 (WIP) | 171 GiB | 107520 tokens, k5v4 for 192 GiB VRAM | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |

- "opt🂡" for automatically optimized quants
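One plausible reading of the Top-1 … Top-5 columns, consistent with the values falling as k grows, is the fraction of token positions where the quantized model and FP16 agree on the same set of k highest-scoring tokens (the exact metric comes from exllamav3's eval scripts; `topk_agreement` is my own hypothetical helper):

```python
import numpy as np

def topk_agreement(a_logits, b_logits, k):
    # fraction of positions where both models rank the same k tokens
    # (compared as an unordered set) in their top-k
    a_top = np.argsort(a_logits, axis=-1)[:, -k:]
    b_top = np.argsort(b_logits, axis=-1)[:, -k:]
    hits = [set(x) == set(y) for x, y in zip(a_top, b_top)]
    return sum(hits) / len(hits)

# toy logits with shape [positions, vocab]
rng = np.random.default_rng(0)
fp16_logits = rng.normal(size=(256, 32))
quant_logits = fp16_logits + rng.normal(scale=0.05, size=(256, 32))

for k in range(1, 6):
    print(f"Top-{k} agreement: {topk_agreement(quant_logits, fp16_logits, k):.4f}")
```

Matching a larger set exactly is strictly harder, so under this reading the columns are expected to decrease from Top-1 to Top-5, as they do in both tables.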