mratsim committed · Commit 1e2c9a6 · verified · 1 Parent(s): 4157a07

Add 2bpw and metrics

Files changed (1):
  1. README.md +7 -7
@@ -47,13 +47,13 @@ The base quants use the new "MCG" multiplier from https://github.com/turboderp-o
 
  | Quant | Size | KL-div (quant, FP16) | KL-div (FP16, quant) | Perplexity | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
  | ---------------------------------------------------------------- | ------- | -------------------- | -------------------- | ---------- | ------ | ------ | ------ | ------ | ------ |
- | 2bpw (pending) | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |
- | [3bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3bpw_H6) | 124 GiB | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |
- | [4bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/4bpw_H6) | 165 GiB | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |
+ | [2bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/2bpw_H6) | 83 GiB | 0.65096196 | 0.75914080 | 9.36106675 | 0.7315 | 0.3852 | 0.1653 | 0.0628 | 0.0221 |
+ | [3bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3bpw_H6) | 124 GiB | 0.27578034 | 0.28499938 | 6.95262863 | 0.8388 | 0.5717 | 0.3306 | 0.1713 | 0.0805 |
+ | [4bpw-H6](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/4bpw_H6) | 165 GiB | 0.13722391 | 0.13577676 | 6.60474035 | 0.8947 | 0.6948 | 0.4810 | 0.3007 | 0.1754 |
  | 5bpw (pending) | 206 GiB | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |
- | [6bpw-H8](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/6bpw_H8) | 247 GiB | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |
- | [8bpw-H8](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/8bpw_H8) | 328 GiB | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |
- | FP16 | 656 GiB | | | WIP | | | | | |
+ | [6bpw-H8](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/6bpw_H8) | 247 GiB | 0.08202591 | 0.0784423 | 6.32611481 | 0.9334 | 0.7951 | 0.6274 | 0.4597 | 0.3190 |
+ | [8bpw-H8](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/8bpw_H8) | 328 GiB | 0.07552261 | 0.07230427 | 6.38240525 | 0.9396 | 0.8172 | 0.6598 | 0.5048 | 0.3666 |
+ | FP16 | 656 GiB | | | 6.49784813 | | | | | |
 
  ### Optimized Quants
 
@@ -64,7 +64,7 @@ The base quants use the new "MCG" multiplier from https://github.com/turboderp-o
 
  | Quant | Size | Context / VRAM | KL-div (quant, FP16) | KL-div (FP16, quant) | Perplexity | Top-1 | Top-2 | Top-3 | Top-4 | Top-5 |
  | -------------------------------------------------------------------------------- | ---------- | ----------------------------------------- | -------------------- | -------------------- | ---------- | ------ | ------ | ------ | ------ | ------ |
- | [3.84bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3.84bpw-tuned)| 158GiB GiB | 202752 tokens (max), k6v5 for 192GiB VRAM | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |
+ | [3.84bpw-tuned🂱](https://huggingface.co/mratsim/GLM-4.7-EXL3/tree/3.84bpw-tuned)| 158GiB GiB | 202752 tokens (max), k6v5 for 192GiB VRAM | 0.15823333 | 0.15401253 | 6.41935951 | 0.8854 | 0.6743 | 0.4587 | 0.2832 | 0.1638 |
  | 4.16bpw-tuned🂱 (WIP)| 171GiB GiB | 107520 tokens, k5v4 for 192GiB VRAM | WIP | WIP | WIP | WIP | WIP | WIP | WIP | WIP |
 
  - "opt🂡" for automatically optimized quants
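
For readers unfamiliar with the metric columns, here is a minimal sketch of how numbers like these could be computed for a single token position. It is not the script used to produce the table. It assumes that "KL-div (quant, FP16)" means KL(quant || FP16), that the other column is the reverse direction, and that "Top-K" means both models rank the same K tokens first in the same order. The logits and the helper names (`softmax`, `kl_div`, `top_k_agree`, `perplexity`) are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q, eps=1e-12):
    """KL(p || q) in nats. Not symmetric, hence two columns in the table."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def top_k_agree(logits_a, logits_b, k):
    """Assumed definition: the k highest-ranked tokens match, in order."""
    def rank(xs):
        return sorted(range(len(xs)), key=lambda i: -xs[i])[:k]
    return rank(logits_a) == rank(logits_b)

def perplexity(token_probs):
    """exp of the mean negative log-likelihood of the reference tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical logits for one position over a 4-token vocabulary.
fp16_logits = [2.0, 1.0, 0.5, -1.0]
quant_logits = [1.8, 1.1, -0.9, -0.7]  # ranks 3 and 4 are swapped

p_fp16 = softmax(fp16_logits)
p_quant = softmax(quant_logits)

kl_quant_fp16 = kl_div(p_quant, p_fp16)  # "KL-div (quant, FP16)" under the assumption above
kl_fp16_quant = kl_div(p_fp16, p_quant)  # "KL-div (FP16, quant)"

agreement = {k: top_k_agree(fp16_logits, quant_logits, k) for k in range(1, 5)}
```

Under this reading, a reported value is the average over all evaluated positions, and Top-K can only fall as K grows because a match at K requires a match at every smaller K, which is the pattern the rows above show.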