Please add IQ3_KT and IQ4_KT

#8
by KeinNiemand - opened

IQ_KT quants are even more efficient than regular IQ_K quants, so having more of them available would be nice.

@KeinNiemand

In my own perplexity measurements, IQ4_KSS is very competitive with IQ4_KT. Both are 4 bpw, but KSS gives faster token generation (decode) speeds on CPU.

I agree the KT quants are quite strong, but the trade-off is that you need enough VRAM for full GPU offload, including the routed experts.

If you have enough VRAM, I'd suggest checking out some of the EXL3 versions of GLM-4.5-Air, which run with https://github.com/turboderp-org/exllamav3

https://huggingface.co/turboderp/GLM-4.5-Air-exl3 has a good selection of BPW in the range you seem interested in.

Cheers!
