Please add IQ3_KT and IQ4_KT

by KeinNiemand - opened Dec 3, 2025

Discussion

KeinNiemand

Dec 3, 2025

IQ_KT quants are even more efficient than regular IQ_K quants, so having more of them available would be nice.

ubergarm

Owner Dec 3, 2025

@KeinNiemand

In my own perplexity measurements, the IQ4_KSS is very competitive with the IQ4_KT. Both are 4bpw, but the KSS is faster for CPU token generation (decode).

I agree the KT quants are quite strong, but the trade-off is that you need enough VRAM for full GPU offload, routed experts included.
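As a rough sketch of what "enough VRAM" means at a given bpw, the weight footprint is approximately parameters × bpw / 8 bytes. The helper name `approx_weight_gib` and the 100e9 parameter count below are illustrative assumptions, not measurements of any particular model. Real quant recipes mix tensor types, so the average bpw of a file differs from the nominal value, and the KV cache and compute buffers need room on top of the weights.

```python
def approx_weight_gib(n_params: float, bpw: float) -> float:
    """Approximate size of the quantized weights in GiB (weights only)."""
    total_bytes = n_params * bpw / 8  # bits per weight -> bytes
    return total_bytes / 2**30        # bytes -> GiB


# Hypothetical round-number example: 100e9 parameters at a nominal 4.0 bpw.
size = approx_weight_gib(100e9, 4.0)
print(f"{size:.1f} GiB")  # prints "46.6 GiB"
```

If that figure plus context overhead does not fit in VRAM, a quant that decodes faster on CPU is likely the more practical choice.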

If you have enough VRAM, I'd suggest you check out some of the EXL3 versions of GLM-4.5-Air to run with https://github.com/turboderp-org/exllamav3

https://huggingface.co/turboderp/GLM-4.5-Air-exl3 has a good selection of BPW in the range you seem interested in.

Cheers!
