Q3_K_M (112 GB) is bigger than Q3_K_XL (104 GB)?

#8
Opened by rtzurtz

As per the title.

Unsloth AI org

Yes, that's correct. K_XL is usually smaller.

But what are the implications?

I ask because I have a 128 GB Strix Halo, so I can run Q3_K_XL at 100 GB or Q3_K_M at 115 GB. But what's the difference in perplexity or benchmarks?
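For reference, perplexity is the exponential of the mean negative log-likelihood per token, so comparing two quants means scoring the same evaluation text with each and comparing the results (lower is better). The sketch below assumes you already have per-token natural-log probabilities from some evaluation run; the numbers are made up for illustration and are not real results for either quant.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood); expects natural-log token probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Hypothetical log-probs for the SAME text under each quant (illustrative only).
q3_k_xl_logprobs = [-1.20, -0.85, -2.10, -0.40]
q3_k_m_logprobs = [-1.15, -0.90, -2.00, -0.45]

print(f"Q3_K_XL PPL: {perplexity(q3_k_xl_logprobs):.3f}")
print(f"Q3_K_M  PPL: {perplexity(q3_k_m_logprobs):.3f}")
```

Perplexity is only meaningful when both quants are scored on the same text with the same tokenizer and context length.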

Couldn't we have a UD K_XL which sits between them?

There's 20 GB+ of 'free real estate' on the device.
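A rough way to see where the "20 GB+" comes from, using the file sizes quoted in this thread's title: subtract each quant's size from the device's 128 GB. This is a back-of-the-envelope sketch that assumes all 128 GB is usable and ignores OS reservation, KV cache, and runtime overhead, so real headroom will be smaller.

```python
DEVICE_GB = 128  # total unified memory on the Strix Halo, per the thread

# File sizes quoted in the thread title (GB).
QUANT_SIZES_GB = {
    "Q3_K_XL": 104,
    "Q3_K_M": 112,
}


def headroom_gb(device_gb: float, model_gb: float) -> float:
    """Memory left after loading the weights alone (no OS, KV cache, or overhead)."""
    return device_gb - model_gb


for name, size in QUANT_SIZES_GB.items():
    print(f"{name}: {size} GB weights, ~{headroom_gb(DEVICE_GB, size)} GB left")
```

With these figures, Q3_K_XL leaves about 24 GB and Q3_K_M about 16 GB before any context is allocated.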
