Broken?

#1 opened by Notsuped

Getting bad replies compared to other Gemma 3 GGUFs. The file isn't damaged; the SHA256 sum is accurate.

Sample replies (screenshots) from:
gemma3-27b-abliterated-dpo.i1-IQ3_XXS
gemma-3-27B-abliterated-normpreserve-Q3_K_M
gemma-3-27b-it-UD-IQ3_XXS
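(For reference, the SHA256 check mentioned above can be reproduced on Linux with sha256sum and compared against the hash shown on the model's file page; the file name here is just a placeholder.)

sha256sum model.gguf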

Thanks for your feedback. I am working to improve the quality of the model and will come back with an update.

A new set of GGUF files has been uploaded. The quality of the model’s responses has improved significantly.
Please give it a try!

@YanLabs
Personally, I haven't tested it much (only in a few RP chats), but it seems to be doing much better compared to the previous version. It's not overly compliant, with enough soft refusals (which is a welcome thing), and it's able to deliver coherent and smart responses (at Q4_K_M, at least). I've also attempted a 'needle-in-a-haystack' test, feeding the model ~20k tokens of a short story with a few vulgar words hidden inside. It found them all quite successfully.
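For anyone who wants to reproduce a test like this, a minimal sketch with llama.cpp's llama-cli follows; the file name, context size, and model path are assumptions, not necessarily what was actually used:

# haystack.txt: ~20k tokens of story text ending with an instruction like
# "List every vulgar word that appears in the story above."
./llama-cli -m ./models/mymodel/ggml-model-Q4_K_M.gguf -c 24576 -f haystack.txt -n 256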

@grimjim
Thank you so much for your work too!

On a side note, I'd like to suggest a model. Users were deeply frustrated with its 'safety policies':
https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker
It would be interesting to see just how differently it would perform when treated similarly, if that's doable at all.

@YanLabs
Just a quick question: if I want to try something like IQ4_XS, will the F16 file from here be sufficient for applying quantization, or should I get the safetensors and convert them myself? Or is there any chance you'll add the missing sizes?

The F16 file is sufficient.

Use llama.cpp:

# quantize the model to 4-bits (using the Q4_K_M method)
./llama-quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M
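For the IQ4_XS case asked about earlier, llama-quantize accepts that type name directly, and IQ-type quants generally benefit from an importance matrix (passed via --imatrix). The paths below are illustrative:

# quantize to IQ4_XS from the same F16 file
./llama-quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-IQ4_XS.gguf IQ4_XS

# optionally, supply an importance matrix for better low-bit quality
./llama-quantize --imatrix imatrix.dat ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-IQ4_XS.gguf IQ4_XS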

Please refer to: https://github.com/ggml-org/llama.cpp/blob/master/tools/quantize/README.md
