Broken?
Thanks for your feedback. I am working to improve the quality of the model and will come back with an update.
A new set of GGUF files has been uploaded. The quality of the model’s responses has improved significantly.
Please give it a try!
@YanLabs
Personally, I haven't tested it much (only in a few RP chats), but it seems to be doing much better compared to the previous version. It's not overly compliant, with enough soft refusals (which is a welcome thing), and it delivers coherent and smart responses (at Q4_K_M, at least). I also attempted a 'needle-in-a-haystack' test, feeding the model ~20k tokens of a short story with a few vulgar words hidden inside. It found them all quite successfully.
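In case anyone wants to reproduce a similar check, here is roughly how it can be scripted with llama.cpp's llama-cli (the file names and context size are placeholders, not the exact values from my run):

# append the question to a ~20k-token story that has a few target words hidden inside
cat story.txt > haystack.txt
echo "List every vulgar word hidden in the story above." >> haystack.txt
# run the quantized model with a context window large enough to hold the whole prompt
./llama-cli -m ./models/mymodel/ggml-model-Q4_K_M.gguf -f haystack.txt -c 24576 -n 256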
@grimjim
Thank you so much for your work too!
On a side note, I'd like to suggest a model whose 'safety policies' have left users deeply frustrated:
https://huggingface.co/ServiceNow-AI/Apriel-1.5-15b-Thinker
It would be interesting to see just how differently it would perform when treated similarly, if that's doable at all.
The F16 file is sufficient. Use llama.cpp:
# quantize the model to 4 bits using the Q4_K_M method
./llama-quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M
Please refer to: https://github.com/ggml-org/llama.cpp/blob/master/tools/quantize/README.md
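If you are starting from the original Hugging Face checkpoint rather than a ready-made F16 GGUF, there is a conversion step first. A rough sketch, assuming the checkpoint lives in ./models/mymodel/ (paths are placeholders):

# convert the HF checkpoint to an F16 GGUF (the script ships in the llama.cpp repo root)
python convert_hf_to_gguf.py ./models/mymodel/ --outtype f16 --outfile ./models/mymodel/ggml-model-f16.gguf
# the llama-quantize command above then takes this F16 file as its input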
Thanks.


