
Issues running gguf on GPU

#17
by lucaelin - opened

I had great success running the torch version on both CPU and GPU, but the GGUF seems to only work on CPU for me. Whenever I run llama.cpp with CUDA or Vulkan on the GPU, the model usually generates no tokens, and sometimes only a few, causing either an exception during decoding or just a very short noise. This happens in both streaming and non-streaming mode, with both the q4 and q8 quants; I also exported an f16 GGUF, which shows the exact same error.
Did anyone get this setup to work?
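For anyone trying to reproduce this, a rough sketch of the invocations involved (the `-ngl` flag and `llama-cli` binary are stock llama.cpp; the model filename and prompt are placeholders, not from this thread):

```shell
# Placeholder model path; substitute your exported q4/q8/f16 GGUF.
MODEL=./model-q8_0.gguf

# CPU-only inference (-ngl 0 offloads no layers): works per the report above.
./llama-cli -m "$MODEL" -ngl 0 -p "Hello"

# GPU offload on a CUDA or Vulkan build (-ngl 99 offloads all layers):
# this is where generation stalls or emits only a few tokens.
./llama-cli -m "$MODEL" -ngl 99 -p "Hello"
```

Running both commands with the same quant isolates the problem to GPU offload rather than the quantization itself.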

following, same problem
