mmproj precision
#1, opened by BigWhoop
Hi, I saw that the Qwen3-VL GGUFs often come with an F32 mmproj. Is there any benefit to using the F32 over the F16?
Not really. F16 or BF16 is enough. F32 works as well, but tbh it doesn't buy you anything over BF16.
The vision tensors are almost always in BF16, so going to F32 doesn't add anything. BF16 -> F16 is a lossy conversion (F16 has a narrower exponent range, so very large or very small values don't survive), but it's debatable whether you'd notice the difference.
However, if your hardware doesn't support BF16, you can use the F32 mmproj if you want a lossless one, since BF16 -> F32 is an exact conversion.
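For anyone who wants to see the lossless vs. lossy point concretely, here is a minimal sketch using only Python's standard library. It emulates BF16 as the top 16 bits of an F32 bit pattern; the helper names (`bf16_bits_to_f32`, `f32_to_bf16_bits`, `survives_f16`) are made up for illustration and are not how any GGUF tool actually does the conversion.

```python
import struct


def bf16_bits_to_f32(bits: int) -> float:
    """Widen a BF16 bit pattern to F32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def f32_to_bf16_bits(x: float) -> int:
    """Take the top 16 bits of an F32 value (plain truncation, no rounding)."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16


def survives_f16(x: float) -> bool:
    """Return True if x comes back unchanged after a trip through F16."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0] == x
    except OverflowError:
        # struct refuses to pack values above the F16 maximum (65504).
        return False


# Three BF16 bit patterns: 1.0, 2**17, and 2**-30.
for bits in (0x3F80, 0x4800, 0x3080):
    value = bf16_bits_to_f32(bits)
    exact_in_f32 = f32_to_bf16_bits(value) == bits
    print(f"{value!r}: exact in F32={exact_in_f32}, exact in F16={survives_f16(value)}")
```

Every value is exact in F32, because widening only pads the mantissa with zeros. Only 1.0 is exact in F16: 2**17 is above the F16 maximum and 2**-30 is below its smallest subnormal. Whether a given mmproj actually contains weights that far out of range is a separate question, which is why the practical difference is debatable.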