vramlet kings
Collection focused on best-in-class tiny LLMs for those who can't afford large GPUs or who run on mobile.
21B • Updated • 98.5k • 113 • Note: Can either fit a less vramlet config, or split usage across CPU and GPU while keeping decent speed. A fantastic all-around model that stays coherent in many uses at larger context, at a size that remains reasonable for most users.
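The CPU/GPU split mentioned above can be done with partial layer offload. A minimal sketch, assuming llama.cpp (the model path and layer count are placeholders — tune `-ngl` to however many layers fit in your VRAM):

```shell
# -ngl / --n-gpu-layers sets how many transformer layers are offloaded
# to the GPU; the remaining layers run on the CPU. Fewer offloaded
# layers means less VRAM used at the cost of speed.
llama-cli -m ./model-Q4_K_M.gguf -ngl 24 -c 8192 -p "Hello"
```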
unsloth/Qwen3-4B-Instruct-2507-GGUF
4B • Updated • 65.2k • 105 • Note: Long-context king. Decent at many tasks.
unsloth/Qwen3-4B-Thinking-2507-GGUF
4B • Updated • 15k • 75 • Note: Long-context king++. Performs very well, but the wait for the thinking to end can drag on.
unsloth/Qwen3-VL-4B-Instruct-GGUF
Image-Text-to-Text • 4B • Updated • 36.3k • 24 • Note: Best VL model at this size.
unsloth/Qwen3-VL-4B-Thinking-GGUF
Image-Text-to-Text • 4B • Updated • 40.1k • 15 • Note: Reasoning variant.
unsloth/Qwen3-1.7B-GGUF
Text Generation • 2B • Updated • 61.1k • 56 • Note: Surprisingly coherent micro model.
ggml-org/Qwen2.5-Coder-1.5B-Q8_0-GGUF
Text Generation • 2B • Updated • 5k • 11 • Note: Still the best VRAMlet fill-in-the-middle model for code completion.
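Fill-in-the-middle means the model completes code between a prefix and a suffix, which is what editor autocomplete needs. A hedged sketch of serving it with llama.cpp's `llama-server` and querying the `/infill` endpoint (model filename and port are placeholders):

```shell
# Serve the FIM-capable model locally.
llama-server -m ./qwen2.5-coder-1.5b-q8_0.gguf --port 8080

# In another terminal: request the completion between a prefix and a suffix.
curl http://localhost:8080/infill -d '{
  "input_prefix": "def add(a, b):\n    return ",
  "input_suffix": "\n",
  "n_predict": 16
}'
```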
Qwen/Qwen3-Embedding-0.6B-GGUF
0.6B • Updated • 22.7k • 470 • Note: Embed supreme.
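An embedding model maps text to a fixed-length vector for semantic search or RAG. A minimal sketch using llama.cpp's `llama-embedding` tool (model filename is a placeholder):

```shell
# Prints the embedding vector for the prompt text; pair with a vector
# store or cosine similarity for retrieval.
llama-embedding -m ./Qwen3-Embedding-0.6B-Q8_0.gguf -p "a sentence to embed"
```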
ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF
Text Ranking • 0.6B • Updated • 1.56k • 3
quickmt/quickmt-zh-en
Translation • Updated • 43 • 4 • Note: Not LLM-based, but this and the Firefox translation models are about as good as open NMT models get. Very high-quality models for fast translation if you don't need the world knowledge, context ingestion, or prompt instructions of LLMs.