| Quant | Size | Description |
| --- | --- | --- |
| Q2_K_XXS | 209.49 MB | Not recommended for most people. Extremely low quality. |
| Q2_K_XS | 209.49 MB | Not recommended for most people. Very low quality. |
| Q2_K | 209.49 MB | Not recommended for most people. Very low quality. |
| Q2_K_L | 318.45 MB | Not recommended for most people. Uses Q8_0 for output and embedding, and Q2_K for everything else. Very low quality. |
| Q2_K_XL | 457.55 MB | Not recommended for most people. Uses F16 for output and embedding, and Q2_K for everything else. Very low quality. |
| Q3_K_XXS | 250.15 MB | Not recommended for most people. Prefer any bigger Q3_K quantization. Very low quality. |
| Q3_K_XS | 250.15 MB | Not recommended for most people. Prefer any bigger Q3_K quantization. Very low quality. |
| Q3_K_S | 250.15 MB | Not recommended for most people. Prefer any bigger Q3_K quantization. Low quality. |
| Q3_K_M | 273.09 MB | Not recommended for most people. Low quality. |
| Q3_K_L | 293.46 MB | Not recommended for most people. Low quality. |
| Q3_K_XL | 387.36 MB | Not recommended for most people. Uses Q8_0 for output and embedding, and Q3_K_L for everything else. Low quality. |
| Q3_K_XXL | 526.46 MB | Not recommended for most people. Uses F16 for output and embedding, and Q3_K_L for everything else. Low quality. |
| Q4_K_XS | 327.26 MB | Lower quality than Q4_K_S. |
| Q4_K_S | 327.26 MB | Recommended. Slightly low quality. |
| Q4_K_M | 340.07 MB | Recommended. Decent quality for most use cases. |
| Q4_K_L | 414.26 MB | Recommended. Uses Q8_0 for output and embedding, and Q4_K_M for everything else. Decent quality. |
| Q4_K_XL | 553.36 MB | Recommended. Uses F16 for output and embedding, and Q4_K_M for everything else. Decent quality. |
| Q5_K_XXS | 396.68 MB | Lower quality than Q5_K_S. |
| Q5_K_XS | 396.68 MB | Lower quality than Q5_K_S. |
| Q5_K_S | 396.68 MB | Recommended. High quality. |
| Q5_K_M | 404.12 MB | Recommended. High quality. |
| Q5_K_L | 459.76 MB | Recommended. Uses Q8_0 for output and embedding, and Q5_K_M for everything else. High quality. |
| Q5_K_XL | 598.86 MB | Recommended. Uses F16 for output and embedding, and Q5_K_M for everything else. High quality. |
| Q6_K_S | 472.17 MB | Lower quality than Q6_K. |
| Q6_K | 472.17 MB | Recommended. Very high quality. |
| Q6_K_L | 508.11 MB | Recommended. Uses Q8_0 for output and embedding, and Q6_K for everything else. Very high quality. |
| Q6_K_XL | 647.21 MB | Recommended. Uses F16 for output and embedding, and Q6_K for everything else. Very high quality. |
| Q8_K_XS | 609.82 MB | Lower quality than Q8_0. |
| Q8_K_S | 609.82 MB | Lower quality than Q8_0. |
| Q8_0 | 609.82 MB | Recommended. Quality almost like F16. |
| Q8_K_XL | 748.93 MB | Recommended. Uses F16 for output and embedding, and Q8_0 for everything else. Quality almost like F16. |
| F16 | 1.12 GB | Not recommended. Overkill. Prefer Q8_0. |
| ORIGINAL (BF16) | 1.12 GB | Not recommended. Overkill. Prefer Q8_0. |
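The quant names encode an approximate bit width per weight, and the size column can be roughly sanity-checked against the model's 0.6B parameter count. A minimal sketch, assuming the listed sizes are decimal megabytes (10^6 bytes) and ignoring the small amount of file overhead (metadata, per-block scales), so the figures land slightly off the nominal bit widths:

```python
# Rough effective bits-per-weight check for a few quants from the table above.
# Assumptions: 0.6e9 parameters (as stated on the card) and decimal MB.
PARAMS = 0.6e9

def bits_per_weight(size_mb: float) -> float:
    """Convert a file size in MB to approximate bits stored per parameter."""
    return size_mb * 1e6 * 8 / PARAMS

for name, mb in [("Q2_K", 209.49), ("Q4_K_M", 340.07),
                 ("Q6_K", 472.17), ("Q8_0", 609.82)]:
    print(f"{name}: ~{bits_per_weight(mb):.1f} bits/weight")
```

The mixed variants (e.g. Q4_K_L, which keeps output and embedding tensors at Q8_0) land between the pure quants' sizes for the same reason: only part of the weight tensor set is stored at the higher precision.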

Quantized using TAO71-AI AutoQuantizer. You can check out the original model card here.

Format: GGUF · Model size: 0.6B params · Architecture: qwen3


Model tree for Alcoft/dnotitia_Smoothie-Qwen3-0.6B-GGUF: finetuned from Qwen/Qwen3-0.6B, quantized into this model.
