TAO71-AI Quants: Qwen3
| Quant | Size | Description |
|---|---|---|
| Q2_K_XXS | 209.49 MB | Not recommended for most people. Extremely low quality. |
| Q2_K_XS | 209.49 MB | Not recommended for most people. Very low quality. |
| Q2_K | 209.49 MB | Not recommended for most people. Very low quality. |
| Q2_K_L | 318.45 MB | Not recommended for most people. Uses Q8_0 for output and embedding, and Q2_K for everything else. Very low quality. |
| Q2_K_XL | 457.55 MB | Not recommended for most people. Uses F16 for output and embedding, and Q2_K for everything else. Very low quality. |
| Q3_K_XXS | 250.15 MB | Not recommended for most people. Prefer any bigger Q3_K quantization. Very low quality. |
| Q3_K_XS | 250.15 MB | Not recommended for most people. Prefer any bigger Q3_K quantization. Very low quality. |
| Q3_K_S | 250.15 MB | Not recommended for most people. Prefer any bigger Q3_K quantization. Low quality. |
| Q3_K_M | 273.09 MB | Not recommended for most people. Low quality. |
| Q3_K_L | 293.46 MB | Not recommended for most people. Low quality. |
| Q3_K_XL | 387.36 MB | Not recommended for most people. Uses Q8_0 for output and embedding, and Q3_K_L for everything else. Low quality. |
| Q3_K_XXL | 526.46 MB | Not recommended for most people. Uses F16 for output and embedding, and Q3_K_L for everything else. Low quality. |
| Q4_K_XS | 327.26 MB | Lower quality than Q4_K_S. |
| Q4_K_S | 327.26 MB | Recommended. Slightly low quality. |
| Q4_K_M | 340.07 MB | Recommended. Decent quality for most use cases. |
| Q4_K_L | 414.26 MB | Recommended. Uses Q8_0 for output and embedding, and Q4_K_M for everything else. Decent quality. |
| Q4_K_XL | 553.36 MB | Recommended. Uses F16 for output and embedding, and Q4_K_M for everything else. Decent quality. |
| Q5_K_XXS | 396.68 MB | Lower quality than Q5_K_S. |
| Q5_K_XS | 396.68 MB | Lower quality than Q5_K_S. |
| Q5_K_S | 396.68 MB | Recommended. High quality. |
| Q5_K_M | 404.12 MB | Recommended. High quality. |
| Q5_K_L | 459.76 MB | Recommended. Uses Q8_0 for output and embedding, and Q5_K_M for everything else. High quality. |
| Q5_K_XL | 598.86 MB | Recommended. Uses F16 for output and embedding, and Q5_K_M for everything else. High quality. |
| Q6_K_S | 472.17 MB | Lower quality than Q6_K. |
| Q6_K | 472.17 MB | Recommended. Very high quality. |
| Q6_K_L | 508.11 MB | Recommended. Uses Q8_0 for output and embedding, and Q6_K for everything else. Very high quality. |
| Q6_K_XL | 647.21 MB | Recommended. Uses F16 for output and embedding, and Q6_K for everything else. Very high quality. |
| Q8_K_XS | 609.82 MB | Lower quality than Q8_0. |
| Q8_K_S | 609.82 MB | Lower quality than Q8_0. |
| Q8_0 | 609.82 MB | Recommended. Quality almost like F16. |
| Q8_K_XL | 748.93 MB | Recommended. Uses F16 for output and embedding, and Q8_0 for everything else. Quality almost like F16. |
| F16 | 1.12 GB | Not recommended. Overkill. Prefer Q8_0. |
| ORIGINAL (BF16) | 1.12 GB | Not recommended. Overkill. Prefer Q8_0. |
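
For picking a file from the table above, a small helper can encode the "Recommended" rows and return the best one that fits a size budget. This is only a sketch: `pick_quant` and `RECOMMENDED_QUANTS` are hypothetical names, the sizes are copied from the table, and it assumes the table's row order (Q4 to Q8) reflects increasing quality. The budget covers the file size only; actual memory use at inference time will be higher.

```python
# (label, size in MB) for the rows marked "Recommended", in table order.
# Assumption: later rows in the table are higher quality than earlier ones.
RECOMMENDED_QUANTS = [
    ("Q4_K_S", 327.26),
    ("Q4_K_M", 340.07),
    ("Q4_K_L", 414.26),
    ("Q4_K_XL", 553.36),
    ("Q5_K_S", 396.68),
    ("Q5_K_M", 404.12),
    ("Q5_K_L", 459.76),
    ("Q5_K_XL", 598.86),
    ("Q6_K", 472.17),
    ("Q6_K_L", 508.11),
    ("Q6_K_XL", 647.21),
    ("Q8_0", 609.82),
    ("Q8_K_XL", 748.93),
]


def pick_quant(budget_mb):
    """Return the last recommended quant (in table order) that fits budget_mb, or None."""
    for name, size_mb in reversed(RECOMMENDED_QUANTS):
        if size_mb <= budget_mb:
            return name
    return None


if __name__ == "__main__":
    for budget in (300, 350, 600, 1000):
        print(budget, "MB ->", pick_quant(budget))
```

Walking the list from the bottom up prefers a higher quantization tier over a larger file from a lower tier, so a 600 MB budget selects Q6_K_L rather than the bigger Q5_K_XL.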
Quantized using TAO71-AI AutoQuantizer. You can check out the original model card here.
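
The mixed variants in the table (for example "Uses Q8_0 for output and embedding, and Q4_K_M for everything else") can be approximated with llama.cpp's `llama-quantize` tool, which accepts `--output-tensor-type` and `--token-embedding-type` overrides. How AutoQuantizer actually invokes its backend is not stated here, so the sketch below is an assumption: `build_quantize_command` is a hypothetical helper and the file names are placeholders. It only builds the command line and does not run it. Labels such as Q4_K_L are names used by this repository rather than llama.cpp type names.

```python
import shlex


def build_quantize_command(src_gguf, dst_gguf, base_type, io_type=None):
    """Build an argv list for llama.cpp's llama-quantize (not executed here)."""
    cmd = ["llama-quantize"]
    if io_type is not None:
        # Override only the output and token-embedding tensors;
        # every other tensor follows base_type.
        cmd += ["--output-tensor-type", io_type,
                "--token-embedding-type", io_type]
    cmd += [src_gguf, dst_gguf, base_type]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; a Q4_K_L-style mix as described in the table.
    cmd = build_quantize_command(
        "model-f16.gguf", "model-Q4_K_L.gguf", "Q4_K_M", io_type="q8_0"
    )
    print(shlex.join(cmd))
```

Passing `io_type="f16"` instead would correspond to the rows described as using F16 for output and embedding.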