cpatonn committed on
Commit da932ef · verified · 1 Parent(s): 394494a

Update README.md

Files changed (1): README.md +45 -12
README.md CHANGED
@@ -6,26 +6,59 @@ pipeline_tag: text-generation
  base_model:
  - Qwen/Qwen3-Coder-30B-A3B-Instruct
  ---
- # Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit

- ## Method
- The original model was quantized with [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git), using [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) for calibration. For the full quantization arguments and configuration, see [config.json](https://huggingface.co/cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit/blob/main/config.json) and [recipe.yaml](https://huggingface.co/cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit/blob/main/recipe.yaml).

  ## Inference
- Please install the latest vllm release for better support:
- ```
  pip install -U vllm
  ```

- Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit example usage:
- ```
- vllm serve cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit \
-   --dtype float16 \
-   --tensor-parallel-size 4 \
-   --enable-auto-tool-choice \
-   --tool-call-parser hermes
  ```

  # Qwen3-Coder-30B-A3B-Instruct
  <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
  <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
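For readers unfamiliar with the quantization this card describes: with the settings used here (4 bits, group size 32), each group of 32 weights is stored as unsigned 4-bit integers plus one scale and zero point per group. A minimal, illustrative round-trip in plain Python — a sketch of the storage format only, not llm-compressor's actual implementation:

```python
# Illustrative round-trip for the card's settings (bits=4, group size=32):
# each group of 32 weights shares one scale and zero point. This is a sketch
# of the storage format only, NOT llm-compressor's actual implementation.
GROUP_SIZE = 32
QMAX = 15  # unsigned 4-bit range is 0..15

def quantize_group(weights):
    """Map one group of floats to 4-bit codes plus (scale, zero_point)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / QMAX or 1.0  # guard against a flat group
    zero = round(-lo / scale)
    codes = [max(0, min(QMAX, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero

def dequantize_group(codes, scale, zero):
    """Reconstruct approximate float weights from 4-bit codes."""
    return [(c - zero) * scale for c in codes]

def round_trip(row):
    """Quantize and dequantize a weight row in groups of GROUP_SIZE."""
    out = []
    for i in range(0, len(row), GROUP_SIZE):
        codes, scale, zero = quantize_group(row[i:i + GROUP_SIZE])
        out.extend(dequantize_group(codes, scale, zero))
    return out
```

The per-weight reconstruction error is bounded by half the group's scale, which is why the accuracy loss reported later in the card is small.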
 
  base_model:
  - Qwen/Qwen3-Coder-30B-A3B-Instruct
  ---
+ # Qwen3-Coder-30B-A3B-Instruct AWQ - INT4

+ ## Model Details
+
+ ### Quantization Details
+
+ - **Quantization Method:** cyankiwi AWQ v1.0
+ - **Bits:** 4
+ - **Group Size:** 32
+ - **Calibration Dataset:** [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset)
+ - **Quantization Tool:** [llm-compressor](https://github.com/vllm-project/llm-compressor)
+
+ ### Memory Usage
+
+ | **Type** | **Qwen3-Coder-30B-A3B-Instruct** | **Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit** |
+ |:---------------:|:----------------:|:----------------:|
+ | **Memory Size** | 56.9 GB | 16.9 GB |
+ | **KV Cache per Token** | 48.0 kB | 12.0 kB |
+ | **KV Cache per Context** | 12.0 GB | 3.0 GB |
+
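The per-context row of the memory table is consistent with the per-token row multiplied by the model's 262144-token maximum context, read in binary units (48 KiB × 262144 tokens = 12 GiB). A quick check under those assumptions — the card itself does not state the units or context length used:

```python
# Check the KV-cache table's arithmetic, assuming binary units (kB = KiB,
# GB = GiB) and the model's maximum context length of 262144 tokens.
MAX_CONTEXT = 262144

def kv_cache_gib(per_token_kib: float, tokens: int = MAX_CONTEXT) -> float:
    """KV-cache size in GiB for a context of `tokens` tokens."""
    return per_token_kib * 1024 * tokens / 1024**3

original = kv_cache_gib(48.0)   # original model: 12.0 GiB
quantized = kv_cache_gib(12.0)  # AWQ 4-bit model: 3.0 GiB
```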
+ ### Evaluations
+
+ | **Benchmarks** | **Qwen3-Coder-30B-A3B-Instruct** | **Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit** |
+ |:---------------:|:----------------:|:----------------:|
+ | **Perplexity** | 1.61607 | 1.62824 |
+
+ - **Evaluation Context Length:** 16384
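Perplexity, as reported in the table above, is the exponential of the mean negative log-likelihood per token over the evaluation data (here at a 16384-token context). A small sketch of the formula — the log-probabilities below are placeholders, not values from the actual evaluation:

```python
# Perplexity = exp(mean negative log-likelihood per token). The log-probs
# below are placeholders, not values from the actual evaluation.
import math

def perplexity(token_logprobs):
    """exp of the average negative log-probability over the tokens."""
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# A model assigning every token probability 1/2 has perplexity exactly 2.
uniform_half = perplexity([math.log(0.5)] * 8)
```

The near-identical perplexities (1.61607 vs. 1.62824) indicate the 4-bit model tracks the original closely on this metric.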
 
  ## Inference
+
+ ### Prerequisite
+
+ ```bash
  pip install -U vllm
  ```

+ ### Basic Usage
+
+ ```bash
+ vllm serve cyankiwi/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit --max-model-len 262144
  ```

+ ## Additional Information
+
+ ### Changelog
+
+ - **v1.0.0** - cyankiwi AWQ v1.0 release
+
+ ### Authors
+
+ - **Name:** Ton Cao
+ - **Contacts:** [email protected]
+
  # Qwen3-Coder-30B-A3B-Instruct
  <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
  <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
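Once the `vllm serve` command above is running, the server answers OpenAI-compatible HTTP requests (port 8000 by default, standard chat-completions route). A stdlib-only client sketch — the helper names and prompt are illustrative:

```python
# Stdlib-only client for the OpenAI-compatible API that `vllm serve` exposes
# (default port 8000; the endpoint path is the standard chat-completions route).
# The prompt and helper names are illustrative.
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "cyankiwi/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit"

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the local server."""
    body = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask(prompt: str) -> str:
    """Send the request and return the model's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Write a Python function that checks if a string is a palindrome."))
```

The `model` field must match the served repo id, since vllm routes requests by model name.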