cpatonn committed on
Commit da932ef · verified · 1 Parent(s): 394494a

Update README.md

Files changed (1): README.md +45 -12
README.md CHANGED
@@ -6,26 +6,59 @@ pipeline_tag: text-generation
  base_model:
  - Qwen/Qwen3-Coder-30B-A3B-Instruct
  ---
- # Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit

- ## Method
- The original model was quantized with [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git), using [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) for calibration. For the full quantization arguments and configuration, see [config.json](https://huggingface.co/cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit/blob/main/config.json) and [recipe.yaml](https://huggingface.co/cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit/blob/main/recipe.yaml).

  ## Inference
- Please install the latest vllm release for better support:
- ```
  pip install -U vllm
  ```

- Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit example usage:
- ```
- vllm serve cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit \
-   --dtype float16 \
-   --tensor-parallel-size 4 \
-   --enable-auto-tool-choice \
-   --tool-call-parser hermes
  ```

  # Qwen3-Coder-30B-A3B-Instruct
  <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
  <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
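For readers unfamiliar with the quantization this card describes: with the settings used here (4 bits, group size 32), each group of 32 weights is stored as unsigned 4-bit integers plus one scale and zero point per group. A minimal, illustrative round-trip in plain Python — a sketch of the storage format only, not llm-compressor's actual implementation:

```python
# Illustrative round-trip for the card's settings (bits=4, group size=32):
# each group of 32 weights shares one scale and zero point. This is a sketch
# of the storage format only, NOT llm-compressor's actual implementation.
GROUP_SIZE = 32
QMAX = 15  # unsigned 4-bit range is 0..15

def quantize_group(weights):
    """Map one group of floats to 4-bit codes plus (scale, zero_point)."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / QMAX or 1.0  # guard against a flat group
    zero = round(-lo / scale)
    codes = [max(0, min(QMAX, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero

def dequantize_group(codes, scale, zero):
    """Reconstruct approximate float weights from 4-bit codes."""
    return [(c - zero) * scale for c in codes]

def round_trip(row):
    """Quantize and dequantize a weight row in groups of GROUP_SIZE."""
    out = []
    for i in range(0, len(row), GROUP_SIZE):
        codes, scale, zero = quantize_group(row[i:i + GROUP_SIZE])
        out.extend(dequantize_group(codes, scale, zero))
    return out
```

The per-weight reconstruction error is bounded by half the group's scale, which is why the accuracy loss reported later in the card is small.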
 
  base_model:
  - Qwen/Qwen3-Coder-30B-A3B-Instruct
  ---
+ # Qwen3-Coder-30B-A3B-Instruct AWQ - INT4

+ ## Model Details
+
+ ### Quantization Details
+
+ - **Quantization Method:** cyankiwi AWQ v1.0
+ - **Bits:** 4
+ - **Group Size:** 32
+ - **Calibration Dataset:** [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset)
+ - **Quantization Tool:** [llm-compressor](https://github.com/vllm-project/llm-compressor)
+
+ ### Memory Usage
+
+ | **Type** | **Qwen3-Coder-30B-A3B-Instruct** | **Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit** |
+ |:---------------:|:----------------:|:----------------:|
+ | **Memory Size** | 56.9 GB | 16.9 GB |
+ | **KV Cache per Token** | 48.0 kB | 12.0 kB |
+ | **KV Cache per Context** | 12.0 GB | 3.0 GB |
+
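The per-context row of the memory table is consistent with the per-token row multiplied by the model's 262144-token maximum context, read in binary units (48 KiB × 262144 tokens = 12 GiB). A quick check under those assumptions — the card itself does not state the units or context length used:

```python
# Check the KV-cache table's arithmetic, assuming binary units (kB = KiB,
# GB = GiB) and the model's maximum context length of 262144 tokens.
MAX_CONTEXT = 262144

def kv_cache_gib(per_token_kib: float, tokens: int = MAX_CONTEXT) -> float:
    """KV-cache size in GiB for a context of `tokens` tokens."""
    return per_token_kib * 1024 * tokens / 1024**3

original = kv_cache_gib(48.0)   # original model: 12.0 GiB
quantized = kv_cache_gib(12.0)  # AWQ 4-bit model: 3.0 GiB
```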
+ ### Evaluations
+
+ | **Benchmarks** | **Qwen3-Coder-30B-A3B-Instruct** | **Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit** |
+ |:---------------:|:----------------:|:----------------:|
+ | **Perplexity** | 1.61607 | 1.62824 |
+
+ - **Evaluation Context Length:** 16384
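Perplexity, as reported in the table above, is the exponential of the mean negative log-likelihood per token over the evaluation data (here at a 16384-token context). A small sketch of the formula — the log-probabilities below are placeholders, not values from the actual evaluation:

```python
# Perplexity = exp(mean negative log-likelihood per token). The log-probs
# below are placeholders, not values from the actual evaluation.
import math

def perplexity(token_logprobs):
    """exp of the average negative log-probability over the tokens."""
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# A model assigning every token probability 1/2 has perplexity exactly 2.
uniform_half = perplexity([math.log(0.5)] * 8)
```

The near-identical perplexities (1.61607 vs. 1.62824) indicate the 4-bit model tracks the original closely on this metric.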
 
  ## Inference
+
+ ### Prerequisite
+
+ ```bash
  pip install -U vllm
  ```

+ ### Basic Usage
+
+ ```bash
+ vllm serve cyankiwi/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit --max-model-len 262144
  ```

+ ## Additional Information
+
+ ### Changelog
+
+ - **v1.0.0** - cyankiwi AWQ v1.0 release
+
+ ### Authors
+
+ - **Name:** Ton Cao
+ - **Contacts:** [email protected]
+
  # Qwen3-Coder-30B-A3B-Instruct
  <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
  <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
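Once the `vllm serve` command above is running, the server answers OpenAI-compatible HTTP requests (port 8000 by default, standard chat-completions route). A stdlib-only client sketch — the helper names and prompt are illustrative:

```python
# Stdlib-only client for the OpenAI-compatible API that `vllm serve` exposes
# (default port 8000; the endpoint path is the standard chat-completions route).
# The prompt and helper names are illustrative.
import json
import urllib.request

API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "cyankiwi/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit"

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for the local server."""
    body = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def ask(prompt: str) -> str:
    """Send the request and return the model's reply text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Write a Python function that checks if a string is a palindrome."))
```

The `model` field must match the served repo id, since vllm routes requests by model name.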