cpatonn committed · commit e72982a · verified · 1 Parent(s): 6bac4dd

Update README.md

Files changed (1): README.md (+19 -0)
README.md CHANGED
@@ -6,7 +6,26 @@ pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-Coder-30B-A3B-Instruct
---
 
+ # Qwen3-Coder-30B-A3B-Instruct-AWQ
+
+ ## Method
+ Quantised using [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git) with calibration data from [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) and the following recipe:
+ ```python
+ from llmcompressor.modifiers.awq import AWQModifier
+
+ recipe = [
+     AWQModifier(
+         ignore=["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"],
+         scheme="W4A16",
+         targets=["Linear"],
+     ),
+ ]
+ ```
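For context, here is a minimal sketch of how a recipe like this is typically applied with llm-compressor's one-shot flow. The `oneshot` entry point and argument names follow llm-compressor's published examples; the calibration split, sequence length, and sample count are illustrative assumptions, not necessarily the exact settings used for this checkpoint:

```python
# Sketch only: apply the AWQ recipe above in a one-shot quantisation pass.
from datasets import load_dataset
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

# Recipe as shown in the model card above.
recipe = [
    AWQModifier(
        ignore=["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"],
        scheme="W4A16",
        targets=["Linear"],
    ),
]

# Calibration data: the split name and sample count are assumptions
# made for illustration, not the author's exact settings.
calibration = load_dataset(
    "nvidia/Llama-Nemotron-Post-Training-Dataset", split="train[:256]"
)

oneshot(
    model="Qwen/Qwen3-Coder-30B-A3B-Instruct",  # base model being quantised
    dataset=calibration,
    recipe=recipe,
    max_seq_length=2048,          # illustrative
    num_calibration_samples=256,  # illustrative
)
```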
+ ## Inference
+
+ ### vLLM
+ Please load the model in vLLM or SGLang with the float16 dtype for AWQ support, and use `tensor_parallel_size <= 2`, e.g.:
+ ```bash
+ vllm serve cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ --dtype float16 --tensor-parallel-size 2 --pipeline-parallel-size 2
+ ```
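Once running, `vllm serve` exposes an OpenAI-compatible API. Below is a minimal sketch of querying the served model, assuming the default local endpoint on port 8000 and the `openai` Python client (the prompt is only an example; this is not part of the original card):

```python
# Sketch: query the locally served AWQ model through vLLM's
# OpenAI-compatible endpoint (default http://localhost:8000/v1).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key

response = client.chat.completions.create(
    model="cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ",
    messages=[{"role": "user", "content": "Write a Python function that reverses a linked list."}],
)
print(response.choices[0].message.content)
```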
# Qwen3-Coder-30B-A3B-Instruct
<a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>