cpatonn committed (verified)
Commit 394494a · Parent(s): 09c5b72

Update README.md

Files changed (1):
  1. README.md +13 -13
README.md CHANGED
@@ -6,26 +6,26 @@ pipeline_tag: text-generation
  base_model:
  - Qwen/Qwen3-Coder-30B-A3B-Instruct
  ---
- # Qwen3-Coder-30B-A3B-Instruct-AWQ

  ## Method
- Quantised using [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git), [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) and the following configs:
  ```
- recipe = [
-     AWQModifier(
-         ignore=["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"],
-         scheme="W4A16",
-         targets=["Linear"],
-     ),
- ]
  ```
- ## Inference

- ### vllm
- Please load the model into vllm and sglang as float16 data type for AWQ support and use `tensor_parallel_size <= 2` i.e.,
  ```
- vllm serve cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit --dtype float16 --tensor-parallel-size 2 --pipeline-parallel-size 2
  ```

  # Qwen3-Coder-30B-A3B-Instruct
  <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
  <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
 
  base_model:
  - Qwen/Qwen3-Coder-30B-A3B-Instruct
  ---
+ # Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit

  ## Method
+ [vllm-project/llm-compressor](https://github.com/vllm-project/llm-compressor.git) and [nvidia/Llama-Nemotron-Post-Training-Dataset](https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset) were used to quantize the original model. For further information on the quantization arguments and configuration, please see [config.json](https://huggingface.co/cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit/blob/main/config.json) and [recipe.yaml](https://huggingface.co/cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit/blob/main/recipe.yaml).
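
For reference, the removed recipe in the diff above preserves the AWQ configuration that produced this checkpoint. The sketch below shows how that recipe might be applied with llm-compressor's `oneshot` entrypoint; it is illustrative only. The calibration sample count, sequence length, dataset split, and column handling are assumptions, since the recorded values live in the linked recipe.yaml.

```python
# Illustrative sketch: apply the AWQ W4A16 recipe shown in the diff with
# llm-compressor's oneshot API. NUM_SAMPLES, MAX_SEQ_LEN, the dataset split,
# and the column names are assumptions, not values recorded in recipe.yaml.
from datasets import load_dataset
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "Qwen/Qwen3-Coder-30B-A3B-Instruct"
NUM_SAMPLES = 256   # assumed calibration size
MAX_SEQ_LEN = 2048  # assumed calibration sequence length

# Recipe from the old README: 4-bit weights, 16-bit activations on all
# Linear layers, keeping lm_head and the MoE router gates in full precision.
recipe = [
    AWQModifier(
        ignore=["lm_head", "re:.*mlp.gate$", "re:.*mlp.shared_expert_gate$"],
        scheme="W4A16",
        targets=["Linear"],
    ),
]

# Calibration data from the dataset named in the README; the split name and
# the "input"/"output" columns are assumed, so adjust to the actual schema.
ds = load_dataset(
    "nvidia/Llama-Nemotron-Post-Training-Dataset",
    split=f"code[:{NUM_SAMPLES}]",
)
ds = ds.map(lambda ex: {"text": str(ex.get("input", "")) + str(ex.get("output", ""))})

oneshot(
    model=MODEL_ID,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQ_LEN,
    num_calibration_samples=NUM_SAMPLES,
    output_dir="Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit",
)
```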
+ ## Inference
+ Please install the latest vllm release for better support:
  ```
+ pip install -U vllm
  ```
 
+ Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit example usage:
  ```
+ vllm serve cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit \
+     --dtype float16 \
+     --tensor-parallel-size 4 \
+     --enable-auto-tool-choice \
+     --tool-call-parser hermes
  ```
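
The server exposes an OpenAI-compatible API, by default at `http://localhost:8000/v1`. A minimal client sketch, assuming a local server on the default port and an illustrative prompt:

```python
# Query the vllm OpenAI-compatible server started above. Assumes the
# default local endpoint; the api_key is a placeholder since vllm does
# not require one unless configured.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="cpatonn/Qwen3-Coder-30B-A3B-Instruct-AWQ-4bit",
    messages=[
        {"role": "user",
         "content": "Write a Python function that checks if a string is a palindrome."},
    ],
)
print(response.choices[0].message.content)
```

Because the server is launched with `--enable-auto-tool-choice` and `--tool-call-parser hermes`, the same endpoint also accepts OpenAI-style `tools` definitions for function calling.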
+
  # Qwen3-Coder-30B-A3B-Instruct
  <a href="https://chat.qwen.ai/" target="_blank" style="margin: 2px;">
  <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>