---
language:
- en
license: apache-2.0
tags:
- quantization
- sinq
- int3
- efficient-inference
- text-generation
- qwen
- llm
- compression
base_model:
- Qwen/Qwen3-Next-80B-A3B-Instruct
base_model_relation: quantized
---

<p align="center">
  <img src="logo.png" alt="Logo" style="max-width: 80%; height: auto;">
</p>

<p align="center">🐙 <a href="https://github.com/huawei-csl/SINQ">GitHub</a>&nbsp;&nbsp; | &nbsp;&nbsp;📄 <a href="http://arxiv.org/abs/2509.22944">Paper</a></p>

# SINQ 3-bit Quantized Qwen3-Next 80B Model

This repository contains the official **3-bit quantized** version of the [`Qwen3-Next-80B`](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct) model, produced with the **SINQ (Sinkhorn-Normalized Quantization)** method.
SINQ is a novel, fast, high-quality quantization method designed to make any Large Language Model smaller while keeping its accuracy almost intact.

To support the project, please star ⭐ the official [SINQ](https://github.com/huawei-csl/SINQ) GitHub repository.

## Model Details
- **Model Name:** `Qwen3-Next-80B-A3B-Instruct-3bit-SINQ`
- **Base Model:** [`Qwen3-Next-80B`](https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Instruct)
- **Task:** Text Generation
- **Framework:** PyTorch / Transformers
- **License:** [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Quantized By:** *Huawei - Computing Systems Lab*

## Quantization Details

- **Quantization Method:** SINQ (Sinkhorn-Normalized Quantization)
- **Precision:** INT3
- **Group Size:** 64
- **Framework:** PyTorch
- **Quantization Library:** `sinq`
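
As a rough, unofficial back-of-envelope illustration of what INT3 with group size 64 buys: each weight costs 3 bits plus a share of its group's scale metadata, versus 16 bits in bf16. The sketch below assumes a single fp16 scale per 64-weight group and ignores SINQ's dual-axis normalization factors and any modules kept in higher precision, so treat the numbers as indicative only:

```python
# Indicative weight-memory estimate for INT3, group size 64.
# Assumptions: one fp16 scale per group; ignores SINQ's dual-axis scale
# vectors, zero-points, and any layers kept in higher precision.
params = 80e9        # ~80B parameters
wbits = 3            # INT3 payload per weight
group_size = 64
scale_bits = 16      # assumed fp16 scale shared by each 64-weight group

bf16_gb = params * 16 / 8 / 1e9
int3_gb = params * (wbits + scale_bits / group_size) / 8 / 1e9

print(f"bf16 weights : ~{bf16_gb:.1f} GB")  # ~160.0 GB
print(f"INT3 weights : ~{int3_gb:.1f} GB")  # ~32.5 GB
```

Under these assumptions, that is roughly a 5× reduction in weight storage.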

---

# 🚀 Usage

## Prerequisite
Before running the examples below, make sure the **SINQ** library is installed.
Installation instructions and setup details are available in the [official SINQ GitHub repository](https://github.com/huawei-csl/SINQ).

## Usage Example
You can load and use the model with our wrapper based on the 🤗 Transformers library:

```python
import torch
from transformers import AutoTokenizer
from sinq.patch_model import AutoSINQHFModel

model_name = "huawei-csl/Qwen3-Next-80B-A3B-Instruct-3bit-SINQ"
tokenizer = AutoTokenizer.from_pretrained(model_name)
sinq_model = AutoSINQHFModel.from_quantized_safetensors(
    model_name,
    device="cuda:0",
    compute_dtype=torch.bfloat16,
)

# Prepare the model input
prompt = "Explain neural network quantization in one sentence."
messages = [
    {"role": "user", "content": prompt},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(sinq_model.device)

# Run text completion
generated_ids = sinq_model.generate(
    **model_inputs,
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    min_p=0.0,
    max_new_tokens=16384,
)

# Strip the prompt tokens and decode only the newly generated ones
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
```
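
The sampling settings above (`temperature=0.7`, `top_p=0.8`, `top_k=20`, `min_p=0.0`) follow the generation parameters recommended for the base Qwen3-Next-80B-A3B-Instruct model; adjust them to your use case.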

<details>
<summary><span style="font-size:1.1em; font-weight:bold;">🧩 Quantization Process</span></summary>

The quantized model was obtained using the **SINQ** quantization library, following the steps below:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sinq.patch_model import AutoSINQHFModel
from sinq.sinqlinear import BaseQuantizeConfig

# Load the base model in its original precision
base_model_name = "Qwen/Qwen3-Next-80B-A3B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Apply 3-bit SINQ quantization
quant_cfg = BaseQuantizeConfig(
    nbits=3,           # quantization bit-width
    group_size=64,     # group size
    tiling_mode="1D",  # tiling strategy
    method="sinq",     # quantization method ("asinq" for the calibrated version)
)

sinq_model = AutoSINQHFModel.quantize_model(
    model,
    tokenizer=tokenizer,
    quant_config=quant_cfg,
    compute_dtype=torch.bfloat16,
    device="cuda:0",
)
```
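
To persist the quantized weights in the safetensors layout that `from_quantized_safetensors` reads back (as in the usage example above), a save step along the following lines is needed. The method name `save_quantized_safetensors` is an assumption that mirrors the loading API; check the SINQ repository for the exact call:

```python
# Hypothetical save step: the method name mirrors the loading API
# `from_quantized_safetensors` and is an assumption, not a confirmed API.
save_dir = "Qwen3-Next-80B-A3B-Instruct-3bit-SINQ"
AutoSINQHFModel.save_quantized_safetensors(sinq_model, save_dir)
tokenizer.save_pretrained(save_dir)  # keep the tokenizer alongside the weights
```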

> **Reproducibility Note**: This model was quantized using the SINQ implementation from commit [`ee1dc76`](https://github.com/huawei-csl/SINQ/commit/ee1dc767ba6dc4b819841c3f89be2f50719aa72d) of the [SINQ](https://github.com/huawei-csl/SINQ) repository.

</details>

<br/>

---

# 🧾 How to Cite This Work

If you find **SINQ** useful in your research or applications, please:
- Star ⭐ the official [SINQ](https://github.com/huawei-csl/SINQ) GitHub repository.
- Cite our <a href="http://arxiv.org/abs/2509.22944" target="_blank"><strong>paper</strong></a>:

```bibtex
@misc{muller2025sinq,
  title={SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights},
  author={Lorenz K. Muller and Philippe Bich and Jiawei Zhuang and Ahmet Celik and Luca Benfenati and Lukas Cavigelli},
  year={2025},
  eprint={2509.22944},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={http://arxiv.org/abs/2509.22944}
}
```