foamliu committed
Commit b3e530f · verified · 1 Parent(s): 7941b17

Update README.md

improve model card

Files changed (1)
  1. README.md +152 -121
README.md CHANGED
@@ -1,149 +1,180 @@
  ---
- license: apache-2.0
  language:
  - en
  - zh
  tags:
  - reasoning
- - lightweight
- - agent
- - 1.3B
- - SLM
- ---
-
- # Xmodel-2.5: 1.3B Data-Efficient Reasoning Small Language Model
-
- > Delivers **SOTA-level average reasoning performance** in the 1-2B class (52.49% across 13 benchmarks, second only to Qwen3-1.7B), using only **1.4T training tokens (≈4% of Qwen3's)** and **1.3B parameters (25% fewer)**.
- > A plug-and-play agent core built for edge and cost-sensitive scenarios.
-
  ---

- ## 🧠 Key Highlights
-
- | Feature | Value / Description |
- |---|---|
- | **Parameters** | 1.3B (decoder-only, deep-and-narrow) |
- | **Training tokens** | 1.4T (three-stage Warmup-Stable-Decay) |
- | **Context length** | 16k (RoPE base 500k, 131k position embeddings) |
- | **Precision** | FP8 mixed (throughput +30%, no accuracy loss) |
- | **Optimizer** | AdamW for the first 540k steps → Muon for the final 20k steps (+4.58% reasoning average) |
- | **Tokenizer** | DeepSeek-v3 129k (higher compression rate) |
- | **μP transfer** | Hyperparameters transferred from a 26M proxy model to 1.3B with zero re-tuning; training dynamics match |
- | **License** | Apache-2.0 (code, weights, and training logs all released) |
-
- ---
-
- ## 📊 Reasoning Benchmarks (1-shot / few-shot)
-
- | Task | Metric | Xmodel-2.5 | Qwen3-1.7B | MiniCPM-1B |
- |---|---|---|---|---|
- | GSM8k | 5-shot EM | **58.98** | 69.29 | 42.00 |
- | MATH | 4-shot | **28.94** | 35.50 | 12.06 |
- | BBH | 3-shot | **54.58** | 45.23 | 35.45 |
- | MMLU | 5-shot | **51.81** | 60.24 | 48.75 |
- | ARC-c | 25-shot | **48.89** | 53.07 | 45.31 |
- | **13-task average** | — | **52.49** | 56.96 | 48.95 |
-
- > Second in the **1-2B class**, first in reasoning performance per parameter.
-
- ---
-
- ## 🚀 Quick Start
-
  ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
- import torch
-
- tok = AutoTokenizer.from_pretrained("XiaoduoAILab/Xmodel-2.5", trust_remote_code=True)
  model = AutoModelForCausalLM.from_pretrained(
-     "XiaoduoAILab/Xmodel-2.5",
-     torch_dtype=torch.bfloat16,
-     device_map="auto"
  )

- prompt = "Solve: A bookstore has 240 books. If they sell 15% of them, how many are left?"
- inputs = tok(prompt, return_tensors="pt").to(model.device)
- out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
- print(tok.decode(out[0], skip_special_tokens=True))
- ```
-
- ---
-
- ## 🛠️ Model Details
-
- | Config | Value |
- |---|---|
- | hidden_size | 1536 |
- | num_layers | 48 |
- | attention_heads | 24 (Q) / 8 (KV, GQA) |
- | intermediate_size | 3840 |
- | max_position_embeddings | 131072 |
- | RoPE base | 500k |
- | Training length | 3712 → 8192 → 16384 (progressive extension) |
-
- ---
-
- ## 📁 Repository Contents
-
  ```
- .
- ├── README.md          # this file
- ├── config.json        # standard Hugging Face config
- ├── tokenizer.json     # DeepSeek-v3 129k
- ├── pytorch_model.bin  # 1.3B weights (bf16)
- ├── training/          # training logs and loss curves
- ├── evaluation/        # evaluation scripts for the 13 benchmarks
- └── mup/               # μP parameter tables and proxy models
- ```
-
- ---
-
- ## 🧪 Reproduction & Fine-tuning
-
- 1. Install the environment
- ```bash
- pip install "transformers>=4.40" datasets "flash-attn>=2.5"
- ```
-
- 2. Continued pre-training / long-context extension
- ```bash
- bash scripts/continue_pretrain.sh {data_path} {save_path}
- ```

- 3. Downstream fine-tuning example (LoRA)
- ```bash
- bash scripts/lora_sft.sh {instruction_data.jsonl}
- ```

- > All scripts are compatible with Megatron-LM + Transformer-Engine FP8.
-
- ---
-
- ## 📈 Training Curves
-
- ![WikiText-2 Loss](https://huggingface.co/XiaoduoAILab/Xmodel-2.5/resolve/main/loss_curve.png)
- *Figure: three-stage WSD plus long-context adaptation; WikiText-2 perplexity drops in step with gains on the 13-task average.*
-
- ---
-
- ## 📜 Citation
-
  ```bibtex
- @misc{xmodel25,
-   title={Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM},
-   author={Yang Liu and Xiaolong Zhong and Ling Jiang},
-   year={2025},
-   publisher={Xiaoduo AI Lab},
-   url={https://huggingface.co/XiaoduoAILab/Xmodel-2.5}
  }
  ```

- ---

- ## 🤝 Acknowledgements

- Builds on open-source work including [μP](https://github.com/microsoft/mup), [Muon](https://github.com/KellerJordan/Muon), and [Transformer-Engine](https://github.com/NVIDIA/TransformerEngine); thanks to the community for their contributions.
-
- ---

- 💬 For questions, please post in [Discussions](https://huggingface.co/XiaoduoAILab/Xmodel-2.5/discussions) or open an [Issue](https://github.com/XiaoduoAILab/Xmodel-2.5/issues).
 
  ---
  language:
  - en
  - zh
+ license: apache-2.0
  tags:
  - reasoning
+ - small-language-model
+ - efficient-training
+ - xmodel
+ - xiaoduo-ai
+ pipeline_tag: text-generation
  ---

+ # Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
+
+ <h5 align="center">
+
+ [![hf_space](https://img.shields.io/badge/🤗-Xiaoduo%20HuggingFace-blue.svg)](https://huggingface.co/XiaoduoAILab/Xmodel-2.5)
+ [![arXiv](https://img.shields.io/badge/Arxiv-2511.19496-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2511.19496)
+ [![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/XiaoduoAILab/Xmodel-2.5/blob/main/LICENSE)
+ [![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/XiaoduoAILab/Xmodel-2.5)
+ [![github](https://img.shields.io/github/stars/XiaoduoAILab/Xmodel-2.5.svg?style=social)](https://github.com/XiaoduoAILab/Xmodel-2.5)
+
+ </h5>
+
+ ## Model Description
+
+ Xmodel-2.5 is a 1.3-billion-parameter small language model designed as a **lightweight agent core** for complex reasoning tasks. It builds on Xmodel-2 with four key upgrades:
+
+ 1. **Full μP Support**: Extended Megatron-LM to support maximal update parameterization for reliable hyperparameter transfer
+ 2. **Efficient Tokenizer**: Adopted the 129K-token DeepSeek-v3 tokenizer for a better compression rate and faster decoding
+ 3. **FP8 Mixed Precision**: Used the E4M3 format for the forward pass and E5M2 for the backward pass to balance precision and throughput (see the sketch after this list)
+ 4. **Optimizer Scheduling**: Switched from AdamW to Muon during the decay phase, significantly improving downstream task performance
+
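+ The two FP8 formats trade mantissa precision against dynamic range, which is why E4M3 is used for forward activations and weights while the wider-range E5M2 is used for gradients. A minimal sketch, assuming a recent PyTorch build with native float8 dtypes (illustrative only, not the actual training recipe):
+
+ ```python
+ import torch
+
+ # Compare the numeric properties of the two FP8 formats mentioned above:
+ # E4M3 keeps more mantissa bits (finer precision), E5M2 keeps more exponent
+ # bits (wider dynamic range, which helps represent gradients).
+ for name, dtype in [("E4M3 (forward)", torch.float8_e4m3fn),
+                     ("E5M2 (backward)", torch.float8_e5m2)]:
+     info = torch.finfo(dtype)
+     print(f"{name}: max={info.max}, smallest normal={info.tiny}, eps={info.eps}")
+ ```
+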
+ Trained on only 1.4T tokens, Xmodel-2.5 achieves a **52.49%** average accuracy across 13 reasoning benchmarks, ranking second among 1-2B parameter models, behind only Qwen3-1.7B (56.96%) while using roughly 25.7x fewer training tokens.
+
+ ## Model Architecture
+
+ | Hyperparameter | Value |
+ |----------------|-------|
+ | Hidden size | 1536 |
+ | Intermediate size | 3840 |
+ | Transformer layers | 48 |
+ | Attention heads (Q) | 24 |
+ | KV heads (GQA) | 8 |
+ | Sequence length | 3712 |
+ | Max position embeddings | 131072 |
+ | RoPE base | 500000 |
+
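+ As a quick sanity check, these values can be compared against the published configuration. A minimal sketch, assuming the repository's `config.json` uses Llama-style attribute names (the actual field names may differ):
+
+ ```python
+ from transformers import AutoConfig
+
+ # Load the published configuration and print the fields listed in the table above.
+ config = AutoConfig.from_pretrained("XiaoduoAILab/Xmodel-2.5", trust_remote_code=True)
+ for attr in ("hidden_size", "intermediate_size", "num_hidden_layers",
+              "num_attention_heads", "num_key_value_heads",
+              "max_position_embeddings", "rope_theta"):
+     print(attr, getattr(config, attr, "n/a"))
+ ```
+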
+ ## Intended Uses & Limitations
+
+ ### Intended Uses
+ - Complex reasoning tasks
+ - Lightweight AI agent applications
+ - Educational and research purposes
+ - Resource-constrained environments
+
+ ### Limitations
+ - Capacity is bounded by the 1.3B parameter budget
+ - May struggle with highly specialized domains
+ - Performance may vary on non-English languages
+
+ ## Training Details
+
+ ### Training Strategy
+ - **Three-stage WSD curriculum**: 560k steps, 1.4T tokens (the schedule is sketched below)
+ - **Warmup phase**: 2k steps, linear learning-rate increase
+ - **Stable phase**: 530k steps, gradually increasing batch size
+ - **Decay phase**: 20k steps, mixing in 66.9% high-quality SFT data
+ - **Long-context adaptation**: 10k additional steps for 16K context support
+
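+ A minimal sketch of a Warmup-Stable-Decay learning-rate schedule using the step counts above; the peak learning rate and the linear decay shape are illustrative assumptions rather than the released recipe:
+
+ ```python
+ def wsd_lr(step, peak_lr=1e-3, warmup=2_000, stable=530_000, decay=20_000):
+     """Warmup-Stable-Decay: linear warmup, flat plateau, then a linear anneal."""
+     if step < warmup:                        # warmup: ramp up to the peak
+         return peak_lr * step / warmup
+     if step < warmup + stable:               # stable: hold the peak
+         return peak_lr
+     if step < warmup + stable + decay:       # decay: anneal toward zero
+         return peak_lr * (1 - (step - warmup - stable) / decay)
+     return 0.0
+
+ # Spot-check a few points across the three phases.
+ for s in (0, 1_000, 2_000, 300_000, 535_000, 551_000):
+     print(s, round(wsd_lr(s), 6))
+ ```
+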
+ ### Key Innovations
+ - **μP hyperparameter transfer**: Hyperparameters transferred directly from a 20M-parameter proxy model to the full model (see the sketch below)
+ - **Optimizer switching**: AdamW → Muon during the decay phase for improved reasoning performance
+ - **FP8 mixed precision**: FP8 formats substantially improve training efficiency
+
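+ A conceptual sketch of the μP transfer rule for hidden-layer learning rates under Adam-style optimizers (scale inversely with width); the proxy width and learning rate below are illustrative assumptions, not the values used for Xmodel-2.5:
+
+ ```python
+ def mup_hidden_lr(proxy_lr: float, proxy_width: int, target_width: int) -> float:
+     """Scale a hidden-layer learning rate tuned on a narrow proxy to a wider model.
+
+     Under μP with Adam-style optimizers, hidden-weight learning rates shrink
+     in proportion to width, so the optimum found on the proxy transfers."""
+     return proxy_lr * proxy_width / target_width
+
+ # Hypothetical: a learning rate tuned on a narrow proxy, reused at hidden size 1536.
+ print(mup_hidden_lr(proxy_lr=1e-2, proxy_width=256, target_width=1536))
+ ```
+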
+ ## Performance
+
+ ### Comprehensive Reasoning Performance
+
+ | Model | Parameters | Training Tokens | 13-Task Average |
+ |-------|------------|-----------------|-----------------|
+ | Qwen3-1.7B | 1.7B | 36T | 56.96% |
+ | **Xmodel-2.5** | **1.3B** | **1.4T** | **52.49%** |
+ | Xmodel-2-1.2B | 1.2B | 1.5T | 50.34% |
+ | InternLM2.5-1.8B | 1.8B | - | 50.19% |
+ | MiniCPM-1B | 1B | - | 48.95% |
+ | SmolLM2-1.7B | 1.7B | 11T | 46.88% |
+ | Llama-3.2-1B | 1B | 9T | 44.72% |
+
+ ### Detailed Task Performance
+
+ | Task | Xmodel-2.5 | Xmodel-2 | Improvement |
+ |------|------------|----------|-------------|
+ | ARC-Challenge | 48.89 | 46.16 | +2.73 |
+ | ARC-Easy | 76.94 | 76.22 | +0.72 |
+ | PIQA | 75.95 | 75.14 | +0.81 |
+ | HellaSwag | 67.24 | 64.05 | +3.19 |
+ | WinoGrande | 64.64 | 64.25 | +0.39 |
+ | BBH | 54.58 | 48.90 | +5.68 |
+ | MMLU | 51.81 | 49.98 | +1.83 |
+ | GSM8k | 58.98 | 56.56 | +2.42 |
+ | MATH | 28.94 | 25.64 | +3.30 |
+ | HumanEval | 28.66 | 29.27 | -0.61 |
+ | MBPP | 33.00 | 30.80 | +2.20 |
+ | CMMLU | 47.16 | 44.29 | +2.87 |
+ | C-Eval | 45.54 | 43.16 | +2.38 |
+
+ ## How to Use

  ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_path = "XiaoduoAILab/Xmodel-2.5"
  model = AutoModelForCausalLM.from_pretrained(
+     model_path,
+     torch_dtype="auto",
+     device_map="auto",
+     trust_remote_code=True
+ )
+ tokenizer = AutoTokenizer.from_pretrained(
+     model_path,
+     trust_remote_code=True
  )

+ # Build a chat-formatted prompt.
+ prompt = "Explain the concept of transfer learning in machine learning."
+ messages = [{"role": "user", "content": prompt}]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ model_inputs = tokenizer(text, return_tensors="pt").to(model.device)
+
+ # Generation configuration
+ generated_ids = model.generate(
+     **model_inputs,
+     max_new_tokens=512,
+     do_sample=True,
+     top_p=0.9,
+     temperature=0.7,
+     pad_token_id=tokenizer.eos_token_id
+ )
+
+ # Decode only the newly generated tokens.
+ output = tokenizer.decode(
+     generated_ids[0][len(model_inputs.input_ids[0]):],
+     skip_special_tokens=True
+ )
+ print("Generated Response:")
+ print(output)
  ```

+ ## Citation
+
+ If you find Xmodel-2.5 useful for your research or applications, please consider citing our work:
+
  ```bibtex
+ @misc{liu2025xmodel25,
+     title={Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM},
+     author={Yang Liu and Xiaolong Zhong and Ling Jiang},
+     year={2025},
+     eprint={2511.19496},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG},
+     url={https://arxiv.org/abs/2511.19496},
  }
  ```

+ ## Contact
+
+ For questions or suggestions, please contact us through:
+ - GitHub Issues: [Xmodel-2.5 Issues](https://github.com/XiaoduoAILab/Xmodel-2.5/issues)
+ - Email: [email protected]
+
+ ## License
+
+ This project is licensed under the Apache-2.0 License. See the [LICENSE](LICENSE) file for details.