---
language:
- en
- zh
license: apache-2.0
pipeline_tag: text-generation
tags:
- reasoning
- small-language-model
- efficient-training
- xmodel
- xiaoduo-ai
library_name: transformers
---

# Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
[![hf_space](https://img.shields.io/badge/🤗-Xiaoduo%20HuggingFace-blue.svg)](https://huggingface.co/XiaoduoAILab/Xmodel-2.5) [![arXiv](https://img.shields.io/badge/Arxiv-2511.19496-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2511.19496) [![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/XiaoduoAILab/Xmodel-2.5/blob/main/LICENSE) [![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/XiaoduoAILab/Xmodel-2.5) [![github](https://img.shields.io/github/stars/XiaoduoAILab/Xmodel-2.5.svg?style=social)](https://github.com/XiaoduoAILab/Xmodel-2.5)
## Model Description

Xmodel-2.5 is a 1.3-billion-parameter small language model designed as a **lightweight agent core** for complex reasoning tasks. The model builds on Xmodel-2 with four key upgrades:

1. **Full μP support**: Extended Megatron-LM to support maximal update parametrization (μP) for reliable hyperparameter transfer
2. **Efficient tokenizer**: Adopted the 129K-token DeepSeek-v3 tokenizer for a better compression rate and faster decoding
3. **FP8 mixed precision**: Used E4M3 for the forward pass and E5M2 for the backward pass to balance precision and throughput
4. **Optimizer scheduling**: Switched from AdamW to Muon during the decay phase, significantly improving downstream task performance

Trained on only 1.4T tokens, Xmodel-2.5 reaches **52.49%** average accuracy across 13 reasoning benchmarks, ranking second among 1–2B-parameter models, behind only Qwen3-1.7B (56.96%) while using 25.7× fewer training tokens.

## Model Architecture

| Hyperparameter | Value |
|----------------|-------|
| Hidden size | 1536 |
| Intermediate size | 3840 |
| Transformer layers | 48 |
| Attention heads (Q) | 24 |
| KV heads (GQA) | 8 |
| Sequence length | 3712 |
| Max position embeddings | 131072 |
| RoPE base | 500000 |

## Intended Uses & Limitations

### Intended Uses

- Complex reasoning tasks
- Lightweight AI agent applications
- Educational and research purposes
- Resource-constrained environments

### Limitations

- Limited to 1.3B parameters of capacity
- May struggle with highly specialized domains
- Performance may vary on non-English languages

## Training Details

### Training Strategy

- **Three-stage WSD curriculum**: 560k steps, 1.4T tokens
  - **Warmup phase**: 2k steps with a linear learning-rate increase
  - **Stable phase**: 530k steps with a gradually increasing batch size
  - **Decay phase**: 20k steps, mixing in 66.9% high-quality SFT data
- **Long-context adaptation**: 10k additional steps for 16K-context support

### Key Innovations

- **μP hyperparameter transfer**: Direct transfer from a 20M-parameter proxy model to the full model
- **Optimizer switching**: AdamW → Muon during the decay phase for improved reasoning performance
- **FP8 mixed precision**: FP8 training significantly improves training efficiency

## Performance

### Comprehensive Reasoning Performance

| Model | Parameters | Training Tokens | 13-Task Average |
|-------|------------|-----------------|-----------------|
| Qwen3-1.7B | 1.7B | 36T | 56.96% |
| **Xmodel-2.5** | **1.3B** | **1.4T** | **52.49%** |
| Xmodel-2-1.2B | 1.2B | 1.5T | 50.34% |
| InternLM2.5-1.8B | 1.8B | - | 50.19% |
| MiniCPM-1B | 1B | - | 48.95% |
| SmolLM2-1.7B | 1.7B | 11T | 46.88% |
| Llama-3.2-1B | 1B | 9T | 44.72% |

### Detailed Task Performance

| Task | Xmodel-2.5 | Xmodel-2 | Δ |
|------|------------|----------|-------|
| ARC-Challenge | 48.89 | 46.16 | +2.73 |
| ARC-Easy | 76.94 | 76.22 | +0.72 |
| PIQA | 75.95 | 75.14 | +0.81 |
| HellaSwag | 67.24 | 64.05 | +3.19 |
| WinoGrande | 64.64 | 64.25 | +0.39 |
| BBH | 54.58 | 48.90 | +5.68 |
| MMLU | 51.81 | 49.98 | +1.83 |
| GSM8k | 58.98 | 56.56 | +2.42 |
| MATH | 28.94 | 25.64 | +3.30 |
| HumanEval | 28.66 | 29.27 | -0.61 |
| MBPP | 33.00 | 30.80 | +2.20 |
| CMMLU | 47.16 | 44.29 | +2.87 |
| C-Eval | 45.54 | 43.16 | +2.38 |
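The averages above are reported by the authors; this card does not state which evaluation harness, prompt templates, or few-shot settings produced them. As a rough illustration of how multiple-choice benchmarks such as ARC and HellaSwag are typically scored, the sketch below ranks answer choices by length-normalized log-likelihood under the model. The question and answer choices are invented for illustration and are not taken from any benchmark.

```python
# Minimal sketch: score multiple-choice answers by average per-token log-likelihood.
# This is NOT the authors' evaluation pipeline, only an illustration of the common approach.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "XiaoduoAILab/Xmodel-2.5"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
model.eval()

# Illustrative question and choices (hypothetical, not from any benchmark).
question = "Question: Which gas do plants absorb during photosynthesis?\nAnswer:"
choices = [" carbon dioxide", " oxygen", " nitrogen", " helium"]


def avg_logprob(context: str, continuation: str) -> float:
    """Mean per-token log-probability of `continuation` given `context`."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    # Logits at position t predict token t+1, so shift targets by one.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    # Simple length split; ignores possible tokenization boundary effects.
    cont_len = full_ids.shape[1] - ctx_len
    picked = log_probs[-cont_len:].gather(1, targets[-cont_len:].unsqueeze(1))
    return picked.mean().item()


scores = {c: avg_logprob(question, c) for c in choices}
print("Predicted answer:", max(scores, key=scores.get).strip())
```

Generative benchmarks such as GSM8k, MATH, and HumanEval instead require free-form generation followed by answer extraction or code execution, which this sketch does not cover.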
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "XiaoduoAILab/Xmodel-2.5"

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True
)

# Build a chat-formatted prompt
prompt = "Explain the concept of transfer learning in machine learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generation configuration
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id
)

# Decode only the newly generated tokens
output = tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):],
    skip_special_tokens=True
)
print("Generated Response:")
print(output)
```

## Citation

If you find Xmodel-2.5 useful for your research or applications, please consider citing our work:

```bibtex
@misc{liu2025xmodel25,
      title={Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM},
      author={Yang Liu and Xiaolong Zhong and Ling Jiang},
      year={2025},
      eprint={2511.19496},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.19496},
}
```

## Contact

For questions or suggestions, please contact us through:

- GitHub Issues: [Xmodel-2.5 Issues](https://github.com/XiaoduoAILab/Xmodel-2.5/issues)
- Email: foamilu@yeah.net

## License

This project is licensed under the Apache-2.0 License. See the [LICENSE](LICENSE) file for details.