hbx committed · verified
Commit 73dd245 · 1 Parent(s): cdaa358

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/fig1_aime24_curves_added.png filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,174 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ library_name: transformers
+ datasets:
+ - BytedTsinghua-SIA/DAPO-Math-17k
+ language:
+ - en
+ base_model:
+ - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+ pipeline_tag: text-generation
+ ---
+
+ <div align="center">
+ <span style="font-family: default; font-size: 1.5em;">AscentRL: Simplicity at Scale</span>
+ <div>
+ 🚀 Competitive RL Performance Without Complex Techniques 🌟
+ </div>
+ </div>
+
+ <br>
+
+ <div align="center" style="line-height: 1;">
+ <a href="https://github.com/HBX-hbx/AscentRL" style="margin: 2px;">
+ <img alt="Code" src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="[YOUR_BLOG_LINK]" target="_blank" style="margin: 2px;">
+ <img alt="Notion" src="https://img.shields.io/badge/Notion-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+
+ ## Overview
+
+ **AscentRL** demonstrates that competitive reinforcement learning performance for small language models doesn't require complex multi-stage pipelines or dynamic schedules. Using a minimal recipe with single-stage training and fixed hyperparameters, we achieve state-of-the-art results on mathematical reasoning tasks.
+
+ We release two models:
+ - **AscentRL-DeepSeek-1.5B**: Trained from DeepSeek-R1-Distill-Qwen-1.5B
+ - **AscentRL-Nemotron-1.5B**: Trained from OpenMath-Nemotron-1.5B
+
+ Both models use identical hyperparameters without per-model tuning, demonstrating the robustness of our approach.
+
+ ![AIME24 performance curves when scaling RL from a weak base (DeepSeek-R1-Distill-Qwen-1.5B) and a strong base (OpenMath-Nemotron-1.5B) over thousands of training steps.](./assets/fig1_aime24_curves_added.png)
+
+ ## Key Highlights
+
+ ✨ **Simplicity**: Single-stage training with fixed hyperparameters; no multi-stage pipelines or dynamic schedules
+
+ 📈 **Stability**: Smooth, monotonic improvement over 4,000+ training steps without collapses or oscillations
+
+ 🎯 **Performance**: State-of-the-art results at 1.5B scale, matching or exceeding more complex approaches
+
+ 💰 **Efficiency**: Comparable or better performance with 2× less compute than multi-stage methods
+
+ 🔓 **Open**: Complete evaluation scripts and model weights released
+
+ ## Performance
+
+ ### AscentRL-DeepSeek-1.5B (Based on DeepSeek-R1-Distill-Qwen-1.5B)
+
+ | Model | AIME24 (@32) | AIME25 (@32) | AMC23 (@32) | MATH-500 (@4) | Minerva (@4) | OlympiadBench (@4) | HMMT25 (@32) | BRUMO25 (@32) | CMIMC25 (@32) | Avg |
+ | ------------------------ | ------------ | ------------ | ----------- | ------------- | ------------ | ------------------ | ------------ | ------------- | ------------- | --------- |
+ | DeepSeek-R1-Distill-1.5B | 29.90 | 22.40 | 63.82 | 84.90 | 34.65 | 45.95 | 13.44 | 30.94 | 12.89 | 37.65 |
+ | DeepScaleR-1.5B-Preview | 40.21 | 28.65 | 73.83 | 89.30 | 39.34 | 52.79 | 18.96 | 40.00 | 21.00 | 44.88 |
+ | ProRL-V2 | 51.87 | 35.73 | 88.75 | 92.00 | 49.03 | **67.84** | 19.38 | 47.29 | **25.86** | 53.08 |
+ | BroRL | **57.50** | 36.88 | / | **92.14** | 49.08 | 61.54 | / | / | / | / |
+ | AscentRL-DeepSeek-1.5B | 52.29 | **37.19** | **91.02** | 91.55 | **51.47** | 66.77 | **21.98** | **52.71** | 25.63 | **54.51** |
+
+ The real question is whether this simplicity comes at a computational cost. It doesn't. We use half of ProRL-V2's compute budget while sticking to a single-stage recipe with fixed hyperparameters. BroRL requires 4.9× more compute by increasing rollouts to 512 per example, essentially exhaustively exploring the solution space. Our approach achieves competitive performance without this computational overhead.
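+
+ A note on metrics: `(@k)` in the tables denotes avg@k, i.e., each problem is sampled `k` times (32 or 4) and per-sample accuracy is averaged over samples, then over problems. A minimal sketch of this metric follows; the `sample_fn` callable and the exact answer matching are illustrative assumptions, not our evaluation harness.
+
+ ```python
+ from typing import Callable, List
+
+ def avg_at_k(problems: List[str], answers: List[str],
+              sample_fn: Callable[[str], str], k: int) -> float:
+     """avg@k: mean accuracy over k sampled generations per problem."""
+     total = 0.0
+     for problem, gold in zip(problems, answers):
+         hits = sum(sample_fn(problem) == gold for _ in range(k))
+         total += hits / k
+     return 100.0 * total / len(problems)
+ ```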
+
+ ### AscentRL-Nemotron-1.5B (Based on OpenMath-Nemotron-1.5B)
+
+ | Model | AIME24 (@32) | AIME25 (@32) | AMC23 (@32) | MATH-500 (@4) | Minerva (@4) | OlympiadBench (@4) | HMMT25 (@32) | BRUMO25 (@32) | CMIMC25 (@32) | Avg |
+ | ---------------------- | ------------ | ------------ | ----------- | ------------- | ------------ | ------------------ | ------------ | ------------- | ------------- | --------- |
+ | OpenMath-Nemotron-1.5B | 58.75 | 48.44 | 90.55 | 92.40 | 26.93 | 71.70 | 30.10 | 61.67 | 30.08 | 56.74 |
+ | QUESTA-Nemotron-1.5B | **71.56** | 62.08 | 93.44 | 92.95 | **32.08** | 72.28 | **40.94** | **67.50** | 41.48 | 63.81 |
+ | AscentRL-Nemotron-1.5B | 69.69 | **62.92** | **96.02** | **94.15** | 30.24 | **76.59** | 40.63 | 66.88 | **41.72** | **64.32** |
+
+ We achieve a 64.32% average, slightly outperforming QuestA's 63.81%, and lead on five of nine benchmarks. The gap is narrow, which makes sense: both approaches are pushing the boundaries of what's achievable at 1.5B scale. The key difference is how we get there. We use 2× less compute while achieving slightly better average performance, without designing a complex curriculum as QuestA does.
+
+ ## Training Recipe
+
+ Our approach is deliberately minimal:
+
+ **Core Algorithm**: Standard GRPO with binary outcome rewards
+ - **Reward**: Simple DAPO verifier (string matching, no SymPy)
+ - **Training**: Single-stage, no curriculum or stage transitions
+ - **Hyperparameters**: Fixed throughout (no adaptive schedules)
+ - **Data**: DAPO-Math-17k without filtering or dynamic sampling
+ - **Length Control**: 16K context cap (no explicit penalties)
+ - **Stabilization**: Only "clip higher" for gradient stability (see the sketch below)
+
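+ For reference, here is a minimal sketch of the three core pieces: the string-matching binary reward, GRPO's group-normalized advantages, and the asymmetric "clip higher" surrogate loss. The epsilon values and the naive `\boxed{}` extraction are illustrative assumptions, not our exact implementation.
+
+ ```python
+ import torch
+
+ # Illustrative clip range; "clip higher" means eps_high > eps_low,
+ # which lets low-probability tokens grow faster. Values are assumed.
+ EPS_LOW, EPS_HIGH = 0.2, 0.28
+
+ def binary_reward(response: str, gold: str) -> float:
+     """String-matching verifier: 1 if the boxed answer matches, else 0."""
+     start = response.rfind("\\boxed{")
+     if start == -1:
+         return 0.0
+     # Naive extraction up to the next brace; the DAPO verifier is more careful.
+     answer = response[start + len("\\boxed{"):response.find("}", start)]
+     return float(answer.strip() == gold.strip())
+
+ def group_advantages(rewards: torch.Tensor) -> torch.Tensor:
+     """GRPO: normalize binary rewards within each group of rollouts."""
+     return (rewards - rewards.mean()) / (rewards.std() + 1e-6)
+
+ def clip_higher_loss(logp_new, logp_old, advantages):
+     """PPO-style surrogate with an asymmetric clipping range."""
+     ratio = torch.exp(logp_new - logp_old)
+     unclipped = ratio * advantages
+     clipped = torch.clamp(ratio, 1.0 - EPS_LOW, 1.0 + EPS_HIGH) * advantages
+     return -torch.min(unclipped, clipped).mean()
+ ```
+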
+ Detailed hyperparameters, and comparisons of our training techniques with other methods, can be found in our blog.
+
+ ## Training Data
+
+ We train on [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k), a curated dataset of mathematical problems. **No offline difficulty filtering or online dynamic sampling is used.**
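+
+ To inspect the data exactly as we consume it, a minimal sketch (the `"train"` split name is an assumption; check the dataset card):
+
+ ```python
+ from datasets import load_dataset
+
+ # Load DAPO-Math-17k as released; no filtering or sampling tricks.
+ ds = load_dataset("BytedTsinghua-SIA/DAPO-Math-17k", split="train")
+ print(len(ds))   # dataset size
+ print(ds[0])     # one raw record
+ ```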
+
+ ## Usage
+
+ ### Basic Inference
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "hbx/AscentRL-Nemotron-1.5B"  # or "hbx/AscentRL-DeepSeek-1.5B"
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Replace <problem> with your problem statement.
+ prompt = """<problem>
+
+ Please reason step by step, and put your final answer within \\boxed{}."""
+
+ messages = [{"role": "user", "content": prompt}]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=16384,
+     temperature=0.7,
+     top_p=0.9,
+     do_sample=True
+ )
+
+ # Decode only the newly generated tokens, skipping the prompt.
+ response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
+ print(response)
+ ```
+
+ ### Batch Inference with vLLM
+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ model_name = "hbx/AscentRL-Nemotron-1.5B"
+ llm = LLM(
+     model=model_name,
+     tensor_parallel_size=1,
+     max_model_len=32768
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ sampling_params = SamplingParams(
+     temperature=0.7,
+     top_p=0.9,
+     max_tokens=16384,
+ )
+
+ problems = [...]  # Your list of problems
+ # Apply the chat template to each problem before generation.
+ prompts = [
+     tokenizer.apply_chat_template(
+         [{"role": "user", "content": p}],
+         tokenize=False,
+         add_generation_prompt=True,
+     )
+     for p in problems
+ ]
+ responses = llm.generate(prompts, sampling_params)
+ for r in responses:
+     print(r.outputs[0].text)
+ ```
+
+ ## Reproduction
+
+ We provide evaluation scripts based on [POLARIS](https://github.com/ChenxinAn-fdu/POLARIS); the evaluation script is available at [TODO](TODO).
+
+ ## Citation
+
+ ```bibtex
+ @misc{he2025ascentrl,
+   title = {TODO},
+   author = {TODO},
+   year = {2025},
+   month = {Nov},
+   day = {1},
+   note = {First published on Notion},
+   url = {https://TODO}
+ }
+ ```
assets/fig1_aime24_curves_added.png ADDED

Git LFS Details

  • SHA256: 3fb93b4ec962967c62fb4a8d720a0936d321b470757473c69246633a199d315d
  • Pointer size: 131 Bytes
  • Size of remote file: 338 kB
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "eos_token_id": 151645,
+   "hidden_act": "silu",
+   "hidden_size": 1536,
+   "initializer_range": 0.02,
+   "intermediate_size": 8960,
+   "max_position_embeddings": 131072,
+   "max_window_layers": 21,
+   "model_type": "qwen2",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 2,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 500000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": true,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.3",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 151936
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 151643,
+   "eos_token_id": 151645,
+   "transformers_version": "4.47.1"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b324dd08c79397c9c651511f75573d795716ecc0ea35f16fdc31b33dca0aa19c
+ size 3554214752
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{%- if messages[0]['role'] == 'system' %}\n    {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}\n{%- else %}\n    {{- '<|im_start|>system\n<|im_end|>\n' }}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == 'user') or (message.role == 'system' and not loop.first) or (message.role == 'assistant') %}\n        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\n' }}\n{%- endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "extra_special_tokens": {},
+   "model_max_length": 131072,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff