hbx committed · verified
Commit 73dd245 · 1 Parent(s): cdaa358

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/fig1_aime24_curves_added.png filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,174 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ library_name: transformers
+ datasets:
+ - BytedTsinghua-SIA/DAPO-Math-17k
+ language:
+ - en
+ base_model:
+ - deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
+ pipeline_tag: text-generation
+ ---
+
+ <div align="center">
+ <span style="font-family: default; font-size: 1.5em;">AscentRL: Simplicity at Scale</span>
+ <div>
+ 🚀 Competitive RL Performance Without Complex Techniques 🌟
+ </div>
+ </div>
+
+ <br>
+
+ <div align="center" style="line-height: 1;">
+ <a href="https://github.com/HBX-hbx/AscentRL" style="margin: 2px;">
+ <img alt="Code" src="https://img.shields.io/badge/GitHub-100000?style=for-the-badge&logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ <a href="[YOUR_BLOG_LINK]" target="_blank" style="margin: 2px;">
+ <img alt="Notion" src="https://img.shields.io/badge/Notion-%23000000.svg?style=for-the-badge&logo=notion&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+ </a>
+ </div>
+
+ ## Overview
+
+ **AscentRL** demonstrates that competitive reinforcement learning performance for small language models doesn't require complex multi-stage pipelines or dynamic schedules. Using a minimal recipe with single-stage training and fixed hyperparameters, we achieve state-of-the-art results on mathematical reasoning tasks.
+
+ We release two models:
+ - **AscentRL-DeepSeek-1.5B**: Trained from DeepSeek-R1-Distill-Qwen-1.5B
+ - **AscentRL-Nemotron-1.5B**: Trained from OpenMath-Nemotron-1.5B
+
+ Both models use identical hyperparameters without per-model tuning, demonstrating the robustness of our approach.
+
+ ![AIME24 performance curves when scaling RL from a weak base (DeepSeek-R1-Distill-Qwen-1.5B) and a strong base (OpenMath-Nemotron-1.5B) over thousands of training steps.](./assets/fig1_aime24_curves_added.png)
+
+ ## Key Highlights
+
+ ✨ **Simplicity**: Single-stage training with fixed hyperparameters; no multi-stage pipelines or dynamic schedules
+
+ 📈 **Stability**: Smooth, monotonic improvement over 4,000+ training steps without collapses or oscillations
+
+ 🎯 **Performance**: State-of-the-art results at 1.5B scale, matching or exceeding more complex approaches
+
+ 💰 **Efficiency**: Comparable or better performance with 2× less compute than multi-stage methods
+
+ 🔓 **Open**: Complete evaluation scripts and model weights released
+
+ ## Performance
+
+ ### AscentRL-DeepSeek-1.5B (Based on DeepSeek-R1-Distill-Qwen-1.5B)
+
+ | Model | AIME24 (@32) | AIME25 (@32) | AMC23 (@32) | MATH-500 (@4) | Minerva (@4) | OlympiadBench (@4) | HMMT25 (@32) | BRUMO25 (@32) | CMIMC25 (@32) | Avg |
+ | ------------------------ | ------------ | ------------ | ----------- | ------------- | ------------ | ------------------ | ------------ | ------------- | ------------- | --------- |
+ | DeepSeek-R1-Distill-1.5B | 29.90 | 22.40 | 63.82 | 84.90 | 34.65 | 45.95 | 13.44 | 30.94 | 12.89 | 37.65 |
+ | DeepScaleR-1.5B-Preview | 40.21 | 28.65 | 73.83 | 89.30 | 39.34 | 52.79 | 18.96 | 40.00 | 21.00 | 44.88 |
+ | ProRL-V2 | 51.87 | 35.73 | 88.75 | 92.00 | 49.03 | **67.84** | 19.38 | 47.29 | **25.86** | 53.08 |
+ | BroRL | **57.50** | 36.88 | / | **92.14** | 49.08 | 61.54 | / | / | / | / |
+ | AscentRL-DeepSeek-1.5B | 52.29 | **37.19** | **91.02** | 91.55 | **51.47** | 66.77 | **21.98** | **52.71** | 25.63 | **54.51** |
+
+ The real question is whether this simplicity comes at a computational cost. It doesn't. We use half of ProRL-V2's compute budget while sticking to a single-stage recipe with fixed hyperparameters. BroRL requires 4.9× more compute by increasing rollouts to 512 per example, essentially exhaustively exploring the solution space. Our approach achieves competitive performance without this computational overhead.
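+
+ A note on metrics: `(@k)` in the tables denotes avg@k, i.e., each problem is sampled `k` times (32 or 4) and per-sample accuracy is averaged over samples, then over problems. A minimal sketch of this metric follows; the `sample_fn` callable and the exact answer matching are illustrative assumptions, not our evaluation harness.
+
+ ```python
+ from typing import Callable, List
+
+ def avg_at_k(problems: List[str], answers: List[str],
+              sample_fn: Callable[[str], str], k: int) -> float:
+     """avg@k: mean accuracy over k sampled generations per problem."""
+     total = 0.0
+     for problem, gold in zip(problems, answers):
+         hits = sum(sample_fn(problem) == gold for _ in range(k))
+         total += hits / k
+     return 100.0 * total / len(problems)
+ ```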
+
+ ### AscentRL-Nemotron-1.5B (Based on OpenMath-Nemotron-1.5B)
+
+ | Model | AIME24 (@32) | AIME25 (@32) | AMC23 (@32) | MATH-500 (@4) | Minerva (@4) | OlympiadBench (@4) | HMMT25 (@32) | BRUMO25 (@32) | CMIMC25 (@32) | Avg |
+ | ---------------------- | ------------ | ------------ | ----------- | ------------- | ------------ | ------------------ | ------------ | ------------- | ------------- | --------- |
+ | OpenMath-Nemotron-1.5B | 58.75 | 48.44 | 90.55 | 92.40 | 26.93 | 71.70 | 30.10 | 61.67 | 30.08 | 56.74 |
+ | QUESTA-Nemotron-1.5B | **71.56** | 62.08 | 93.44 | 92.95 | **32.08** | 72.28 | **40.94** | **67.50** | 41.48 | 63.81 |
+ | AscentRL-Nemotron-1.5B | 69.69 | **62.92** | **96.02** | **94.15** | 30.24 | **76.59** | 40.63 | 66.88 | **41.72** | **64.32** |
+
+ We achieve a 64.32% average, slightly outperforming QuestA's 63.81%, and lead on five of nine benchmarks. The gap is narrow, which makes sense: both approaches are pushing the boundaries of what's achievable at 1.5B scale. The key difference is how we get there. We use 2× less compute while achieving slightly better average performance, without designing a complex curriculum as QuestA does.
+
+ ## Training Recipe
+
+ Our approach is deliberately minimal:
+
+ **Core Algorithm**: Standard GRPO with binary outcome rewards
+ - **Reward**: Simple DAPO verifier (string matching, no SymPy)
+ - **Training**: Single-stage, no curriculum or stage transitions
+ - **Hyperparameters**: Fixed throughout (no adaptive schedules)
+ - **Data**: DAPO-Math-17k without filtering or dynamic sampling
+ - **Length Control**: 16K context cap (no explicit penalties)
+ - **Stabilization**: Only "clip higher" for gradient stability (see the sketch below)
+
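+ For reference, here is a minimal sketch of the three core pieces: the string-matching binary reward, GRPO's group-normalized advantages, and the asymmetric "clip higher" surrogate loss. The epsilon values and the naive `\boxed{}` extraction are illustrative assumptions, not our exact implementation.
+
+ ```python
+ import torch
+
+ # Illustrative clip range; "clip higher" means eps_high > eps_low,
+ # which lets low-probability tokens grow faster. Values are assumed.
+ EPS_LOW, EPS_HIGH = 0.2, 0.28
+
+ def binary_reward(response: str, gold: str) -> float:
+     """String-matching verifier: 1 if the boxed answer matches, else 0."""
+     start = response.rfind("\\boxed{")
+     if start == -1:
+         return 0.0
+     # Naive extraction up to the next brace; the DAPO verifier is more careful.
+     answer = response[start + len("\\boxed{"):response.find("}", start)]
+     return float(answer.strip() == gold.strip())
+
+ def group_advantages(rewards: torch.Tensor) -> torch.Tensor:
+     """GRPO: normalize binary rewards within each group of rollouts."""
+     return (rewards - rewards.mean()) / (rewards.std() + 1e-6)
+
+ def clip_higher_loss(logp_new, logp_old, advantages):
+     """PPO-style surrogate with an asymmetric clipping range."""
+     ratio = torch.exp(logp_new - logp_old)
+     unclipped = ratio * advantages
+     clipped = torch.clamp(ratio, 1.0 - EPS_LOW, 1.0 + EPS_HIGH) * advantages
+     return -torch.min(unclipped, clipped).mean()
+ ```
+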
+ Detailed hyperparameters, and comparisons of our training techniques with other methods, can be found in our blog.
+
+ ## Training Data
+
+ We train on [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k), a curated dataset of mathematical problems. **No offline difficulty filtering or online dynamic sampling is used.**
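+
+ To inspect the data exactly as we consume it, a minimal sketch (the `"train"` split name is an assumption; check the dataset card):
+
+ ```python
+ from datasets import load_dataset
+
+ # Load DAPO-Math-17k as released; no filtering or sampling tricks.
+ ds = load_dataset("BytedTsinghua-SIA/DAPO-Math-17k", split="train")
+ print(len(ds))   # dataset size
+ print(ds[0])     # one raw record
+ ```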
+
+ ## Usage
+
+ ### Basic Inference
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_name = "hbx/AscentRL-Nemotron-1.5B"  # or "hbx/AscentRL-DeepSeek-1.5B"
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Replace <problem> with your problem statement.
+ prompt = """<problem>
+
+ Please reason step by step, and put your final answer within \\boxed{}."""
+
+ messages = [{"role": "user", "content": prompt}]
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=16384,
+     temperature=0.7,
+     top_p=0.9,
+     do_sample=True
+ )
+
+ # Decode only the newly generated tokens, skipping the prompt.
+ response = tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True)
+ print(response)
+ ```
+
+ ### Batch Inference with vLLM
+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ model_name = "hbx/AscentRL-Nemotron-1.5B"
+ llm = LLM(
+     model=model_name,
+     tensor_parallel_size=1,
+     max_model_len=32768
+ )
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ sampling_params = SamplingParams(
+     temperature=0.7,
+     top_p=0.9,
+     max_tokens=16384,
+ )
+
+ problems = [...]  # Your list of problems
+ # Apply the chat template to each problem before generation.
+ prompts = [
+     tokenizer.apply_chat_template(
+         [{"role": "user", "content": p}],
+         tokenize=False,
+         add_generation_prompt=True,
+     )
+     for p in problems
+ ]
+ responses = llm.generate(prompts, sampling_params)
+ for r in responses:
+     print(r.outputs[0].text)
+ ```
+
+ ## Reproduction
+
+ We provide evaluation scripts based on [POLARIS](https://github.com/ChenxinAn-fdu/POLARIS); the evaluation script is available at [TODO](TODO).
+
+ ## Citation
+
+ ```bibtex
+ @misc{he2025ascentrl,
+   title = {TODO},
+   author = {TODO},
+   year = {2025},
+   month = {Nov},
+   day = {1},
+   note = {First published on Notion},
+   url = {https://TODO}
+ }
+ ```
assets/fig1_aime24_curves_added.png ADDED

Git LFS Details

  • SHA256: 3fb93b4ec962967c62fb4a8d720a0936d321b470757473c69246633a199d315d
  • Pointer size: 131 Bytes
  • Size of remote file: 338 kB
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "architectures": [
+     "Qwen2ForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "eos_token_id": 151645,
+   "hidden_act": "silu",
+   "hidden_size": 1536,
+   "initializer_range": 0.02,
+   "intermediate_size": 8960,
+   "max_position_embeddings": 131072,
+   "max_window_layers": 21,
+   "model_type": "qwen2",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 2,
+   "rms_norm_eps": 1e-06,
+   "rope_scaling": null,
+   "rope_theta": 500000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": true,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.3",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 151936
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 151643,
+   "eos_token_id": 151645,
+   "transformers_version": "4.47.1"
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b324dd08c79397c9c651511f75573d795716ecc0ea35f16fdc31b33dca0aa19c
+ size 3554214752
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c5ae00e602b8860cbd784ba82a8aa14e8feecec692e7076590d014d7b7fdafa
+ size 11421896
tokenizer_config.json ADDED
@@ -0,0 +1,208 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": null,
+   "chat_template": "{%- if messages[0]['role'] == 'system' %}\n    {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}\n{%- else %}\n    {{- '<|im_start|>system\n<|im_end|>\n' }}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == 'user') or (message.role == 'system' and not loop.first) or (message.role == 'assistant') %}\n        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\n' }}\n{%- endif %}",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "extra_special_tokens": {},
+   "model_max_length": 131072,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff