Improve model card: Add metadata, links, and correct formatting

#1 · opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +13 -3

README.md CHANGED

@@ -1,6 +1,18 @@
 ---
 license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- qwen2
+- math
+- code
+- reasoning
 ---
+
+# MetaStone-S1: Test-Time Scaling with Reflective Generative Model
+
+📚 [Paper](https://huggingface.co/papers/2507.01951) | 🌐 [Project Page](https://www.wenxiaobai.com/) | 💻 [GitHub Repository](https://github.com/MetaStone-AI/MetaStone-S1)
+
 ## Introduction
 We release our first reflective generative model: MetaStone-S1.
 With only 32B parameters, MetaStone-S1 performs comparably to the OpenAI-o3 series on mathematics, coding, and Chinese reasoning tasks.
@@ -12,7 +24,7 @@ By sharing the backbone network between the PRMs and policy models, MetaStone‑
 
 <img src="./figures/intro.jpg" alt="Introduction" width="800">
 
-This repo contains the training and evaluation code of MetaStone-S1. For full details please refer to our [paper](https://arxiv.org/abs/2507.01951) and [our official website](https://www.wenxiaobai.com/).
+This repo contains the training and evaluation code of MetaStone-S1. For full details please refer to our [paper](https://huggingface.co/papers/2507.01951) and [our official website](https://www.wenxiaobai.com/).
 
 
 ## Performance
@@ -36,8 +48,6 @@ Since the base model used for this repo is QwQ-32B, we chose the contemporary De
 | **MetaStone-S1-32B-high** | **85.2** | <ins>73.6</ins> | 64.2 | <ins>89.7</ins> |
 
 
-## Model
-
 ## Model
 
 We save the parameters of the policy model and the SPRM head into two files:
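The metadata this PR adds is the standard Hugging Face YAML front-matter block: the key/value pairs between the first two `---` lines of README.md, which the Hub uses to index the model (license, `pipeline_tag`, `library_name`, `tags`). As a rough illustration of that layout only — this minimal parser is a hypothetical sketch, not the Hub's actual implementation — the block can be read like this:

```python
# Sketch of the YAML front-matter layout added in this PR.
# NOTE: illustrative parser only (handles flat keys and one list),
# not how the Hugging Face Hub actually parses model cards.

FRONT_MATTER = """\
---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- qwen2
- math
- code
- reasoning
---
"""

def parse_front_matter(text):
    lines = text.splitlines()
    assert lines[0] == "---", "front matter must start with ---"
    end = lines.index("---", 1)          # closing delimiter
    meta, current_list = {}, None
    for line in lines[1:end]:
        if line.startswith("- ") and current_list is not None:
            meta[current_list].append(line[2:].strip())   # list item
        elif line.endswith(":"):                          # list key
            current_list = line[:-1].strip()
            meta[current_list] = []
        else:                                             # flat key: value
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
            current_list = None
    return meta

meta = parse_front_matter(FRONT_MATTER)
print(meta["pipeline_tag"])   # text-generation
print(meta["tags"])           # ['qwen2', 'math', 'code', 'reasoning']
```

The `pipeline_tag` and `library_name` keys are what make the Hub show a "Text Generation" task badge and "Use in Transformers" snippet for the model page; the `tags` list only affects search and filtering.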