Improve model card: Add pipeline tag, library name, and enrich content

#1
opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +55 -19
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
-license: apache-2.0
 language:
 - en
+license: apache-2.0
 tags:
 - reinforcement-learning
 - planning
@@ -11,6 +11,8 @@ tags:
 - llada
 size_categories:
 - 8B
+pipeline_tag: text-generation
+library_name: transformers
 ---
 
 # LLaDA-8B-BGPO-countdown
@@ -24,27 +26,61 @@ size_categories:
 
 ## Model Details
 
-- **Model Type**: Diffusion Large Language Model (dLLM)
-- **Parameters**: 8 billion
-- **Training Method**: Boundary-Guided Policy Optimization (BGPO)
-- **Base Model**: LLaDA-8B-Instruct
-- **Task**: Countdown
-- **Language**: English
+- **Model Type**: Diffusion Large Language Model (dLLM)
+- **Parameters**: 8 billion
+- **Training Method**: Boundary-Guided Policy Optimization (BGPO)
+- **Base Model**: LLaDA-8B-Instruct
+- **Task**: Countdown
+- **Language**: English
 
 ## Training Details
 
-- **Training Steps**: 560 steps
-- **Response Length**: 256 tokens
-- **Train Diffusion Steps**: 128
-- **Eval Diffusion Steps**: 256
-- **Block Size**: 32
-- **Monte Carlo Sample Size ($n_t$)**: 16
-- **Learning Rate**: 5e-7
-- **Batch Size**: 16
-- **Framework**: Built on VeRL (Volcengine Reinforcement Learning)
+- **Training Steps**: 560 steps
+- **Response Length**: 256 tokens
+- **Train Diffusion Steps**: 128
+- **Eval Diffusion Steps**: 256
+- **Block Size**: 32
+- **Monte Carlo Sample Size ($n_t$)**: 16
+- **Learning Rate**: 5e-7
+- **Batch Size**: 16
+- **Framework**: Built on VeRL (Volcengine Reinforcement Learning)
 
 ## Usage & Limitations
 
-- Primarily designed for countdown tasks.
-- Performance may vary on other tasks.
-- Requires appropriate computational resources for inference.
+- Primarily designed for countdown tasks.
+- Performance may vary on other tasks.
+- Requires appropriate computational resources for inference.
+
+## Performance
+
+1. **Overall Performance**: BGPO vs. baselines on mathematics, coding, and planning tasks
+![Main Results](https://github.com/THU-KEG/BGPO/raw/main/assets/main_results.png)
+
+2. **Monte Carlo Analysis**: Performance with different sampling sizes $n_t$
+![MC Results](https://github.com/THU-KEG/BGPO/raw/main/assets/mc_results.png)
+
+3. **Out-of-Domain**: Generalization performance (<span style="color: #939393">gray</span> = in-domain)
+![OOD Results](https://github.com/THU-KEG/BGPO/raw/main/assets/ood_results.png)
+
+## Acknowledgments
+
+We thank the open-source community for their valuable contributions, particularly:
+- [VeRL](https://github.com/volcengine/verl) for the RL framework
+- [HuggingFace](https://huggingface.co/) for model hosting
+- The research community for their feedback and suggestions
+
+## Citation
+
+If you find our work useful, please consider citing our paper:
+
+```bibtex
+@misc{lin2025boundaryguidedpolicyoptimizationmemoryefficient,
+  title={Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models},
+  author={Nianyi Lin and Jiajie Zhang and Lei Hou and Juanzi Li},
+  year={2025},
+  eprint={2510.11683},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG},
+  url={https://arxiv.org/abs/2510.11683},
+}
+```
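
Since the card now declares `library_name: transformers` and `pipeline_tag: text-generation`, a loading sketch could round out the Usage section. Below is a minimal sketch, assuming the checkpoint keeps the remote-code setup of its base model LLaDA-8B-Instruct; the repo id and prompt are illustrative placeholders, and the sampling settings echo the eval configuration listed in the card (256 diffusion steps, 256-token responses, block size 32).

```python
# Minimal loading sketch -- the repo id below is a placeholder, and LLaDA
# checkpoints ship custom modeling code, hence trust_remote_code=True.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "THU-KEG/LLaDA-8B-BGPO-countdown"  # assumption: actual repo id may differ

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # ~16 GB of GPU memory for 8B params in bf16
).eval().cuda()

# Countdown-style prompt (illustrative): reach the target from the given numbers.
messages = [{
    "role": "user",
    "content": "Using the numbers [19, 36, 55, 7], create an equation that "
               "equals 65. You may use +, -, *, / and each number at most once.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_tensors="pt"
).cuda()

# Note: dLLMs are not decoded with transformers' standard generate(); the
# BGPO/LLaDA repositories provide the diffusion sampler (e.g. 256 steps,
# gen_length=256, block_length=32 to match the eval setup above).
```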