AzalKhan/Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_1176_FlashRL_G4-L2048_new

AzalKhan committed on Oct 24

Commit a611ce3 (verified) · 1 Parent(s): c666385

Upload README.md with huggingface_hub

Files changed (1)

README.md ADDED +25 -0

	@@ -0,0 +1,25 @@

+---
+license: apache-2.0
+base_model: Qwen/Qwen2.5-1.5B-Instruct
+datasets:
+  - open-r1/DAPO-Math-17k-Processed
+library_name: transformers
+tags:
+  - grpo
+  - reinforcement-learning
+  - reasoning
+  - qwen
+---
+# Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_1176_FlashRL_G4-L2048_new
+This repository contains a checkpoint trained with GRPO on `open-r1/DAPO-Math-17k-Processed` starting from `Qwen/Qwen2.5-1.5B-Instruct`.\
+This snapshot corresponds to training step `1176`.
+Contents include:
+- Model weights (`.safetensors`)
+- Config files (`config.json`, `generation_config.json`)
+- Tokenizer files (`tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, `special_tokens_map.json`, `added_tokens.json`)
+- Optional chat template (`chat_template.jinja`)
+Training artifacts (optimizer/scheduler states and RNG) have been intentionally excluded.
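
The contents list in the added README can be turned into a quick sanity check on a downloaded copy of the repository. The following is a minimal sketch, not part of the commit: the helper names `check_snapshot` and `load_checkpoint` are hypothetical, the file names are copied from the README, and the weight file is matched by extension only because the README does not give exact shard names. Loading with `torch_dtype="auto"` is an assumption; the README does not specify how the checkpoint should be loaded.

```python
from pathlib import Path

# File names copied from the README's contents list.
REQUIRED_FILES = [
    "config.json",
    "generation_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "special_tokens_map.json",
    "added_tokens.json",
]
# The README describes the chat template as optional.
OPTIONAL_FILES = ["chat_template.jinja"]


def check_snapshot(snapshot_dir):
    """Return (missing_required, missing_optional) for a local snapshot directory."""
    root = Path(snapshot_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Exact weight file names are not listed in the README, so match by extension.
    if not any(root.glob("*.safetensors")):
        missing.append("*.safetensors")
    missing_optional = [name for name in OPTIONAL_FILES if not (root / name).is_file()]
    return missing, missing_optional


def load_checkpoint(snapshot_dir):
    """Load model and tokenizer from a local snapshot (assumes transformers is installed)."""
    # Imported lazily so the file check above works without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(snapshot_dir)
    # torch_dtype="auto" defers to the dtype recorded in the checkpoint's config.
    model = AutoModelForCausalLM.from_pretrained(snapshot_dir, torch_dtype="auto")
    return model, tokenizer
```

A typical flow would be to fetch the repository locally (for example with `huggingface_hub.snapshot_download`), call `check_snapshot` on the resulting directory, and call `load_checkpoint` only if the first returned list is empty. Since optimizer, scheduler, and RNG states are excluded, the snapshot is suited to inference or to starting a fresh fine-tuning run rather than resuming the original training.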