Models (48)

Active filters: verl

orbit-ai/orbit-4b-v0.1

Text Generation • 4B • Updated 6 days ago • 691 • 1

junnyu/Qwen2.5-7B-Instruct-1M-GRPO_logic_KK_5PPL

Text Generation • 8B • Updated Feb 13, 2025 • 2

sonyashijin/qwen3-32b-verilog-lora

Updated Jul 1, 2025 • 13

LichengLiu03/Qwen2.5-3B-UFO

Text Generation • 3B • Updated Jul 23, 2025 • 7 • 2

LichengLiu03/Qwen2.5-3B-UFO-1turn

Text Generation • 3B • Updated Jul 10, 2025 • 5 • 2

mradermacher/Qwen2.5-3B-UFO-GGUF

3B • Updated Jul 4, 2025 • 60 • 1

mradermacher/Qwen2.5-3B-UFO-1turn-GGUF

3B • Updated Jul 4, 2025 • 24 • 1

alphadl/ppo-gsm8k-0.5b

Text Generation • 0.6B • Updated Aug 4, 2025 • 3 • 2

Jasaxion/MathSmith-HC-Problem-Synthesizer-Qwen3-8B

Text Generation • 8B • Updated 29 days ago • 31 • 1

Jasaxion/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B

Text Generation • 8B • Updated 29 days ago • 32 • 1

thejaminator/grpo-feature-vector-step-1

Updated Aug 27, 2025 • 2

MindIntLab/Psyche-R1

Text Generation • 8B • Updated Sep 4, 2025 • 384 • 9

orbit-ai/searchr1-repro-4b

Text Generation • 4B • Updated 6 days ago • 894

orbit-ai/orbit-4b-ablation-top-10-docs-v0.1

Text Generation • 4B • Updated 6 days ago • 600

orbit-ai/orbit-4b-ablation-training-mix-124-v0.1

Text Generation • 4B • Updated 6 days ago • 597

GMagoLi/test-upload

Text Generation • Updated Sep 18, 2025

karthik/verl-qwen2.5-0.5b-gsm8k-ppo-step360

Text Generation • 0.5B • Updated Sep 21, 2025 • 2

samhitha2601/llama3.2-3b-ppo

Reinforcement Learning • Updated Oct 23, 2025

samhitha2601/llama3.2-3b-ppo-critic

Reinforcement Learning • Updated Oct 23, 2025

mradermacher/MathSmith-HC-Problem-Synthesizer-Qwen3-8B-GGUF

8B • Updated 4 days ago • 365 • 1

mradermacher/MathSmith-HC-Problem-Synthesizer-Qwen3-8B-i1-GGUF

8B • Updated 4 days ago • 2.46k • 1

mradermacher/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B-GGUF

8B • Updated 4 days ago • 342 • 1

mradermacher/MathSmith-Hard-Problem-Synthesizer-Qwen3-8B-i1-GGUF

8B • Updated 4 days ago • 2.17k • 1

Time-HD-Anonymous/STReasoner-8B

Feature Extraction • 8B • Updated Jan 7 • 2

archit11/qwen2.5-coder-3b-verl-track-a-lora

Text Generation • Updated Feb 18 • 3

lly0571/Qwen3-4B-2507-FC

orbit-ai/infoseeker-repro-4b

Text Generation • 4B • Updated 6 days ago • 562

mradermacher/infoseeker-repro-4b-GGUF

4B • Updated 28 days ago • 605

mradermacher/orbit-4b-v0.1-GGUF

4B • Updated 28 days ago • 570

mradermacher/infoseeker-repro-4b-i1-GGUF

4B • Updated 28 days ago • 2.73k