KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding • Paper • 2503.02951 • Published Mar 4 • 33
SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities • Paper • 2502.12025 • Published Feb 17 • 3
TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning • Paper • 2505.14625 • Published May 20 • 13
VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL • Paper • 2505.23977 • Published May 29 • 10
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding • Paper • 2402.08983 • Published Feb 14, 2024 • 5
Stronger Models are NOT Stronger Teachers for Instruction Tuning • Paper • 2411.07133 • Published Nov 11, 2024 • 38