Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 7 days ago • 78
Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand Article • 4 days ago • 48
Go-Explore: a New Approach for Hard-Exploration Problems Paper • 1901.10995 • Published Jan 30, 2019 • 1
KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2, 2024 • 20