GRACE: Generative Representation Learning via Contrastive Policy Optimization Paper • 2510.04506 • Published Oct 6, 2025 • 10
Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window Paper • 2510.08276 • Published Oct 9, 2025 • 9
From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models Paper • 2503.06260 • Published Mar 8, 2025
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2, 2025 • 187