-
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper • 2510.03259 • Published • 57 -
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper • 2510.07242 • Published • 30 -
First Try Matters: Revisiting the Role of Reflection in Reasoning Models
Paper • 2510.08308 • Published • 24 -
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75
Collections
Discover the best community collections!
Collections including paper arxiv:2510.11696
-
Open Data Synthesis For Deep Research
Paper • 2509.00375 • Published • 70 -
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Paper • 2509.03403 • Published • 22 -
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
Paper • 2509.03405 • Published • 23 -
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Paper • 2509.00930 • Published • 4
-
Intern-S1: A Scientific Multimodal Foundation Model
Paper • 2508.15763 • Published • 256 -
MemMamba: Rethinking Memory Patterns in State Space Model
Paper • 2510.03279 • Published • 72 -
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Paper • 2510.05034 • Published • 48 -
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 176
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 526 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Paper • 2509.09674 • Published • 80 -
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 189 -
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 176
-
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Paper • 2509.02479 • Published • 83 -
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
Paper • 2509.06501 • Published • 79 -
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Paper • 2509.02544 • Published • 124 -
Baichuan-M2: Scaling Medical Capability with Large Verifier System
Paper • 2509.02208 • Published • 42
-
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
Paper • 2506.19697 • Published • 44 -
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
Paper • 2509.23873 • Published • 67 -
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Paper • 2510.00515 • Published • 39 -
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Paper • 2509.22944 • Published • 79
-
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
Paper • 2504.20752 • Published • 92 -
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
Paper • 2504.21233 • Published • 49 -
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model
Paper • 2211.11363 • Published • 1 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 50
-
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 23 -
Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 23 -
Adaptive Decoding via Latent Preference Optimization
Paper • 2411.09661 • Published • 10 -
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Paper • 2411.13476 • Published • 16
-
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
Paper • 2510.03259 • Published • 57 -
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper • 2510.07242 • Published • 30 -
First Try Matters: Revisiting the Role of Reflection in Reasoning Models
Paper • 2510.08308 • Published • 24 -
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75
-
SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Paper • 2509.09674 • Published • 80 -
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 189 -
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 176
-
Open Data Synthesis For Deep Research
Paper • 2509.00375 • Published • 70 -
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Paper • 2509.03403 • Published • 22 -
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
Paper • 2509.03405 • Published • 23 -
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Paper • 2509.00930 • Published • 4
-
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
Paper • 2509.02479 • Published • 83 -
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents
Paper • 2509.06501 • Published • 79 -
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning
Paper • 2509.02544 • Published • 124 -
Baichuan-M2: Scaling Medical Capability with Large Verifier System
Paper • 2509.02208 • Published • 42
-
Intern-S1: A Scientific Multimodal Foundation Model
Paper • 2508.15763 • Published • 256 -
MemMamba: Rethinking Memory Patterns in State Space Model
Paper • 2510.03279 • Published • 72 -
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Paper • 2510.05034 • Published • 48 -
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 176
-
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
Paper • 2506.19697 • Published • 44 -
Winning the Pruning Gamble: A Unified Approach to Joint Sample and Token Pruning for Efficient Supervised Fine-Tuning
Paper • 2509.23873 • Published • 67 -
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Paper • 2510.00515 • Published • 39 -
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Paper • 2509.22944 • Published • 79
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 526 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
Paper • 2504.20752 • Published • 92 -
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
Paper • 2504.21233 • Published • 49 -
AF Adapter: Continual Pretraining for Building Chinese Biomedical Language Model
Paper • 2211.11363 • Published • 1 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 50
-
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 23 -
Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 23 -
Adaptive Decoding via Latent Preference Optimization
Paper • 2411.09661 • Published • 10 -
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Paper • 2411.13476 • Published • 16