SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization Paper • 2511.06411 • Published Nov 9, 2025 • 17
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization Paper • 2511.06411 • Published Nov 9, 2025 • 17 • 2
SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization Paper • 2511.06411 • Published Nov 9, 2025 • 17
Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design Paper • 2501.08603 • Published Jan 15, 2025
UDC: A Unified Neural Divide-and-Conquer Framework for Large-Scale Combinatorial Optimization Problems Paper • 2407.00312 • Published Jun 29, 2024
Reasoning-CV: Fine-tuning Powerful Reasoning LLMs for Knowledge-Assisted Claim Verification Paper • 2505.12348 • Published May 18, 2025