oguzhanercan's Collections: Finetuning Strategies
MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
Paper
• 2507.21183
• Published
• 15
MixGRPO: Unlocking Flow-based GRPO Efficiency with Mixed ODE-SDE
Paper
• 2507.21802
• Published
• 19
EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity
Paper
• 2507.21848
• Published
• 9
Agentic Reinforced Policy Optimization
Paper
• 2507.19849
• Published
• 158
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper
• 2508.16153
• Published
• 160
DCPO: Dynamic Clipping Policy Optimization
Paper
• 2509.02333
• Published
• 22
Towards a Unified View of Large Language Model Post-Training
Paper
• 2509.04419
• Published
• 76
Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting
Paper
• 2509.11452
• Published
• 14
Reinforcement Learning on Pre-Training Data
Paper
• 2509.19249
• Published
• 67
Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
Paper
• 2510.07242
• Published
• 30
Reinforcing Diffusion Models by Direct Group Preference Optimization
Paper
• 2510.08425
• Published
• 12
Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
Paper
• 2509.25771
• Published
• 11
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
Paper
• 2511.07419
• Published
• 27
Video Generation Models Are Good Latent Reward Models
Paper
• 2511.21541
• Published
• 45
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
Paper
• 2601.08763
• Published
• 148
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
Paper
• 2601.02151
• Published
• 109