- Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models
  Paper • 2504.20157 • Published • 37
- The Leaderboard Illusion
  Paper • 2504.20879 • Published • 72
- ReasonIR: Training Retrievers for Reasoning Tasks
  Paper • 2504.20595 • Published • 53
- RM-R1: Reward Modeling as Reasoning
  Paper • 2505.02387 • Published • 79
W. Joe Weiler
WJoeWeiler