Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments • Paper 2603.23638 • Published 9 days ago • 10
Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models • Paper 2603.01571 • Published Mar 2 • 33
RubricBench: Aligning Model-Generated Rubrics with Human Standards • Paper 2603.01562 • Published Mar 2 • 64
Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation • Paper 2602.16990 • Published Feb 19 • 11