HorizonMath: Measuring AI Progress Toward Mathematical Discovery with Automatic Verification • Paper • 2603.15617 • Published 5 days ago • 6
The PokeAgent Challenge: Competitive and Long-Context Learning at Scale • Paper • 2603.15563 • Published 5 days ago • 10
Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning • Paper • 2603.15611 • Published 5 days ago • 10
MMOU: A Massive Multi-Task Omni Understanding and Reasoning Benchmark for Long and Complex Real-World Videos • Paper • 2603.14145 • Published 7 days ago • 13
EvoClaw: Evaluating AI Agents on Continuous Software Evolution • Paper • 2603.13428 • Published 9 days ago • 19
Safe and Scalable Web Agent Learning via Recreated Websites • Paper • 2603.10505 • Published 11 days ago • 23
ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer • Paper • 2603.15478 • Published 5 days ago • 24
Grounding World Simulation Models in a Real-World Metropolis • Paper • 2603.15583 • Published 5 days ago • 139
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data • Paper • 2603.15594 • Published 5 days ago • 139
CreativeBench: Benchmarking and Enhancing Machine Creativity via Self-Evolving Challenges • Paper • 2603.11863 • Published 9 days ago • 6
Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation • Paper • 2603.13131 • Published 8 days ago • 6
Think While Watching: Online Streaming Segment-Level Memory for Multi-Turn Video Reasoning in Multimodal Large Language Models • Paper • 2603.11896 • Published 9 days ago • 8
EvoScientist: Towards Multi-Agent Evolving AI Scientists for End-to-End Scientific Discovery • Paper • 2603.08127 • Published 13 days ago • 14
Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously • Paper • 2603.12262 • Published 9 days ago • 30