ExpSeek: Self-Triggered Experience Seeking for Web Agents Paper • 2601.08605 • Published 3 days ago • 15
Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding Paper • 2512.17220 • Published 28 days ago • 111
Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing Paper • 2409.11726 • Published Sep 18, 2024
SOTOPIA-Ω Checkpoints Collection ACL 2025 (main) paper -- SOTOPIA-Ω: Dynamic Strategy Injection Learning and Social Instruction Following Evaluation for Social Agents. • 3 items • Updated May 27, 2025
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models Paper • 2504.10368 • Published Apr 14, 2025 • 22
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization Paper • 2503.17928 • Published Mar 23, 2025 • 2