Collections
Collections including paper arxiv:2512.04324
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
  Paper • 2508.16153 • Published • 159
- DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
  Paper • 2512.04324 • Published • 146
- Agent Laboratory: Using LLM Agents as Research Assistants
  Paper • 2501.04227 • Published • 95
- Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
  Paper • 2507.06229 • Published • 75

- Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
  Paper • 2502.07445 • Published • 11
- ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning
  Paper • 2502.04689 • Published • 8
- Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
  Paper • 2502.03032 • Published • 60
- Preference Leakage: A Contamination Problem in LLM-as-a-judge
  Paper • 2502.01534 • Published • 40

- Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
  Paper • 2405.20541 • Published • 24
- RedPajama: an Open Dataset for Training Large Language Models
  Paper • 2411.12372 • Published • 56
- Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
  Paper • 2503.22230 • Published • 45
- DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
  Paper • 2512.04324 • Published • 146

- Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
  Paper • 2505.13227 • Published • 45
- facebook/natural_reasoning
  Viewer • Updated • 1.15M • 2.08k • 543
- nvidia/OpenMathReasoning
  Viewer • Updated • 5.68M • 15.6k • 368
- Search Arena: Analyzing Search-Augmented LLMs
  Paper • 2506.05334 • Published • 17

- Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
  Paper • 2411.03562 • Published • 68
- Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning
  Paper • 2502.06060 • Published • 38
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents
  Paper • 2502.14499 • Published • 192
- SurveyX: Academic Survey Automation via Large Language Models
  Paper • 2502.14776 • Published • 100