Papers
arxiv:2604.19859

DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data

Published on Apr 21 · Submitted by Sunhao Dai on Apr 23
#3 Paper of the day

Abstract

DR-Venus-4B is a 4-billion-parameter deep research agent trained entirely on open data using agentic supervised fine-tuning and reinforcement learning with turn-level rewards to achieve superior performance on research benchmarks while maintaining edge-scale deployment advantages.

AI-generated summary

Edge-scale deep research agents based on small language models are attractive for real-world deployment due to their advantages in cost, latency, and privacy. In this work, we study how to train a strong small deep research agent with limited open data by improving both data quality and data utilization. We present DR-Venus, a frontier 4B deep research agent for edge-scale deployment, built entirely on open data. Our training recipe consists of two stages. In the first stage, we use agentic supervised fine-tuning (SFT) to establish basic agentic capability, combining strict data cleaning with resampling of long-horizon trajectories to improve data quality and utilization. In the second stage, we apply agentic reinforcement learning (RL) to further improve execution reliability on long-horizon deep research tasks. To make RL effective for small agents in this setting, we build on IGPO and design turn-level rewards based on information gain and format-aware regularization, thereby enhancing supervision density and turn-level credit assignment. Trained entirely on roughly 10K open-data samples, DR-Venus-4B significantly outperforms prior agentic models under 9B parameters on multiple deep research benchmarks, while also narrowing the gap to much larger 30B-class systems. Our further analysis shows that 4B agents already possess surprisingly strong performance potential, highlighting both the deployment promise of small models and the value of test-time scaling in this setting. We release our models, code, and key recipes to support reproducible research on edge-scale deep research agents.
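The turn-level reward idea can be sketched as follows. This is a minimal illustration, not the paper's formula: the function name, the information-gain proxy, the format penalty, and the weighting `lambda_fmt` are all hypothetical stand-ins for the IGPO-based design the abstract describes.

```python
def turn_reward(info_before: float, info_after: float,
                format_ok: bool, lambda_fmt: float = 0.1) -> float:
    """Hypothetical turn-level reward: reward each agent turn by how much
    new information it gathered, minus a penalty when the turn violates
    the expected tool-call / answer format. The exact combination and
    coefficients here are illustrative assumptions, not the paper's recipe."""
    info_gain = info_after - info_before   # evidence added by this single turn
    fmt_penalty = 0.0 if format_ok else lambda_fmt
    return info_gain - fmt_penalty

# Toy 3-turn trajectory: per-turn "information" scores in [0, 1]
turns = [
    {"before": 0.0, "after": 0.3, "ok": True},   # useful search turn
    {"before": 0.3, "after": 0.3, "ok": False},  # redundant, malformed turn
    {"before": 0.3, "after": 0.7, "ok": True},   # strong evidence turn
]
rewards = [turn_reward(t["before"], t["after"], t["ok"]) for t in turns]
```

The point of such a scheme is supervision density: instead of a single trajectory-level signal, every turn receives its own reward, which makes credit assignment across a long multi-step rollout much easier for a small policy.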

Community

Key insights:

  1. We explore how to build strong edge-scale deep research agents with small language models under limited open-data settings, focusing on both data quality and data utilization.

  2. We introduce DR-Venus, a 4B deep research agent trained entirely on roughly 10K open-data samples. The training recipe combines agentic supervised fine-tuning with strict data cleaning and long-horizon trajectory resampling, followed by agentic reinforcement learning to improve reliability on complex research tasks.

  3. To make RL more effective for small agents, we design turn-level rewards based on information gain and format-aware regularization, improving supervision density and credit assignment across multi-step agent execution.

  4. Experiments show that DR-Venus-4B-RL establishes a new frontier among small deep research agents and consistently outperforms prior agentic systems at similar scales. Despite its compact 4B size, DR-Venus substantially narrows the gap to much larger 30B-class agents. Pass@K analysis further reveals that the capability ceiling of small deep research agents is surprisingly high, suggesting that test-time scaling can be an especially effective way to unlock the potential of edge-scale reasoning models.
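For reference, the Pass@K metric mentioned in insight 4 is commonly computed with the standard unbiased estimator (1 - C(n-c, k)/C(n, k) over n samples with c successes); whether the paper uses exactly this estimator is an assumption here, but it is the conventional definition:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    completions drawn (without replacement) from n attempts, of which
    c are correct, solves the task."""
    if n - c < k:          # fewer failures than draws -> a success is guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 4 correct out of 16 rollouts
p1 = pass_at_k(16, 4, 1)   # single-sample success rate, 4/16
p8 = pass_at_k(16, 4, 8)   # much higher when 8 samples are drawn
```

A large gap between Pass@1 and Pass@K is exactly the signature of a high capability ceiling: the model can solve the task in some rollout, so test-time scaling (drawing more samples) recovers much of that latent ability.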

GitHub: https://github.com/inclusionAI/DR-Venus
SFT code: https://github.com/inclusionAI/DR-Venus/tree/master/SFT
RL code: https://github.com/inclusionAI/DR-Venus/tree/master/RL
Inference code: https://github.com/inclusionAI/DR-Venus/tree/master/Inference
SFT model: https://huggingface.co/inclusionAI/DR-Venus-4B-SFT
RL model: https://huggingface.co/inclusionAI/DR-Venus-4B-RL
Collection: https://huggingface.co/collections/inclusionAI/dr-venus
