
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent


1Xiamen University, 2The Hong Kong University of Science and Technology (Guangzhou), 3The Chinese University of Hong Kong, 4ByteDance, 5National University of Singapore, 6Tsinghua University


πŸ“ Overview

JarvisArt workflow and results showcase

JarvisArt is a multi-modal large language model (MLLM)-driven agent for intelligent photo retouching. It is designed to liberate human creativity by understanding user intent, reasoning like a professional artist, and coordinating more than 200 tools in Adobe Lightroom. The model is trained in two stages: Chain-of-Thought supervised fine-tuning builds foundational reasoning, and Group Relative Policy Optimization for Retouching (GRPO-R) then sharpens its decision-making and tool proficiency. Supported by the newly created MMArt dataset (55K samples) and the MMArt-Bench benchmark, JarvisArt outperforms GPT-4o by 60% on pixel-level metrics for content fidelity while maintaining comparable instruction-following capability.
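
The weights on this page form an 8B-parameter checkpoint stored as BF16 Safetensors. Below is a minimal inference sketch, assuming the checkpoint can be loaded with the Hugging Face `transformers` Auto classes and accepts a standard image-plus-text chat prompt; the model class, processor behavior, and prompt format are assumptions rather than details confirmed here, so consult the project page or GitHub repository for the official inference code.

```python
# Minimal inference sketch. Assumptions: the checkpoint loads via the generic
# transformers Auto classes and follows a standard image+text chat template.
# The official pipeline may differ -- see the project repository.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "JarvisArt/JarvisArt-1208"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("photo.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Give this portrait a warm golden-hour look."},
        ],
    }
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

# The agent is expected to emit its reasoning and Lightroom tool calls as text.
output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```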


🎬 Demo Videos

Global Retouching Case


Local Retouching Case


JarvisArt supports multi-granularity retouching goals, ranging from scene-level adjustments to region-specific refinements. Users can perform intuitive, free-form edits through natural inputs such as text prompts and bounding boxes.
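
To make the multi-granularity idea concrete, the snippet below pairs a free-form instruction with an optional bounding box. The request structure and field names are purely hypothetical illustrations and do not reflect the actual JarvisArt or Lightroom interface.

```python
# Hypothetical request shapes for scene-level vs. region-specific edits.
# Field names ("instruction", "region", "bbox_xyxy") are illustrative only.
scene_edit = {
    "image": "landscape.jpg",
    "instruction": "Make the whole scene feel like a moody overcast morning.",
    # No region: the adjustment applies globally.
}

local_edit = {
    "image": "portrait.jpg",
    "instruction": "Smooth the skin and brighten the eyes slightly.",
    "region": {"bbox_xyxy": [412, 180, 760, 545]},  # left, top, right, bottom (pixels)
}
```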

πŸ“š Citation

If you find JarvisArt useful in your research, please consider citing:

@article{jarvisart2025,
  title={JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent},
  author={Yunlong Lin and Zixu Lin and Kunjie Lin and Jinbin Bai and Panwang Pan and Chenxin Li and Haoyu Chen and Zhongdao Wang and Xinghao Ding and Wenbo Li and Shuicheng Yan},
  year={2025},
  journal={arXiv preprint arXiv:2506.17612}
}

Model size: 8B parameters · Tensor type: BF16 · Format: Safetensors

Model: JarvisArt/JarvisArt-1208 · Quantizations: 2 models