Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 13 days ago • 184
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published 14 days ago • 98
tt1225/20260424_VR_H31_bodyshop_place_part2_pose36_nav_stereo_rvt_speedup1 Viewer • Updated 25 days ago • 180 • 300 • 1
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model Paper • 2604.20796 • Published 28 days ago • 240
GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents Paper • 2604.07429 • Published Apr 8 • 121
Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation Paper • 2604.10030 • Published Apr 11 • 15
Less Detail, Better Answers: Degradation-Driven Prompting for VQA Paper • 2604.04838 • Published Apr 6 • 13
MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping Paper • 2604.08364 • Published Apr 9 • 101