GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 3 days ago • 134
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation Paper • 2601.02204 • Published 6 days ago • 56
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields Paper • 2601.03252 • Published 5 days ago • 94
KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs Paper • 2601.01046 • Published 9 days ago • 11
InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams Paper • 2601.02281 • Published 6 days ago • 29
Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes Paper • 2601.02356 • Published 6 days ago • 13
EVLP:Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning Paper • 2511.05553 • Published Nov 3, 2025 • 1
LightBagel: A Light-weighted, Double Fusion Framework for Unified Multimodal Understanding and Generation Paper • 2510.22946 • Published Oct 27, 2025 • 17
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer Paper • 2510.06590 • Published Oct 8, 2025 • 74
UniVideo: Unified Understanding, Generation, and Editing for Videos Paper • 2510.08377 • Published Oct 9, 2025 • 78
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published Dec 1, 2025 • 72
ConsistCompose: Unified Multimodal Layout Control for Image Composition Paper • 2511.18333 • Published Nov 23, 2025 • 2
UniHOI: Unified Human-Object Interaction Understanding via Unified Token Space Paper • 2511.15046 • Published Nov 19, 2025 • 1
A Reason-then-Describe Instruction Interpreter for Controllable Video Generation Paper • 2511.20563 • Published Nov 25, 2025 • 1
UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision Paper • 2601.03193 • Published 5 days ago • 42
Plan-X: Instruct Video Generation via Semantic Planning Paper • 2511.17986 • Published Nov 22, 2025 • 17
Mask2IV: Interaction-Centric Video Generation via Mask Trajectories Paper • 2510.03135 • Published Oct 3, 2025 • 1
BindWeave: Subject-Consistent Video Generation via Cross-Modal Integration Paper • 2510.00438 • Published Oct 1, 2025 • 9