MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published 6 days ago • 173
NEO-unify: Building Native Multimodal Unified Models End to End Article • Published 17 days ago • 102
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 16 days ago • 114
QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models Paper • 2602.20309 • Published 27 days ago • 16
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 188
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published Nov 20, 2025 • 94
RynnVLA-002: A Unified Vision-Language-Action and World Model Paper • 2511.17502 • Published Nov 21, 2025 • 28
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published Nov 14, 2025 • 193
PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity Paper • 2510.23603 • Published Oct 27, 2025 • 25
Scaling Language-Centric Omnimodal Representation Learning Paper • 2510.11693 • Published Oct 13, 2025 • 106
High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting Paper • 2510.10637 • Published Oct 12, 2025 • 15
TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios Paper • 2505.12891 • Published May 19, 2025 • 10