A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17 • 259
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7 • 64
Vid2World: Crafting Video Diffusion Models to Interactive World Models Paper • 2505.14357 • Published May 20 • 27
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation Paper • 2505.04512 • Published May 7 • 36
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 66
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control Paper • 2501.01427 • Published Jan 2 • 54
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Paper • 2412.15322 • Published Dec 19, 2024 • 20
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 147
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers Paper • 2412.09611 • Published Dec 12, 2024 • 11
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models Paper • 2412.07674 • Published Dec 10, 2024 • 20
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation Paper • 2412.05148 • Published Dec 6, 2024 • 12
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics Paper • 2412.07774 • Published Dec 10, 2024 • 30
Mind the Time: Temporally-Controlled Multi-Event Video Generation Paper • 2412.05263 • Published Dec 6, 2024 • 11
AIMv2 Collection A collection of AIMv2 vision encoders that supports a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Aug 25 • 82