Native Multimodal Models are World Learners 🌍
Papers
General Agentic Memory Via Deep Research
Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench
Efficient MLLM for Long Video Understanding.
A Bilingual Pretraining Dataset for Enhancing Reasoning in Large Language Models
Open-source, community-driven next generation of AI models
Emu3: Next-Token Prediction is All You Need
Chinese Corpora Internet (Chinese internet corpus)
Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
- BAAI/Infinity-MM
  Updated • 4.84k • 113
- BAAI/Aquila-VL-2B-llava-qwen
  Visual Question Answering • 2B • Updated • 327 • 61
- Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
  Paper • 2410.18558 • Published • 19
- BAAI/Aquila-VL-2B-Intermediate
  Image-Text-to-Text • Updated • 2
Alt
- BAAI/AltCLIP
  Zero-Shot Image Classification • Updated • 5.15k • 31
- BAAI/AltCLIP-m18
  Zero-Shot Image Classification • Updated • 456 • 5
- AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities
  Paper • 2211.06679 • Published • 2
- AltDiffusion: A Multilingual Text-to-Image Diffusion Model
  Paper • 2308.09991 • Published • 3
Multilingual, multi-industry pretraining dataset
URSA: Uniform Discrete Diffusion with Metric Path for Video Generation
RoboBrain 2.0: See Better. Think Harder. Do Smarter.
Scaling Instruction Selection and Synthesis to Enhance Language Models
- BAAI/Infinity-Instruct
  Viewer • Updated • 21.9M • 3.85k • 686
- BAAI/Gemma2-9B-IT-Simpo-Infinity-Preference
  9B • Updated • 90 • 17
- BAAI/Infinity-Instruct-7M-Gen-Llama3_1-70B
  Text Generation • 71B • Updated • 1.14k • 19
- BAAI/Infinity-Instruct-3M-0625-Yi-1.5-9B
  Text Generation • 9B • Updated • 7.95k • 3
NOVA: Autoregressive Video Generation without Vector Quantization
Multilingual, multi-industry instruction dataset