Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2601.19895

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 19 days ago • 23
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers

Paper • 2601.17367 • Published 22 days ago • 33
Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 21
Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training

Paper • 2602.00747 • Published 15 days ago • 9

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 19 days ago • 23
Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning

Paper • 2602.07845 • Published 7 days ago • 67

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52
Towards Automated Kernel Generation in the Era of LLMs

Paper • 2601.15727 • Published 24 days ago • 18
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published 25 days ago • 74
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 19 days ago • 23

Stuff I'm going to read

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 150
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52
Motion Attribution for Video Generation

Paper • 2601.08828 • Published Jan 13 • 71
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 19 days ago • 23

Towards Pixel-Level VLM Perception via Simple Points Prediction

Paper • 2601.19228 • Published 19 days ago • 17
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 19 days ago • 23
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Paper • 2601.19798 • Published 19 days ago • 42
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

Paper • 2601.21639 • Published 17 days ago • 49

My notification

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Paper • 2601.15369 • Published 25 days ago • 20
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Paper • 2601.15892 • Published 24 days ago • 53
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published 24 days ago • 51
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Paper • 2601.11004 • Published about 1 month ago • 30

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Paper • 2601.10527 • Published about 1 month ago • 24
PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution

Paper • 2601.10657 • Published about 1 month ago • 20
TranslateGemma Technical Report

Paper • 2601.09012 • Published Jan 13 • 19
Recursive Language Models

Paper • 2512.24601 • Published Dec 31, 2025 • 86

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 19 days ago • 23
Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers

Paper • 2601.17367 • Published 22 days ago • 33
Small-scale proxies for large-scale Transformer training instabilities

Paper • 2309.14322 • Published Sep 25, 2023 • 21
Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training

Paper • 2602.00747 • Published 15 days ago • 9

Towards Pixel-Level VLM Perception via Simple Points Prediction

Paper • 2601.19228 • Published 19 days ago • 17
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 19 days ago • 23
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Paper • 2601.19798 • Published 19 days ago • 42
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

Paper • 2601.21639 • Published 17 days ago • 49

Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 19 days ago • 23
Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning

Paper • 2602.07845 • Published 7 days ago • 67

My notification

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Paper • 2601.15369 • Published 25 days ago • 20
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Paper • 2601.15892 • Published 24 days ago • 53
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published 24 days ago • 51
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Paper • 2601.11004 • Published about 1 month ago • 30

MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52
Towards Automated Kernel Generation in the Era of LLMs

Paper • 2601.15727 • Published 24 days ago • 18
HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

Paper • 2601.14724 • Published 25 days ago • 74
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 19 days ago • 23

A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

Paper • 2601.10527 • Published about 1 month ago • 24
PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution

Paper • 2601.10657 • Published about 1 month ago • 20
TranslateGemma Technical Report

Paper • 2601.09012 • Published Jan 13 • 19
Recursive Language Models

Paper • 2512.24601 • Published Dec 31, 2025 • 86

Stuff I'm going to read

LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 150
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52
Motion Attribution for Video Generation

Paper • 2601.08828 • Published Jan 13 • 71
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published 19 days ago • 23

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs