- Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
  Paper • 2601.19895 • Published • 23
- Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
  Paper • 2601.17367 • Published • 33
- Small-scale proxies for large-scale Transformer training instabilities
  Paper • 2309.14322 • Published • 21
- Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training
  Paper • 2602.00747 • Published • 9
Collections
Collections including paper arxiv:2601.19895
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
  Paper • 2601.07832 • Published • 52
- Towards Automated Kernel Generation in the Era of LLMs
  Paper • 2601.15727 • Published • 18
- HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
  Paper • 2601.14724 • Published • 74
- Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
  Paper • 2601.19895 • Published • 23

- LTX-2: Efficient Joint Audio-Visual Foundation Model
  Paper • 2601.03233 • Published • 150
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
  Paper • 2601.07832 • Published • 52
- Motion Attribution for Video Generation
  Paper • 2601.08828 • Published • 71
- Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
  Paper • 2601.19895 • Published • 23

- Towards Pixel-Level VLM Perception via Simple Points Prediction
  Paper • 2601.19228 • Published • 17
- Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
  Paper • 2601.19895 • Published • 23
- Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision
  Paper • 2601.19798 • Published • 42
- OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
  Paper • 2601.21639 • Published • 49

- OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
  Paper • 2601.15369 • Published • 20
- Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
  Paper • 2601.15892 • Published • 53
- Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
  Paper • 2601.16208 • Published • 51
- NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
  Paper • 2601.11004 • Published • 30

- A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5
  Paper • 2601.10527 • Published • 24
- PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution
  Paper • 2601.10657 • Published • 20
- TranslateGemma Technical Report
  Paper • 2601.09012 • Published • 19
- Recursive Language Models
  Paper • 2512.24601 • Published • 86