Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2511.13254

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Paper • 2511.13254 • Published 20 days ago • 134
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

Paper • 2511.14582 • Published 19 days ago • 17
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control

Paper • 2511.15248 • Published 19 days ago • 6

Model Arithmetic

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Paper • 2511.13254 • Published 20 days ago • 134

LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers

Paper • 2507.04404 • Published Jul 6 • 21
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

Paper • 2504.11651 • Published Apr 15 • 31
A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone

Paper • 2505.12781 • Published May 19 • 2
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 259

Training optimization

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published Feb 9 • 40
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 171
Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15 • 83
Learning to Skip the Middle Layers of Transformers

Paper • 2506.21103 • Published Jun 26 • 18

If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs

Paper • 2412.04144 • Published Dec 5, 2024 • 6
Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation

Paper • 2410.08371 • Published Oct 10, 2024 • 3
MERGE^3: Efficient Evolutionary Merging on Consumer-grade GPUs

Paper • 2502.10436 • Published Feb 9 • 1
Mergenetic: a Simple Evolutionary Model Merging Library

Paper • 2505.11427 • Published May 16 • 14

Reinforcement Learning

Demystifying Reinforcement Learning in Agentic Reasoning

Paper • 2510.11701 • Published Oct 13 • 31
LoongRL:Reinforcement Learning for Advanced Reasoning over Long Contexts

Paper • 2510.19363 • Published Oct 22 • 61
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29 • 44
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

Paper • 2511.07384 • Published 27 days ago • 16

microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated May 1 • 7.7k • 1.22k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17 • 140 • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 63

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Paper • 2410.14059 • Published Oct 17, 2024 • 62
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7 • 46
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6 • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13 • 53

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Paper • 2511.13254 • Published 20 days ago • 134
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

Paper • 2511.14582 • Published 19 days ago • 17
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control

Paper • 2511.15248 • Published 19 days ago • 6

If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs

Paper • 2412.04144 • Published Dec 5, 2024 • 6
Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation

Paper • 2410.08371 • Published Oct 10, 2024 • 3
MERGE^3: Efficient Evolutionary Merging on Consumer-grade GPUs

Paper • 2502.10436 • Published Feb 9 • 1
Mergenetic: a Simple Evolutionary Model Merging Library

Paper • 2505.11427 • Published May 16 • 14

Model Arithmetic

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Paper • 2511.13254 • Published 20 days ago • 134

Reinforcement Learning

Demystifying Reinforcement Learning in Agentic Reasoning

Paper • 2510.11701 • Published Oct 13 • 31
LoongRL:Reinforcement Learning for Advanced Reasoning over Long Contexts

Paper • 2510.19363 • Published Oct 22 • 61
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29 • 44
Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence

Paper • 2511.07384 • Published 27 days ago • 16

LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers

Paper • 2507.04404 • Published Jul 6 • 21
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

Paper • 2504.11651 • Published Apr 15 • 31
A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone

Paper • 2505.12781 • Published May 19 • 2
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17 • 259

microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated May 1 • 7.7k • 1.22k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17 • 140 • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 63

Training optimization

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published Feb 9 • 40
Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 171
Parallel Scaling Law for Language Models

Paper • 2505.10475 • Published May 15 • 83
Learning to Skip the Middle Layers of Transformers

Paper • 2506.21103 • Published Jun 26 • 18

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Paper • 2410.14059 • Published Oct 17, 2024 • 62
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7 • 46
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6 • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13 • 53

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs