- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 491
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
  Paper • 2201.11903 • Published • 14
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
  Paper • 2408.03314 • Published • 63
- Universal Language Model Fine-tuning for Text Classification
  Paper • 1801.06146 • Published • 8
Collections
Discover the best community collections!
Collections including paper arxiv:1706.03762
- Attention Is All You Need
  Paper • 1706.03762 • Published • 104
- LoRA: Low-Rank Adaptation of Large Language Models
  Paper • 2106.09685 • Published • 54
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  Paper • 2101.03961 • Published • 13
- Proximal Policy Optimization Algorithms
  Paper • 1707.06347 • Published • 11