📚 LLM pretraining datasets Collection A collection of datasets for LLM pretraining • 9 items • Updated May 5, 2025 • 18
The Smol Training Playbook 📚 The secrets to building world-class LLMs • Running on CPU Upgrade • Featured • 3.12k
Unlocking On-Policy Distillation for Any Model Family 📝 Visualize on-policy distillation for any model family • Running • 97
Open LLM Leaderboard 🏆 Track, rank and evaluate open LLMs and chatbots • Running on CPU Upgrade • 14k
[lecture artifacts] aligning open language models Collection artifacts referenced in the talk timeline! Slides: https://docs.google.com/presentation/d/1quMyI4BAx4rvcDfk8jjv063bmHg4RxZd9mhQloXpMn0/edit?usp=sharin • 63 items • Updated Apr 17, 2024 • 58
Jupyter Agent 2 🏃 Generate Jupyter notebooks from natural language tasks • Running • Agents • Featured • 253
laion/CLIP-ViT-B-32-laion2B-s34B-b79K Zero-Shot Image Classification • 0.2B • Updated Jan 22, 2025 • 2.72M • 139