AI & ML interests

None defined yet.

Recent Activity

nouamanetazi
posted an update about 1 month ago
After training SmolLM3 on 384 H100s for nearly a month, I've come to realize something most people overlook: infrastructure is the make-or-break factor in LLM training. 🔥

Everyone talks about model architecture and data quality. And yes, those matter immensely. But here's what nobody tells you: when your training run fails at 2 AM because of mysterious NCCL errors, or when your expensive GPU cluster is running at 60% efficiency, the problem isn't your model. It's most probably a misuse of the hardware. 🛠️

Questions that seemed simple but had no clear answers: Why is MoE training slower than dense models? Which NCCL flags should we actually set? How often should we checkpoint without killing throughput?
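
(For readers wondering what "NCCL flags" refers to: NCCL is configured through environment variables that are read when the process group's communicators are created. The snippet below is a purely illustrative Python sketch, not the playbook's recommendation - the variables are real NCCL settings, but the values are hypothetical placeholders that depend entirely on the cluster.)

```python
import os
import torch.distributed as dist

# Illustrative only: a few NCCL environment variables that commonly come up
# when debugging multi-node jobs. Values are placeholders, not recommendations.
os.environ.setdefault("NCCL_DEBUG", "INFO")          # verbose logs for those 2 AM failures
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")  # hypothetical NIC name; cluster-specific
os.environ.setdefault("NCCL_IB_HCA", "mlx5")         # restrict to InfiniBand adapters, if present

# Assumes a torchrun-style launcher has set MASTER_ADDR, MASTER_PORT, RANK, WORLD_SIZE.
dist.init_process_group(backend="nccl")
```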

That's why we built The Smol Training Playbook 📖: a complete guide covering everything from model architecture and data curation to the SmolLM3 training marathon, post-training techniques, and crucially, the infrastructure layer that most teams get wrong.

We validated real vs theoretical bandwidth across the entire stack: HBM3 hitting 3 TB/s, NVLink 4.0 reaching 786 GB/s, PCIe Gen4 at 24.2 GB/s. Then we ran collective operations across 128 GPUs (16 nodes, 8xH100s each) and measured how performance degrades at scale: all-reduce drops from 480 GB/s on a single node to 320-350 GB/s across 16 nodes.
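
As a rough illustration of how a number like that can be measured (a minimal sketch of my own, not the playbook's benchmark code; the function name, message size, and iteration count are arbitrary choices), the snippet below times a large all-reduce in PyTorch and converts the result to bus bandwidth using the nccl-tests convention:

```python
import os
import time
import torch
import torch.distributed as dist

def measure_allreduce_busbw(size_mb: int = 1024, iters: int = 20) -> float:
    """Time an all-reduce and report bus bandwidth, nccl-tests style.

    Assumes a torchrun-style launcher that sets RANK, WORLD_SIZE, LOCAL_RANK,
    MASTER_ADDR and MASTER_PORT.
    """
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
    dist.init_process_group(backend="nccl")
    n = dist.get_world_size()

    # One bf16 element is 2 bytes, so this tensor is size_mb MB.
    x = torch.randn(size_mb * 1024 * 1024 // 2, dtype=torch.bfloat16, device="cuda")

    for _ in range(5):                      # warm-up so lazy init isn't timed
        dist.all_reduce(x)
    torch.cuda.synchronize()

    start = time.perf_counter()
    for _ in range(iters):
        dist.all_reduce(x)
    torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters

    algo_bw = x.numel() * x.element_size() / elapsed   # bytes/s seen by one rank
    bus_bw = algo_bw * 2 * (n - 1) / n                 # busBW = 2(N-1)/N * S / t
    if dist.get_rank() == 0:
        print(f"world={n}  busBW={bus_bw / 1e9:.1f} GB/s")
    dist.destroy_process_group()
    return bus_bw

if __name__ == "__main__":
    measure_allreduce_busbw()
```

Running it under torchrun on one node and then across the full cluster shows exactly the kind of single-node vs multi-node gap described above; the absolute numbers will of course depend on the hardware, topology, and NCCL version.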

If you've ever wondered why your training runs are slower than they should be, or you're planning to scale up and want to avoid expensive mistakes, this guide might save you weeks of debugging.

š“š”šž š’š¦šØš„ š“š«ššš¢š§š¢š§š  šš„ššš²š›šØšØš¤: https://lnkd.in/e5MKXUHS

Shared with ❤️ by the HuggingFace team
yjernite
posted an update 3 months ago
Tremendous quality of life upgrade on the Hugging Face Hub - we now have auto-complete emojis 🤗 🥳 👏 🙌 🎉

Get ready for lots more very serious analysis on a whole range of topics from yours truly now that we have unlocked this full range of expression 😄 🤔 🗣️ 🙊
yjernite
posted an update 4 months ago
š—™š—¶š—æš˜€š˜ š—šš—£š—”š—œ š— š—¼š—±š—²š—¹ š˜„š—¶š˜š—µ š—˜š—Ø š——š—®š˜š—® š—§š—æš—®š—»š˜€š—½š—®š—æš—²š—»š—°š˜† š—§š—²š—ŗš—½š—¹š—®š˜š—²? šŸ‡ŖšŸ‡ŗ

With the release of the EU data transparency template this week, we finally got to see one of the most meaningful artifacts to come out of the AI Act implementation so far (haven't you heard? AI's all about the data! 📊📚)

The impact of the template will depend on how effectively it establishes a minimum meaningful transparency standard for companies that don't otherwise offer any transparency into their handling of e.g. personal data or (anti?-)competitive practices in commercial licensing - we'll see how those play out as new models are released after August 2nd 👀

In the meantime, I wanted to see how the template works for a fully open-source + commercially viable model, so I filled it out for SmolLM3 - which my colleagues at Hugging Face released earlier this month 🤗 ICYMI, it's fully open-source with 3B parameters and performance matching the best similar-size models (I've switched all my local apps from Qwen3 to it, you should too 💔)

Verdict: congrats to the European Commission AI Office for making it so straightforward! Fully open and transparent models remain a cornerstone of informed regulation and governance, but the different organizational needs of their developers aren't always properly accounted for in new regulation. In this case, it took me all of two hours to fill out and publish the template (including reading the guidelines) - so kudos for making it feasible for smaller and distributed organizations 🙌 Definitely a step forward for transparency 🔍

To learn more, have a look at:

- The SmolLM3 model: HuggingFaceTB/SmolLM3-3B
- Its filled out Public Summary of Training Content: hfmlsoc/smollm3-eu-data-transparency
- And if you're interested, some previous remarks on regulatory minimum meaningful standards for data disclosure: https://huggingface.co/blog/yjernite/naiac-data-transparency
yjernite
posted an update 6 months ago