Lewis Tunstall
@_lewtun
Followers: 18K · Following: 11K · Media: 968 · Statuses: 5K
🤠 post-training @huggingface
Berne, Switzerland
Joined August 2018
We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️ Featuring our protagonist SmolLM3, we cover:
🧠 Strategy on whether to train your own LLM and burn all your VC money
🪨 Pretraining,
20
85
463
We’re releasing pre-anneal checkpoints for our Nano/Mini base models. Still plenty of math + code exposure, but easier to CPT and customize than our post-anneal checkpoints. Have fun exploring.
10
22
135
Sharing the slides from a talk I gave this week on bridging the gap between research experiments and building production-ready models, based on our recent Smol Training Playbook. https://t.co/RmG53PytMv
docs.google.com
Training World-Class LLMs: From Research to Production · Hugging Face · Loubna Ben Allal
10
69
526
HUGE notebook for doing reinforcement learning on agents with TRL 0.26.0 and GRPO. In this notebook:
- I set up an internal knowledge base agent with Postgres for managing hotel bookings and defined Python function tools.
- I defined a simple dataset of user prompts like "book a
3
32
273
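As a rough illustration of the pattern in the notebook above, here is a minimal GRPO sketch with TRL's GRPOTrainer, where a reward function stands in for the real Postgres-backed booking tool by checking whether the completion emits a well-formed JSON call to a hypothetical book_room function. The prompts, model choice, and reward logic are illustrative assumptions rather than the notebook's actual code, and the exact tool-wiring API in TRL 0.26.0 may differ.

```python
# Illustrative sketch only: GRPO on a tool-calling "booking agent" with TRL.
# A reward function stands in for the notebook's real Postgres-backed tool by
# checking for a well-formed JSON call to a hypothetical `book_room` function.
import json
import re

from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Tiny illustrative dataset; GRPOTrainer expects a "prompt" column.
train_dataset = Dataset.from_dict(
    {
        "prompt": [
            "Book a double room in Bern for 2 nights.",
            "Book a single room near the station for tonight.",
            "Book a twin room with breakfast for this weekend.",
            "Book a suite for 3 nights starting Friday.",
        ]
    }
)

def tool_call_reward(completions, **kwargs):
    """Reward 1.0 if the completion contains a parseable JSON call to `book_room`, else 0.0."""
    rewards = []
    for completion in completions:
        match = re.search(r"\{.*\}", completion, flags=re.DOTALL)
        try:
            call = json.loads(match.group(0)) if match else {}
            rewards.append(1.0 if call.get("name") == "book_room" else 0.0)
        except json.JSONDecodeError:
            rewards.append(0.0)
    return rewards

training_args = GRPOConfig(
    output_dir="grpo-booking-agent",
    per_device_train_batch_size=4,
    num_generations=4,
    max_completion_length=128,
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # small model so the sketch fits on one GPU
    reward_funcs=tool_call_reward,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```

In the actual notebook, the reward would presumably come from executing the tool call against the Postgres-backed knowledge base rather than from pattern-matching the model's output.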
Much like Woj — the man who brought him into the game — @ShamsCharania is the guy you look to verify the story when you see it breaking on social media. If it doesn’t come directly from him, you rush to his accounts to see if what you just read is real or fake news. Every
7
23
184
I'm looking to hire someone very strong in post-training with solid OSS experience. If you think you’d be a good fit, DM me.
9
7
65
My new blog post discusses the physical reality of computation and why this means we will not see AGI or any meaningful superintelligence:
timdettmers.com
If you are reading this, you probably have strong opinions about AGI, superintelligence, and the future of AI. Maybe you believe we are on the cusp of a transformative breakthrough. Maybe you are...
164
165
1K
We just released TRL v0.26.0! It comes packed with updates:
> Agent training with tools in GRPO
> New CISPO & SAPO losses
> Reasoning rewards
> vLLM quantization in colocate mode
> Dataset shuffling in SFT
> Lots of NEW examples
> Tons of fixes and documentation improvements
6
19
141
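For the vLLM-in-colocate-mode item above, a hedged configuration sketch follows; the use_vllm and vllm_mode fields reflect my reading of the TRL GRPOConfig docs rather than code from the release itself, and the new quantization option is not shown because its exact name isn't given here.

```python
# Hedged sketch: enable vLLM rollout generation in colocate mode for GRPO.
# Field names follow the TRL GRPOConfig documentation as I understand it;
# the v0.26.0 quantization knob is omitted because its exact name isn't stated above.
from trl import GRPOConfig

training_args = GRPOConfig(
    output_dir="grpo-colocate-demo",
    use_vllm=True,          # generate rollouts with vLLM instead of the training backend
    vllm_mode="colocate",   # run vLLM on the same process/GPUs as training, no separate server
    num_generations=8,
)
```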
if the agents are so good why are you mfs in the office til 9 still
158
278
8K
Today we open source Nomos 1. At just 30B parameters, it scores 87/120 on this year’s Putnam, one of the world’s most prestigious math competitions. This score would rank #2/3988 in 2024 and marks our first step with @hillclimbai towards creating a SOTA AI mathematician.
80
230
2K
1/5 🚀 Apriel-1.6-15B-Thinker: a 15B multimodal reasoner scoring 57 on the Artificial Analysis Intelligence Index, approaching the performance of ~200B-scale frontier models while remaining an order of magnitude smaller.
🧠 Model weights: https://t.co/GE22SOIBfT
📄 Blog:
9
53
213
OpenAI leadership (@gdb, @markchen90) are promoting a paper in Physics Letters B where GPT-5 proposed the main idea — possibly the first peer-reviewed paper where an LLM generated the core contribution. One small problem: GPT-5's idea tests the wrong thing. 1/
17
64
478
We used Claude Code to train open LLMs. Check out the tutorial. Basically, we plugged HF skills into Claude Code and it was able to train LLMs end-to-end. Best thing: this works on all major coding agents: Codex, Cursor, and Gemini CLI.
- You tell the agent to fine-tune a model
19
110
795
The HuggingFace team just got Claude Code to fully train an open LLM. You just say something like: “Fine-tune Qwen3-0.6B on open-r1/codeforces-cots.” Claude handles the rest.
▸ Picks the best cloud GPU based on model size
▸ Loads dataset (or searches if not specified)
▸
42
120
997
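For context, a request like “Fine-tune Qwen3-0.6B on open-r1/codeforces-cots” boils down to a short TRL SFT run. The sketch below is my own minimal reconstruction of that kind of script, not the tutorial's actual code; it assumes the dataset's default config exposes a chat-style column that SFTTrainer can consume and that a suitable GPU is available, so check the dataset card for the exact config and split before running.

```python
# Minimal sketch of the kind of SFT run the coding agent ends up executing.
# Assumptions: the dataset's default config works with SFTTrainer out of the box
# (check the Hub card; a specific config/split may be required) and a GPU is available.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("open-r1/codeforces-cots", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen3-0.6B",                      # model name taken from the tweet above
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen3-0.6b-codeforces-sft"),
)
trainer.train()
trainer.push_to_hub()  # optional: upload the fine-tuned checkpoint to the Hub
```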
Putnam, the world's hardest college-level math test, ended yesterday at 4p PT. By noon today, AxiomProver had solved 9/12 problems in Lean autonomously (as of 3:58p PT yesterday, it was 8/12). Our score would've been #1 of ~4000 participants last year and Putnam Fellow (top 5) in recent years
38
108
927
The obvious answer is open source AI 🤗
Pope Leo is right: “How can we ensure that the development of artificial intelligence truly serves the common good, and is not just used to accumulate wealth and power in the hands of a few?” We must demand that the benefits of this technology work for all, not just the rich.
0
1
15
Also a nice reminder to start your post-training experiments with SFT before reaching for RL :) https://t.co/Sa1kkYqLlj
0
2
7
Nice easter egg in the Rnj-1 release: post-training was just SFT! This makes the SWE-Bench scores even more impressive, since that's a benchmark where recovering from errors is crucial and typically hard to instil with SFT alone
@HannaHajishirzi Thank you, @HannaHajishirzi, and congratulations to you and your team on Olmo3. Rnj-1-Instruction was only SFT'd, making Olmo-SFT the appropriate comparison. We didn't use DPO or RL.
4
9
129
Today, we’re excited to introduce Rnj-1, @essential_ai's first open model: a world-class 8B base + instruct pair, built with scientific rigor, intentional design, and a belief that the advancement and equitable distribution of AI depend on building in the open. We bring
37
153
1K