Edward Beeching Profile
Edward Beeching

@edwardbeeching

Followers: 2K · Following: 611 · Media: 57 · Statuses: 228

Research Scientist @HuggingFace. PhD in Deep RL approaches for Robotic Navigation @INRIA.

Lyon, France
Joined July 2010
@edwardbeeching
Edward Beeching
16 days
This opens up more flexible and powerful student-teacher pairings. You can now pick the best teacher and best student for the job, without worrying if their tokenizers match. You can read the full technical write-up and find the open-source code here:
huggingface.co
0
0
0
@edwardbeeching
Edward Beeching
16 days
We also benchmarked GOLD against GRPO. Even in this difficult cross-tokenizer scenario, GOLD still outperformed GRPO by 20%.
1
0
0
@edwardbeeching
Edward Beeching
16 days
The results on math reasoning tasks are very promising. In a cross-tokenizer setup, GOLD recovered 60% of the teacher's performance. The old ULD baseline only managed 10%.
1
0
0
@edwardbeeching
Edward Beeching
16 days
How does it work? GOLD makes two key improvements over previous methods like ULD. First, a better sequence alignment that merges tokens by summing log probabilities. Second, a hybrid vocabulary alignment that finds 1-to-1 token mappings where possible.
1
0
0
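To make the two alignment steps from the tweet above concrete, here is a minimal, hypothetical Python sketch. The token strings, spans, and log-prob values are invented for illustration, and the helpers are not the released GOLD code.

```python
# A minimal, hypothetical sketch of the two alignment ideas described above.
# Token strings, spans, and log-prob values are invented for illustration;
# this is not the released GOLD implementation.

def merge_logprobs(logprobs, spans):
    """Sum log-probs over groups of consecutive tokens covering the same text span,
    since log p(a) + log p(b) = log p(ab) under the chain rule."""
    return [sum(logprobs[i] for i in span) for span in spans]

# The student emits "tokenizers" as one token where the teacher emits two,
# so the teacher side is merged before comparing the two models span by span.
teacher_tokens   = ["token", "izers"]
teacher_logprobs = [-0.2, -0.5]
print(merge_logprobs(teacher_logprobs, spans=[[0, 1]]))  # [-0.7]

# Hybrid vocabulary alignment (simplified): tokens whose surface strings match
# exactly get a direct 1-to-1 id mapping; the rest would need a soft fallback.
teacher_vocab = {"the": 0, "token": 1, "izers": 2}
student_vocab = {"the": 0, "tokenizers": 1}
one_to_one = {tid: student_vocab[tok]
              for tok, tid in teacher_vocab.items() if tok in student_vocab}
print(one_to_one)  # {0: 0} -> only "the" maps directly
```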
@edwardbeeching
Edward Beeching
16 days
We were already working on a way to solve this. We're introducing General On-Policy Logit Distillation (GOLD), a new method that extends on-policy distillation to the cross-tokenizer setting. This lets you distill knowledge between different model families, like Qwen and Llama.
1
0
0
@edwardbeeching
Edward Beeching
16 days
The recent @thinkymachines post was a great reminder of how effective on-policy distillation is for LLMs. But it highlighted a major practical constraint, one that has limited its use: the teacher and student models must share the same tokenizer.
1
0
1
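For context on why the shared-tokenizer constraint arises, here is a toy, hypothetical sketch of a per-position on-policy distillation loss (reverse KL). The vocabulary size and probability values are made up; the point is only that teacher and student must assign probabilities to the same token ids.

```python
# Toy sketch of the per-token on-policy distillation loss (reverse KL).
# Probabilities are invented; not any library's implementation.

import math

def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) at one sampled position."""
    return sum(s * math.log(s / t) for s, t in zip(student_probs, teacher_probs) if s > 0)

# One position in a student-generated sequence: both models score the *same* vocab ids.
student = [0.70, 0.10, 0.10, 0.10]
teacher = [0.55, 0.25, 0.10, 0.10]
print(reverse_kl(student, teacher))

# With different tokenizers the teacher's distribution lives over a different id
# space, so this position-wise comparison is ill-defined; GOLD's alignment steps
# are what make the cross-tokenizer case workable.
```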
@LoubnaBenAllal1
Loubna Ben Allal
4 months
SmolLM3 full training and evaluation code is now live, along with 100+ intermediate checkpoints:
✓ Pretraining scripts (nanotron)
✓ Post-training code: SFT + APO (TRL/alignment-handbook)
✓ Evaluation scripts to reproduce all reported metrics
https://t.co/BoXjKH28vg All
github.com
huggingface/smollm: Everything about the SmolLM and SmolVLM family of models
9
73
462
@mishig25
Mishig Davaadorj
4 months
It is cool to be capable. It is cool to know shit. That's why the HF team is open-sourcing not just the model, but the training code and datasets too. Learn. Build. Make it your own.
github.com
huggingface/smollm: Everything about the SmolLM and SmolVLM family of models
12
79
628
@cmpatino_
Carlos Miguel Patiño
4 months
I'm super grateful for learning from @_lewtun and @edwardbeeching and for working @huggingface where we can share the details of our work with the community. The blogpost is ready and we'll link there all our artifacts (code, datasets, and recipes)! https://t.co/WGyVIO4rm0 (2/2)
huggingface.co
1
1
20
@edwardbeeching
Edward Beeching
4 months
You can find out more in our blogpost: https://t.co/ZbDiNbeq3x
huggingface.co
0
0
3
@edwardbeeching
Edward Beeching
4 months
I had the opportunity to spend the last month building an open-source, state-of-the-art, dual-mode reasoning model at the 3B scale, building on the amazing work of @huggingface's Pretraining team. It was tough, but we managed to get on the Pareto front with the Qwen3 models. 🧵
2
6
46
@ariG23498
Aritra
6 months
I just noticed that we have a survey of Vision Language Models every year. 2023: https://t.co/OK5w4PVw6I (@RisingSayak and @alaradirik) 2024: https://t.co/TsaD5gDEFc (@mervenoyann and @edwardbeeching) 2025: https://t.co/Wkb4ntbMhn (launched today) This is a nice time to read
huggingface.co
2
12
69
@shirinyamani
Shirin Yamani
6 months
Wanna learn how TRL makes online trainers go fucking brrr with vLLM? check the doc out then!🔥 https://t.co/YmW5kvoyyT
huggingface.co
0
2
26
@casper_hansen_
Casper Hansen
8 months
TRL now handles multi-node training with vLLM for GRPO🤯
3
23
238
@_lewtun
Lewis Tunstall
8 months
I've just pushed the decontaminated subset of CodeForces-CoTs we used to train the OlympicCoder models. Checked for 8-gram overlap against:
- AIME24 & AIME25
- GPQA Diamond
- MATH-500
- LiveCodeBench
- IOI24
Now you can train models to help pass your next LeetCode interview at
4
7
70
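The 8-gram overlap check mentioned above can be illustrated with a short, hypothetical sketch. Whitespace tokenization and the toy strings below are assumptions, not the actual CodeForces-CoTs filtering pipeline.

```python
# Hypothetical sketch of 8-gram overlap decontamination; toy data, plain
# whitespace tokenization, not the real filtering code.

def ngrams(text, n=8):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def decontaminate(train_examples, eval_examples, n=8):
    """Drop training examples sharing any n-gram with an evaluation example."""
    eval_ngrams = set()
    for ex in eval_examples:
        eval_ngrams |= ngrams(ex, n)
    return [ex for ex in train_examples if not (ngrams(ex, n) & eval_ngrams)]

train = ["given an array of integers find the maximum sum of a contiguous subarray in linear time"]
evals = ["you must find the maximum sum of a contiguous subarray using kadane's algorithm"]
print(len(decontaminate(train, evals)))  # 0 -> the overlapping example is removed
```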
@aviral_kumar2
Aviral Kumar
8 months
A lot of work focuses on test-time scaling. But we aren't scaling it optimally: simply training a long CoT doesn't mean we use it well. My students developed "v0" of a paradigm to do this optimally by running RL with dense rewards = minimizing regret over long CoT episodes. 🧵⬇️
3
34
201
@_lewtun
Lewis Tunstall
8 months
We took a deep dive into the Gemma 3 tech report today at Hugging Face and recorded the discussion :) https://t.co/I0PYzmb8Vu It's very cool to see Google baking so many post-training methods into a single model: from online knowledge distillation to RL with model merging RMs
1
5
63
@edwardbeeching
Edward Beeching
8 months
Our OlympicCoder-32B model achieves top-tier performance, surpassing all open-weight models we tested—even some 100x larger! Learn more about how we built the dataset, benchmark, and models: https://t.co/KWsBM2xaHt
huggingface.co
0
1
8