Edward Beeching (@edwardbeeching)
Research Scientist @HuggingFace. PhD in Deep RL approaches for Robotic Navigation @INRIA.
Lyon, France · Joined July 2010
2K Followers · 611 Following · 57 Media · 228 Statuses
This opens up more flexible and powerful student-teacher pairings. You can now pick the best teacher and best student for the job, without worrying if their tokenizers match. You can read the full technical write-up and find the open-source code here:
huggingface.co
We also benchmarked GOLD against GRPO. Even in this difficult cross-tokenizer scenario, GOLD still outperformed GRPO by 20%.
The results on math reasoning tasks are very promising. In a cross-tokenizer setup, GOLD recovered 60% of the teacher's performance. The old ULD baseline only managed 10%.
How does it work? GOLD makes two key improvements over previous methods like ULD. First, a better sequence alignment that merges tokens by summing log probabilities. Second, a hybrid vocabulary alignment that finds 1-to-1 token mappings where possible.
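The token-merging part of the sequence alignment can be sketched as follows. This is a toy illustration with hypothetical function and variable names, not the actual GOLD code: when one tokenizer splits a span that the other keeps whole, the chain rule lets you sum the per-token log-probabilities to recover the log-probability of the merged span.

```python
import math

def merge_logprobs(tokens, logprobs, spans):
    """Merge consecutive tokens into target character spans by summing
    their log-probabilities: log p(ab) = log p(a) + log p(b | a)."""
    merged = []
    i = 0
    for span in spans:
        text, total = "", 0.0
        # Consume tokens until their concatenation matches the target span.
        while text != span:
            text += tokens[i]
            total += logprobs[i]
            i += 1
        merged.append((span, total))
    return merged

# Teacher splits "distill" into two tokens; the student sees one token.
merged = merge_logprobs(["dis", "till"], [math.log(0.5), math.log(0.4)], ["distill"])
# merged[0][1] is log(0.5) + log(0.4), i.e. approximately log(0.2)
```

The point of summing in log space is that the merged value is exactly the probability the teacher assigns to the whole span, so the student's distribution over its own (coarser) tokens can be compared against it directly.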
We were already working on a way to solve this. We're introducing General On-Policy Logit Distillation (GOLD), a new method that extends on-policy distillation to the cross-tokenizer setting. This lets you distill knowledge between different model families, like Qwen and Llama.
The recent @thinkymachines post was a great reminder of how effective on-policy distillation is for LLMs. But it highlighted a major practical constraint, one that has limited its use: the teacher and student models must share the same tokenizer.
SmolLM3 full training and evaluation code is now live, along with 100+ intermediate checkpoints:
✓ Pretraining scripts (nanotron)
✓ Post-training code: SFT + APO (TRL/alignment-handbook)
✓ Evaluation scripts to reproduce all reported metrics
https://t.co/BoXjKH28vg
github.com
Everything about the SmolLM and SmolVLM family of models - GitHub - huggingface/smollm: Everything about the SmolLM and SmolVLM family of models
It is cool to be capable. It is cool to know shit. That's why the HF team is open-sourcing not just the model, but the training code and datasets too. Learn. Build. Make it your own.
I'm super grateful for learning from @_lewtun and @edwardbeeching and for working @huggingface where we can share the details of our work with the community. The blogpost is ready and we'll link there all our artifacts (code, datasets, and recipes)! https://t.co/WGyVIO4rm0 (2/2)
huggingface.co
I had the opportunity to spend the last month building an open-source, state-of-the-art, dual-mode reasoning model at the 3B scale, building on the amazing work of @huggingface's Pretraining team. It was tough, but we managed to get on the Pareto front with the Qwen3 models. 🧵
I just noticed that we have a survey of Vision Language Models every year:
2023: https://t.co/OK5w4PVw6I (@RisingSayak and @alaradirik)
2024: https://t.co/TsaD5gDEFc (@mervenoyann and @edwardbeeching)
2025: https://t.co/Wkb4ntbMhn (launched today)
This is a nice time to read them.
huggingface.co
Wanna learn how TRL makes online trainers go fucking brrr with vLLM? check the doc out then!🔥 https://t.co/YmW5kvoyyT
huggingface.co
TRL now handles multi-node training with vLLM for GRPO🤯
I've just pushed the decontaminated subset of CodeForces-CoTs we used to train the OlympicCoder models. Checked for 8-gram overlap against:
- AIME24 & AIME25
- GPQA Diamond
- MATH-500
- LiveCodeBench
- IOI24
Now you can train models to help pass your next LeetCode interview.
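For readers curious what an 8-gram overlap check looks like, here's a minimal sketch. It uses generic whitespace tokenization and hypothetical names purely for illustration; the actual decontamination code and tokenization choices live in the linked dataset repo.

```python
def ngrams(text, n=8):
    """All contiguous n-grams of whitespace-separated tokens, lowercased."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(sample, benchmark_texts, n=8):
    """Flag a training sample if any of its n-grams also appears verbatim
    in any benchmark problem statement."""
    bench = set()
    for t in benchmark_texts:
        bench |= ngrams(t, n)
    return bool(ngrams(sample, n) & bench)
```

In practice the benchmark n-gram set is built once and reused across the whole training corpus, since recomputing it per sample would dominate the cost.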
A lot of work focuses on test-time scaling, but we aren't scaling it optimally: simply training a long CoT doesn't mean we use it well. My students developed "v0" of a paradigm to do this optimally by running RL with dense rewards, i.e. minimizing regret over long CoT episodes. 🧵⬇️
We took a deep dive into the Gemma 3 tech report today at Hugging Face and recorded the discussion :) https://t.co/I0PYzmb8Vu It's very cool to see Google baking so many post-training methods into a single model: from online knowledge distillation to RL with model merging RMs
This work was led by my students at CMU: @QuYuxiao @matthewyryang @setlur_amrith w/ @_lewtun, @edwardbeeching, and @rsalakhu. Feedback is very welcome! Paper: https://t.co/5nZu33VFOd Big thanks to @LunjunZhang, whose ideas inspired a lot in this line of thought!
arxiv.org
Training models to effectively use test-time compute is crucial for improving the reasoning performance of LLMs. Current methods mostly do so via fine-tuning on search traces or running RL with...
Our OlympicCoder-32B model achieves top-tier performance, surpassing all open-weight models we tested, even some 100x larger! Learn more about how we built the dataset, benchmark, and models: https://t.co/KWsBM2xaHt
huggingface.co