Edward Beeching (@edwardbeeching)
Research Scientist @HuggingFace. PhD in Deep RL approaches for Robotic Navigation @INRIA.
Lyon, France · Joined July 2010
2K Followers · 611 Following · 57 Media · 228 Statuses
This opens up more flexible and powerful student-teacher pairings. You can now pick the best teacher and best student for the job, without worrying if their tokenizers match. You can read the full technical write-up and find the open-source code here:
huggingface.co
We also benchmarked GOLD against GRPO. Even in this difficult cross-tokenizer scenario, GOLD still outperformed GRPO by 20%.
The results on math reasoning tasks are very promising. In a cross-tokenizer setup, GOLD recovered 60% of the teacher's performance. The old ULD baseline only managed 10%.
How does it work? GOLD makes two key improvements over previous methods like ULD. First, a better sequence alignment that merges tokens by summing log probabilities. Second, a hybrid vocabulary alignment that finds 1-to-1 token mappings where possible.
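The token-merging part of the sequence alignment can be sketched as follows. This is a toy illustration with hypothetical function and variable names, not the actual GOLD code: when one tokenizer splits a span that the other keeps whole, the chain rule lets you sum the per-token log-probabilities to recover the log-probability of the merged span.

```python
import math

def merge_logprobs(tokens, logprobs, spans):
    """Merge consecutive tokens into target character spans by summing
    their log-probabilities: log p(ab) = log p(a) + log p(b | a)."""
    merged = []
    i = 0
    for span in spans:
        text, total = "", 0.0
        # Consume tokens until their concatenation matches the target span.
        while text != span:
            text += tokens[i]
            total += logprobs[i]
            i += 1
        merged.append((span, total))
    return merged

# Teacher splits "distill" into two tokens; the student sees one token.
merged = merge_logprobs(["dis", "till"], [math.log(0.5), math.log(0.4)], ["distill"])
# merged[0][1] is log(0.5) + log(0.4), i.e. approximately log(0.2)
```

The point of summing in log space is that the merged value is exactly the probability the teacher assigns to the whole span, so the student's distribution over its own (coarser) tokens can be compared against it directly.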
We were already working on a way to solve this. We're introducing General On-Policy Logit Distillation (GOLD), a new method that extends on-policy distillation to the cross-tokenizer setting. This lets you distill knowledge between different model families, like Qwen and Llama.
The recent @thinkymachines post was a great reminder of how effective on-policy distillation is for LLMs. But it highlighted a major practical constraint, one that has limited its use: the teacher and student models must share the same tokenizer.
SmolLM3 full training and evaluation code is now live, along with 100+ intermediate checkpoints:
✓ Pretraining scripts (nanotron)
✓ Post-training code: SFT + APO (TRL/alignment-handbook)
✓ Evaluation scripts to reproduce all reported metrics
https://t.co/BoXjKH28vg
github.com
Everything about the SmolLM and SmolVLM family of models - GitHub - huggingface/smollm: Everything about the SmolLM and SmolVLM family of models
It is cool to be capable. It is cool to know shit. That's why the HF team is open-sourcing not just the model, but the training code and datasets too. Learn. Build. Make it your own.
I'm super grateful for learning from @_lewtun and @edwardbeeching and for working @huggingface where we can share the details of our work with the community. The blogpost is ready and we'll link there all our artifacts (code, datasets, and recipes)! https://t.co/WGyVIO4rm0 (2/2)
huggingface.co
I had the opportunity to spend the last month building an open-source, state-of-the-art, dual-mode reasoning model at the 3B scale, building on the amazing work of @huggingface's Pretraining team. It was tough, but we managed to get on the Pareto front with the Qwen3 models. 🧵
I just noticed that we have a survey of Vision Language Models every year:
2023: https://t.co/OK5w4PVw6I (@RisingSayak and @alaradirik)
2024: https://t.co/TsaD5gDEFc (@mervenoyann and @edwardbeeching)
2025: https://t.co/Wkb4ntbMhn (launched today)
This is a nice time to read them.
huggingface.co
Wanna learn how TRL makes online trainers go fucking brrr with vLLM? check the doc out then!🔥 https://t.co/YmW5kvoyyT
huggingface.co
TRL now handles multi-node training with vLLM for GRPO🤯
I've just pushed the decontaminated subset of CodeForces-CoTs we used to train the OlympicCoder models. Checked for 8-gram overlap against:
- AIME24 & AIME25
- GPQA Diamond
- MATH-500
- LiveCodeBench
- IOI24
Now you can train models to help pass your next LeetCode interview.
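For readers curious what an 8-gram overlap check looks like, here's a minimal sketch. It uses generic whitespace tokenization and hypothetical names purely for illustration; the actual decontamination code and tokenization choices live in the linked dataset repo.

```python
def ngrams(text, n=8):
    """All contiguous n-grams of whitespace-separated tokens, lowercased."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(sample, benchmark_texts, n=8):
    """Flag a training sample if any of its n-grams also appears verbatim
    in any benchmark problem statement."""
    bench = set()
    for t in benchmark_texts:
        bench |= ngrams(t, n)
    return bool(ngrams(sample, n) & bench)
```

In practice the benchmark n-gram set is built once and reused across the whole training corpus, since recomputing it per sample would dominate the cost.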
A lot of work focuses on test-time scaling, but we aren't scaling it optimally: simply training a long CoT doesn't mean we use it well. My students developed "v0" of a paradigm to do this optimally by running RL with dense rewards, i.e. minimizing regret over long CoT episodes. 🧵⬇️
We took a deep dive into the Gemma 3 tech report today at Hugging Face and recorded the discussion :) https://t.co/I0PYzmb8Vu It's very cool to see Google baking so many post-training methods into a single model: from online knowledge distillation to RL with model merging RMs
This work was led by my students at CMU: @QuYuxiao @matthewyryang @setlur_amrith w/ @_lewtun, @edwardbeeching, and @rsalakhu. Feedback is very welcome! Paper: https://t.co/5nZu33VFOd Big thanks to @LunjunZhang, whose ideas inspired a lot in this line of thought!
arxiv.org
Training models to effectively use test-time compute is crucial for improving the reasoning performance of LLMs. Current methods mostly do so via fine-tuning on search traces or running RL with...
Our OlympicCoder-32B model achieves top-tier performance, surpassing all open-weight models we tested, even some 100x larger! Learn more about how we built the dataset, benchmark, and models: https://t.co/KWsBM2xaHt
huggingface.co