Quentin Gallouédec
@QGallouedec
Followers
4K
Following
2K
Media
154
Statuses
827
PhD - Research @huggingface 🤗 TRL lead maintainer 🇫🇷 in 🇨🇦
Joined May 2019
Last moments of closed-source AI 🪦 : Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration. 🫵 Let's go!
github.com
Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.
35
413
3K
most of these don’t require AI, even for beginners
github.com
This issue is meant to track the tasks that separate us from v1. The list is evolving and is updated based on discussions and recent progress. Documentation Re...
if you want to contribute to open source but don't know where to begin and just want to use AI, please don't write AI-generated comments on GH repository issues: not only are you taking maintainers' time, it's also misleading for other devs. same goes for PRs, most
0
1
3
Sharing the slides from yesterday's talk about "Fine Tuning with TRL" from the @togethercompute x @huggingface workshop we hosted in our Paris office 🎃!
4
10
78
Questions! 🧐 LayerNorm always upcasts inputs to fp32 for stability (hardcoded), but the final multiplication by the weights is done in the original dtype. 1. Why? Sometimes this multiplication is also done in fp32. 2. When, and why?
0
0
7
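For reference, here is a minimal PyTorch sketch of the pattern the question is about (not the exact code of any particular library): the statistics are computed in fp32, and the affine step can be done either in the original dtype or in fp32.

import torch

def layer_norm_mixed(x, weight, bias, eps=1e-5):
    # Upcast so mean/variance are computed in full precision even when x is fp16/bf16.
    x32 = x.float()
    mean = x32.mean(dim=-1, keepdim=True)
    var = x32.var(dim=-1, keepdim=True, unbiased=False)
    normed = (x32 - mean) * torch.rsqrt(var + eps)
    # Option A: cast back first, then multiply by the weights in the original dtype.
    out_low = normed.to(x.dtype) * weight + bias
    # Option B: multiply in fp32 and cast only at the end.
    out_fp32 = (normed * weight.float() + bias.float()).to(x.dtype)
    return out_low, out_fp32

x = torch.randn(2, 8, dtype=torch.bfloat16)
w, b = torch.ones(8, dtype=torch.bfloat16), torch.zeros(8, dtype=torch.bfloat16)
print(layer_norm_mixed(x, w, b)[0].dtype)  # torch.bfloat16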
the bloopers are the best part for understanding the real life of a post-trainer 👨‍🌾
We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️ Featuring our protagonist SmolLM3, we cover: 🧭 Strategy on whether to train your own LLM and burn all your VC money 🪨 Pretraining,
1
1
14
My book is off to a great start, #1 best seller on Amazon! 🥳 (thanks to my whole family for boosting sales)
3
2
12
You shouldn't do RL on small models. Distilling from large models works better. And you can now do it even when the tokenizers don't match.
On-policy distillation is a promising way to train small models, but it’s usually limited to teacher–student pairs sharing the same tokenizer. With our GOLD method, you can now distill across different model families and even outperform GRPO! https://t.co/PAOFCdM4Uk
3
14
261
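For context, here is a minimal sketch of the same-tokenizer baseline that GOLD generalizes (this is not the GOLD method itself, and the model ids are placeholders): the student samples its own completion, the teacher scores the same token ids, and the loss is the reverse KL between the two distributions on the sampled tokens.

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("placeholder/student-model")
student = AutoModelForCausalLM.from_pretrained("placeholder/student-model")
teacher = AutoModelForCausalLM.from_pretrained("placeholder/teacher-model").eval()

prompt = tokenizer("Explain gradient accumulation.", return_tensors="pt")
# On-policy: the training data is the student's own sample.
sequence = student.generate(**prompt, max_new_tokens=64, do_sample=True)

student_logprobs = F.log_softmax(student(sequence).logits[:, :-1], dim=-1)
with torch.no_grad():
    teacher_logprobs = F.log_softmax(teacher(sequence).logits[:, :-1], dim=-1)

# Reverse KL(student || teacher) at each position of the sampled sequence
# (prompt tokens would normally be masked out; omitted to keep the sketch short).
per_token_kl = (student_logprobs.exp() * (student_logprobs - teacher_logprobs)).sum(-1)
loss = per_token_kl.mean()
loss.backward()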
@finbarrtimbers Didn't huggingface have something similar to paperswithcode?
1
1
1
🔥 We're thrilled to announce 𝚑𝚞𝚐𝚐𝚒𝚗𝚐𝚏𝚊𝚌𝚎_𝚑𝚞𝚋 v1.0! After five years of development, this foundational release is packed with a fully modernized HTTP backend and a complete, from-the-ground-up CLI revamp! $ pip install huggingface_hub --upgrade 🧵 highly recommend
8
37
304
GRPOConfig(vllm_enable_sleep_mode=True)
vLLM Sleep Mode 😴→ ⚡Zero-reload model switching for multi-model serving. Benchmarks: 18–200× faster switches and 61–88% faster first inference vs cold starts. Explanation Blog by @EmbeddedLLM 👇 Why it’s fast: we keep the process alive, preserving the allocator, CUDA graphs,
3
0
48
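A minimal TRL sketch of turning the flag on (the model id, dataset, reward function, and output dir are placeholders; only vllm_enable_sleep_mode is taken from the post above):

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 100 characters.
    return [-abs(len(c) - 100) for c in completions]

args = GRPOConfig(
    output_dir="grpo-sleep-demo",
    use_vllm=True,                 # generate rollouts with vLLM
    vllm_enable_sleep_mode=True,   # free GPU memory between generation phases
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=args,
    reward_funcs=reward_len,
    train_dataset=dataset,
)
trainer.train()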
For those who wonder, this has been a core design principle in trl for a while. It wouldn't work with TIS anyway. The only exception: when you use a reward model, you need to detokenize/retokenize because it's not guaranteed that the reward model's tokenizer is the same as the
This is the one thing that always screws up RL training for LLMs especially at bigger scales -- don't do re-tokenization and just directly use the tokens fed to and generated by your LLM.
0
0
22
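A quick illustration of why the detokenize → retokenize round trip is risky (the model id is a placeholder): decoding generated ids to text and re-encoding that text is not guaranteed to return the same ids, so the log-probs no longer match the tokens the policy actually produced.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Pretend these are the ids the policy generated during a rollout.
generated_ids = tok("hello   world\n\n```python", add_special_tokens=False)["input_ids"]

# Naive pipeline: decode to text, then re-encode before scoring.
round_trip_ids = tok(tok.decode(generated_ids), add_special_tokens=False)["input_ids"]

# The two sequences can differ (whitespace handling, merges, special tokens),
# which silently misaligns every per-token log-prob.
print(generated_ids == round_trip_ids)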
Meet OpenEnv 👋, an open ecosystem of environments for intelligent agents. Build, share, and test agents safely and consistently. Ideal for training with TRL (we include examples 🤓), deployment, and community collaboration via the HF Hub
1
6
27
Excited to share OpenEnv: frontier-grade RL environments for the open-source community 🔥! https://t.co/KVeBMsxohL 🧩 Modular interfaces: a clean Gymnasium-style API (reset(), step(), state()) that plugs into any RL framework 🐳 Built for scale: run environments in containers
16
42
275
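To make the "Gymnasium-style API" concrete, here is a toy environment in that spirit with reset(), step(), and state(); it is an illustrative sketch, not the actual OpenEnv interface.

from dataclasses import dataclass

@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool

class EchoEnv:
    # Toy environment exposing reset()/step()/state(), in the Gymnasium spirit.
    def __init__(self, target: str = "hello"):
        self.target = target
        self.history = []

    def reset(self) -> str:
        self.history = []
        return f"Say '{self.target}'"  # initial observation / prompt

    def step(self, action: str) -> StepResult:
        self.history.append(action)
        reward = 1.0 if action.strip().lower() == self.target else 0.0
        return StepResult(observation="episode over", reward=reward, done=True)

    def state(self) -> dict:
        return {"target": self.target, "history": list(self.history)}

env = EchoEnv()
obs = env.reset()
print(obs, env.step("hello").reward, env.state())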
pretty weak. what if `path` isn't a str?? this may completely break the code 🤦♂️
19
0
226
Fine-tune Qwen3-VL with TRL on a free Colab GPU notebook! SFT: https://t.co/aQyc38etcW GRPO:
colab.research.google.com
Run, share, and edit Python notebooks
Introducing the compact, dense versions of Qwen3-VL — now available in 4B and 8B pairs, each with both Instruct and Thinking variants. ✅ Lower VRAM usage ✅ Full Qwen3-VL capabilities retained ✅ Strong performance across the board Despite their size, they outperform models
0
7
39
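What the SFT notebook boils down to, as a text-only sketch of the TRL entry points (the model id, dataset, and output dir are placeholders; the actual Colab adds the vision-specific data handling needed for Qwen3-VL):

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=SFTConfig(output_dir="sft-demo"),
    train_dataset=dataset,
)
trainer.train()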
sft loss is wrong when using grad accumulation 🥶
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,
11
10
278
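The issue in a nutshell, as a small numerical sketch: averaging the cross-entropy per micro-batch and then averaging across accumulation steps weights every micro-batch equally, even when they contain very different numbers of tokens, so the result differs from the loss computed on the full batch. Normalizing by the total token count across the accumulation steps fixes it.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 11
# Two micro-batches with very different numbers of target tokens.
logits = [torch.randn(3, vocab), torch.randn(30, vocab)]
labels = [torch.randint(0, vocab, (3,)), torch.randint(0, vocab, (30,))]

# Naive grad accumulation: mean per micro-batch, then mean of the means.
naive = torch.stack([
    F.cross_entropy(l, y, reduction="mean") for l, y in zip(logits, labels)
]).mean()

# Correct: sum the per-token losses, divide by the total tokens in the full batch.
total_tokens = sum(y.numel() for y in labels)
correct = sum(
    F.cross_entropy(l, y, reduction="sum") for l, y in zip(logits, labels)
) / total_tokens

print(naive.item(), correct.item())  # these differ; only `correct` matches the full-batch loss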