Quentin Gallouédec
@QGallouedec
Followers
4K
Following
2K
Media
154
Statuses
827
PhD - Research @huggingface 🤗 TRL lead maintainer 🇫🇷 in 🇨🇦
Joined May 2019
Last moments of closed-source AI 🪦 : Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration. 🫵 Let's go!
github.com
Fully open reproduction of DeepSeek-R1. Contribute to huggingface/open-r1 development by creating an account on GitHub.
35
413
3K
most of these don’t require AI, even for beginners
github.com
This issue is meant to track the tasks that separate us from v1. The list is evolving and is updated based on discussions and recent progress. Documentation Re...
if you want to contribute to open source but don't know where to begin and just want to use AI, please don't write AI-generated comments on GH repository issues: not only are you taking maintainers' time, it's also misleading for other devs. same goes for PRs, most
0
1
3
Sharing the slides from yesterday's talk about "Fine Tuning with TRL" from the @togethercompute x @huggingface workshop we hosted in our Paris office 🎃!
4
10
78
Questions! 🧐 LayerNorm always upcasts inputs to fp32 for stability (hardcoded), but the final multiplication by the weights is done in the original dtype. 1. Why? Sometimes this multiplication is also done in fp32. 2. When, and why?
0
0
7
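For reference, here is a minimal PyTorch sketch of the pattern the question is about (not the exact code of any particular library): the statistics are computed in fp32, and the affine step can be done either in the original dtype or in fp32.

import torch

def layer_norm_mixed(x, weight, bias, eps=1e-5):
    # Upcast so mean/variance are computed in full precision even when x is fp16/bf16.
    x32 = x.float()
    mean = x32.mean(dim=-1, keepdim=True)
    var = x32.var(dim=-1, keepdim=True, unbiased=False)
    normed = (x32 - mean) * torch.rsqrt(var + eps)
    # Option A: cast back first, then multiply by the weights in the original dtype.
    out_low = normed.to(x.dtype) * weight + bias
    # Option B: multiply in fp32 and cast only at the end.
    out_fp32 = (normed * weight.float() + bias.float()).to(x.dtype)
    return out_low, out_fp32

x = torch.randn(2, 8, dtype=torch.bfloat16)
w, b = torch.ones(8, dtype=torch.bfloat16), torch.zeros(8, dtype=torch.bfloat16)
print(layer_norm_mixed(x, w, b)[0].dtype)  # torch.bfloat16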
the bloopers are the best part for understanding the real life of a post-trainer 👨‍🌾
We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️ Featuring our protagonist SmolLM3, we cover: 🧭 Strategy on whether to train your own LLM and burn all your VC money 🪨 Pretraining,
1
1
14
My book is off to a great start, #1 best seller on Amazon! 🥳 (thanks to my whole family for boosting sales)
3
2
12
You shouldn't do RL on small models. Distilling from large models works better. And you can now do it even when the tokenizers don't match.
On-policy distillation is a promising way to train small models, but it’s usually limited to teacher–student pairs sharing the same tokenizer. With our GOLD method, you can now distill across different model families and even outperform GRPO! https://t.co/PAOFCdM4Uk
3
14
261
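For context, here is a minimal sketch of the same-tokenizer baseline that GOLD generalizes (this is not the GOLD method itself, and the model ids are placeholders): the student samples its own completion, the teacher scores the same token ids, and the loss is the reverse KL between the two distributions on the sampled tokens.

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("placeholder/student-model")
student = AutoModelForCausalLM.from_pretrained("placeholder/student-model")
teacher = AutoModelForCausalLM.from_pretrained("placeholder/teacher-model").eval()

prompt = tokenizer("Explain gradient accumulation.", return_tensors="pt")
# On-policy: the training data is the student's own sample.
sequence = student.generate(**prompt, max_new_tokens=64, do_sample=True)

student_logprobs = F.log_softmax(student(sequence).logits[:, :-1], dim=-1)
with torch.no_grad():
    teacher_logprobs = F.log_softmax(teacher(sequence).logits[:, :-1], dim=-1)

# Reverse KL(student || teacher) at each position of the sampled sequence
# (prompt tokens would normally be masked out; omitted to keep the sketch short).
per_token_kl = (student_logprobs.exp() * (student_logprobs - teacher_logprobs)).sum(-1)
loss = per_token_kl.mean()
loss.backward()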
@finbarrtimbers Didn't huggingface have something similar to paperswithcode?
1
1
1
🔥 We're thrilled to announce 𝚑𝚞𝚐𝚐𝚒𝚗𝚐𝚏𝚊𝚌𝚎_𝚑𝚞𝚋 v1.0! After five years of development, this foundational release is packed with a fully modernized HTTP backend and a complete, from-the-ground-up CLI revamp! $ pip install huggingface_hub --upgrade 🧵 highly recommend
8
37
304
GRPOConfig(vllm_enable_sleep_mode=True)
vLLM Sleep Mode 😴→ ⚡Zero-reload model switching for multi-model serving. Benchmarks: 18–200× faster switches and 61–88% faster first inference vs cold starts. Explanation Blog by @EmbeddedLLM 👇 Why it’s fast: we keep the process alive, preserving the allocator, CUDA graphs,
3
0
48
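A minimal TRL sketch of turning the flag on (the model id, dataset, reward function, and output dir are placeholders; only vllm_enable_sleep_mode is taken from the post above):

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 100 characters.
    return [-abs(len(c) - 100) for c in completions]

args = GRPOConfig(
    output_dir="grpo-sleep-demo",
    use_vllm=True,                 # generate rollouts with vLLM
    vllm_enable_sleep_mode=True,   # free GPU memory between generation phases
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=args,
    reward_funcs=reward_len,
    train_dataset=dataset,
)
trainer.train()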
For those who wonder, this has been a core design principle in trl for a while. It wouldn't work with TIS anyway. The only exception: when you use a reward model, you need to detokenize/retokenize because it's not guaranteed that the reward model's tokenizer is the same as the
This is the one thing that always screws up RL training for LLMs especially at bigger scales -- don't do re-tokenization and just directly use the tokens fed to and generated by your LLM.
0
0
22
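A quick illustration of why the detokenize → retokenize round trip is risky (the model id is a placeholder): decoding generated ids to text and re-encoding that text is not guaranteed to return the same ids, so the log-probs no longer match the tokens the policy actually produced.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")

# Pretend these are the ids the policy generated during a rollout.
generated_ids = tok("hello   world\n\n```python", add_special_tokens=False)["input_ids"]

# Naive pipeline: decode to text, then re-encode before scoring.
round_trip_ids = tok(tok.decode(generated_ids), add_special_tokens=False)["input_ids"]

# The two sequences can differ (whitespace handling, merges, special tokens),
# which silently misaligns every per-token log-prob.
print(generated_ids == round_trip_ids)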
Meet OpenEnv 👋, an open ecosystem of environments for intelligent agents. Build, share, and test agents safely and consistently. Ideal for training with TRL (we include examples 🤓), deployment, and community collaboration via the HF Hub
1
6
27
Excited to share OpenEnv: frontier-grade RL environments for the open-source community 🔥! https://t.co/KVeBMsxohL 🧩 Modular interfaces: a clean Gymnasium-style API (reset(), step(), state()) that plugs into any RL framework 🐳 Built for scale: run environments in containers
16
42
275
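To make the "Gymnasium-style API" concrete, here is a toy environment in that spirit with reset(), step(), and state(); it is an illustrative sketch, not the actual OpenEnv interface.

from dataclasses import dataclass

@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool

class EchoEnv:
    # Toy environment exposing reset()/step()/state(), in the Gymnasium spirit.
    def __init__(self, target: str = "hello"):
        self.target = target
        self.history = []

    def reset(self) -> str:
        self.history = []
        return f"Say '{self.target}'"  # initial observation / prompt

    def step(self, action: str) -> StepResult:
        self.history.append(action)
        reward = 1.0 if action.strip().lower() == self.target else 0.0
        return StepResult(observation="episode over", reward=reward, done=True)

    def state(self) -> dict:
        return {"target": self.target, "history": list(self.history)}

env = EchoEnv()
obs = env.reset()
print(obs, env.step("hello").reward, env.state())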
pretty weak. what if `path` isn't a str?? this may completely break the code 🤦♂️
19
0
226
Fine-tune Qwen3-VL with TRL on a free Colab GPU notebook! SFT: https://t.co/aQyc38etcW GRPO:
colab.research.google.com
Run, share, and edit Python notebooks
Introducing the compact, dense versions of Qwen3-VL — now available in 4B and 8B pairs, each with both Instruct and Thinking variants. ✅ Lower VRAM usage ✅ Full Qwen3-VL capabilities retained ✅ Strong performance across the board Despite their size, they outperform models
0
7
39
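What the SFT notebook boils down to, as a text-only sketch of the TRL entry points (the model id, dataset, and output dir are placeholders; the actual Colab adds the vision-specific data handling needed for Qwen3-VL):

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=SFTConfig(output_dir="sft-demo"),
    train_dataset=dataset,
)
trainer.train()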
sft loss is wrong when using grad accumulation 🥶
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single,
11
10
278
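The issue in a nutshell, as a small numerical sketch: averaging the cross-entropy per micro-batch and then averaging across accumulation steps weights every micro-batch equally, even when they contain very different numbers of tokens, so the result differs from the loss computed on the full batch. Normalizing by the total token count across the accumulation steps fixes it.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
vocab = 11
# Two micro-batches with very different numbers of target tokens.
logits = [torch.randn(3, vocab), torch.randn(30, vocab)]
labels = [torch.randint(0, vocab, (3,)), torch.randint(0, vocab, (30,))]

# Naive grad accumulation: mean per micro-batch, then mean of the means.
naive = torch.stack([
    F.cross_entropy(l, y, reduction="mean") for l, y in zip(logits, labels)
]).mean()

# Correct: sum the per-token losses, divide by the total tokens in the full batch.
total_tokens = sum(y.numel() for y in labels)
correct = sum(
    F.cross_entropy(l, y, reduction="sum") for l, y in zip(logits, labels)
) / total_tokens

print(naive.item(), correct.item())  # these differ; only `correct` matches the full-batch loss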