Quentin Gallouédec

@QGallouedec

4K Followers · 2K Following · 154 Media · 827 Statuses

PhD - Research @huggingface 🤗 TRL lead maintainer 🇫🇷 in 🇨🇦

Joined May 2019
@QGallouedec
Quentin Gallouédec
9 months
Last moments of closed-source AI 🪦: Hugging Face is openly reproducing the pipeline of 🐳 DeepSeek-R1. Open data, open training, open models, open collaboration. 🫵 Let's go!
github.com · huggingface/open-r1: Fully open reproduction of DeepSeek-R1.
@QGallouedec
Quentin Gallouédec
5 hours
most of these don’t require AI, even for beginners
github.com · This issue aims to list the tasks that still separate us from v1. The list is evolving and gets updated based on discussions and recent progress. Documentation Re…
@mervenoyann
merve
23 hours
if you want to contribute to open-source but don't know where to begin and just want to use AI, please don't. Avoid writing AI comments on GH repository issues: not only are you taking maintainers' time, it's also misleading for other devs. Same goes for PRs, most…
@SergioPaniego
Sergio Paniego
23 hours
Sharing the slides from yesterday's talk about "Fine Tuning with TRL" from the @togethercompute x @huggingface workshop we hosted in our Paris office 🎃!
@QGallouedec
Quentin Gallouédec
7 hours
Questions! 🧐 LayerNorm always upcasts inputs to fp32 for stability (hardcoded), but the final multiplication by the weight is done in the original dtype. 1. Why? 2. Sometimes we do this multiplication in fp32 instead. When, and why?
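A minimal sketch of the pattern the question refers to, assuming a hand-rolled LayerNorm (the real torch.nn.LayerNorm dispatches to fused kernels, but the dtype handling is the point):

```python
import torch
import torch.nn as nn

class UpcastLayerNorm(nn.Module):
    """Illustrative only: normalize in fp32, apply affine in the input dtype."""
    def __init__(self, dim: int, eps: float = 1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.bias = nn.Parameter(torch.zeros(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        dtype = x.dtype
        h = x.float()  # hardcoded upcast: mean/var computed in fp32 for stability
        h = (h - h.mean(-1, keepdim=True)) / torch.sqrt(
            h.var(-1, keepdim=True, unbiased=False) + self.eps
        )
        # cast back *before* the affine step, so weight/bias multiply in bf16/fp16;
        # some implementations instead upcast the weight and multiply in fp32
        return self.weight * h.to(dtype) + self.bias
```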
@QGallouedec
Quentin Gallouédec
1 day
the bloopers are the best part for understanding the real life of a post-trainer 👨‍🌾
@_lewtun
Lewis Tunstall
2 days
We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️ Featuring our protagonist SmolLM3, we cover: 🧭 Strategy on whether to train your own LLM and burn all your VC money 🪨 Pretraining, …
@AymericRoucher
m_ric
3 days
My book is off to a great start: #1 best seller on Amazon! 🥳 (thanks to my whole family for boosting sales)
@QGallouedec
Quentin Gallouédec
3 days
You shouldn't do RL on small models. Distilling from large models works better. And you can now do it even when tokenizers don't match.
@cmpatino_
Carlos Miguel Patiño
3 days
On-policy distillation is a promising way to train small models, but it’s usually limited to teacher–student pairs sharing the same tokenizer. With our GOLD method, you can now distill across different model families and even outperform GRPO! https://t.co/PAOFCdM4Uk
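GOLD's own code isn't shown here, but the generic on-policy distillation step it builds on looks roughly like this: a sketch assuming teacher and student share a vocabulary, which is exactly the restriction GOLD lifts.

```python
import torch.nn.functional as F

def on_policy_distill_loss(student_logits, teacher_logits):
    """Per-token reverse KL(student || teacher) on student-sampled completions.
    logits: (batch, seq, vocab); assumes matching tokenizers/vocabularies."""
    s_logp = F.log_softmax(student_logits, dim=-1)
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    kl = (s_logp.exp() * (s_logp - t_logp)).sum(-1)  # sum over vocab
    return kl.mean()  # average over batch and sequence positions
```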
@Shekswess
Shekswess
3 days
@finbarrtimbers Didn't Hugging Face have something similar to paperswithcode?
@hanouticelina
célina
5 days
🔥 We're thrilled to announce 𝚑𝚞𝚐𝚐𝚒𝚗𝚐𝚏𝚊𝚌𝚎_𝚑𝚞𝚋 v1.0! After five years of development, this foundational release is packed with a fully modernized HTTP backend and a complete, from-the-ground-up CLI revamp! $ pip install huggingface_hub --upgrade 🧵 highly recommend
@QGallouedec
Quentin Gallouédec
3 days
GRPOConfig(vllm_enable_sleep_mode=True)
@vllm_project
vLLM
4 days
vLLM Sleep Mode 😴 → ⚡ Zero-reload model switching for multi-model serving. Benchmarks: 18–200× faster switches and 61–88% faster first inference vs cold starts. Explanation blog by @EmbeddedLLM 👇 Why it's fast: we keep the process alive, preserving the allocator, CUDA graphs, …
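A hedged sketch of both halves: the TRL flag from the quote above, and the vLLM sleep/wake calls underneath it (parameter and method names as in recent TRL/vLLM releases; double-check against the docs for your versions):

```python
# TRL side: colocate vLLM generation with GRPO training on the same GPUs
from trl import GRPOConfig

config = GRPOConfig(
    use_vllm=True,                # generate rollouts with vLLM
    vllm_enable_sleep_mode=True,  # put vLLM to sleep during optimizer steps
)

# vLLM side: what sleep mode does, in offline-inference terms
from vllm import LLM

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)
llm.generate("hello")
llm.sleep(level=1)  # offload weights, free KV cache; the process stays alive
# ... use the freed VRAM for another model or a training step ...
llm.wake_up()       # restore weights: much faster than a cold start
```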
@QGallouedec
Quentin Gallouédec
8 days
For those who wonder, this has been a core design principle in TRL for a while. It wouldn't work with TIS anyway. Only exception: when you use a reward model, you need to detokenize/retokenize, because it's not guaranteed that the reward model's tokenizer is the same as the …
@agarwl_
Rishabh Agarwal
9 days
This is the one thing that always screws up RL training for LLMs especially at bigger scales -- don't do re-tokenization and just directly use the tokens fed to and generated by your LLM.
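A small hedged illustration of the pitfall: decoding generated token ids to text and re-encoding is not guaranteed to round-trip, so losses computed on re-tokenized sequences can be silently misaligned with the rollout (the token ids below are hypothetical):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # any tokenizer works as a demo

generated_ids = [31373, 11, 220, 220, 995]   # hypothetical ids from a rollout
text = tok.decode(generated_ids)
reencoded = tok.encode(text)
# whitespace normalization, special tokens, or alternative BPE merges can all
# change the split, so this comparison can fail:
if reencoded != generated_ids:
    print("round-trip mismatch:", generated_ids, "->", reencoded)
```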
@SergioPaniego
Sergio Paniego
9 days
Meet OpenEnv 👋, an open ecosystem of environments for intelligent agents. Build, share, and test agents safely and consistently. Ideal for training with TRL (we include examples 🤓), deployment, and community collaboration via the HF Hub
@_lewtun
Lewis Tunstall
9 days
Excited to share OpenEnv: frontier-grade RL environments for the open-source community 🔥! https://t.co/KVeBMsxohL 🧩 Modular interfaces: a clean Gymnasium-style API (reset(), step(), state()) that plugs into any RL framework 🐳 Built for scale: run environments in containers
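Per the tweet, the interface is Gymnasium-style with reset(), step(), and state(). A sketch of what such an interface could look like; everything beyond those three method names is illustrative, not OpenEnv's actual types:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class StepResult:
    observation: Any
    reward: float
    done: bool
    info: dict = field(default_factory=dict)

class Env:
    """Gymnasium-style environment interface, per the tweet's method names."""

    def reset(self) -> Any:
        """Start a new episode; return the initial observation."""
        raise NotImplementedError

    def step(self, action: Any) -> StepResult:
        """Apply an action; return the resulting transition."""
        raise NotImplementedError

    def state(self) -> Any:
        """Return the current environment state (e.g., for checkpointing)."""
        raise NotImplementedError
```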
@QGallouedec
Quentin Gallouédec
9 days
big announcement tomorrow. a clue
@QGallouedec
Quentin Gallouédec
9 days
pretty weak. what if `path` isn't a str?? this may completely break the code 🤦‍♂️
@karpathy
Andrej Karpathy
10 days
@LucasAtkins7 This code is extremely dangerous. Here, I improved it.
@QGallouedec
Quentin Gallouédec
13 days
I'll be in SF all week, DM if you want to chat 🧨
@SergioPaniego
Sergio Paniego
18 days
Fine-tune Qwen3-VL with TRL on a free Colab GPU notebook! SFT: https://t.co/aQyc38etcW GRPO:
colab.research.google.com · Run, share, and edit Python notebooks
@Alibaba_Qwen
Qwen
18 days
Introducing the compact, dense versions of Qwen3-VL — now available in 4B and 8B pairs, each with both Instruct and Thinking variants. ✅ Lower VRAM usage ✅ Full Qwen3-VL capabilities retained ✅ Strong performance across the board Despite their size, they outperform models …
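The linked notebooks carry the vision-language details; as a rough idea of the TRL side, here is a minimal text-only SFT sketch (model and dataset names are illustrative stand-ins):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # any chat dataset

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",     # stand-in; swap in your model
    train_dataset=dataset,
    args=SFTConfig(output_dir="qwen-sft"),
)
trainer.train()
```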
@QGallouedec
Quentin Gallouédec
17 days
sft loss is wrong when using grad accumulation 🥶
@karpathy
Andrej Karpathy
19 days
Excited to release new repo: nanochat! (it's among the most unhinged I've written). Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, …
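A tiny numeric sketch of the bug being alluded to (my illustration, not nanochat's code): averaging the token loss per micro-batch and then averaging across accumulation steps over-weights tokens from short micro-batches; the fix is one global mean over all tokens in the effective batch.

```python
import torch

token_losses = torch.tensor([1.0, 1.0, 1.0, 4.0])  # per-token CE losses
# micro-batch 1 has 3 tokens, micro-batch 2 has 1 token

buggy = (token_losses[:3].mean() + token_losses[3:].mean()) / 2  # -> 2.5
correct = token_losses.sum() / token_losses.numel()              # -> 1.75

print(buggy.item(), correct.item())  # the short micro-batch is over-weighted
```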