Maxime Labonne

@maximelabonne

Followers
25K
Following
9K
Media
762
Statuses
3K

Head of Post-Training @liquidai 💻 GitHub: https://t.co/ElXDsjz8YP 🤗 HF: https://t.co/2ECS7GiJGD 📝 Blog: https://t.co/Gz5bhbXWT0

London, England
Joined October 2017
@maximelabonne
Maxime Labonne
5 hours
You always think you're safe until your job becomes a benchmark.
@maksym_andr
Maksym Andriushchenko
20 hours
We release PostTrainBench: a benchmark measuring how well AI agents like Claude Code can post-train base LLMs. We expect this to be an important indicator for AI R&D automation as it unfolds over the next few years.
🔗 https://t.co/dVSSHkpAE1
📂 https://t.co/vqZNrQw66z
1/n
3
1
56
@maximelabonne
Maxime Labonne
7 days
Beyond removing refusals, I hope these techniques can be used for "latent fine-tuning" to customize models at inference time. https://t.co/rqacwXnmzf
github.com
Fully automatic censorship removal for language models - p-e-w/heretic
0
2
8
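One way to picture the "latent fine-tuning" idea is inference-time activation steering: instead of permanently editing weights, add a direction to the residual stream while generating. The sketch below only illustrates that idea and is not code from the linked project; the model ID, layer index, scale, and random direction are all placeholders.

```python
# Illustrative sketch of inference-time "latent" customization via activation
# steering: a forward hook adds a fixed direction to one layer's residual
# stream, leaving the weights untouched. All settings here are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

layer_idx, alpha = 12, 4.0
# In practice the direction would come from contrastive activations
# (e.g. difference of mean hidden states on two prompt sets), not randn.
direction = torch.randn(model.config.hidden_size)
direction = direction / direction.norm()

def steer(module, inputs, output):
    # Decoder layers return the hidden states (sometimes wrapped in a tuple).
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + alpha * direction.to(hidden.dtype).to(hidden.device)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# model.model.layers matches Llama/Qwen-style Hugging Face models.
handle = model.model.layers[layer_idx].register_forward_hook(steer)
inputs = tok("Write a short poem about autumn.", return_tensors="pt")
print(tok.decode(model.generate(**inputs, max_new_tokens=50)[0]))
handle.remove()  # detach the hook to restore the unsteered model
```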
@maximelabonne
Maxime Labonne
7 days
Abliterate LLMs with Heretic 1.1
It's cool to see this project evolving into a solid open-source library
The new viz feature shows how the abliteration process gradually groups the residual vectors into two nice clusters 👀
2
3
39
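For context on what those two clusters correspond to: abliteration is usually described as finding a "refusal direction" in the residual stream (the difference of mean activations on refusal-triggering vs. harmless prompts) and projecting it out. The toy sketch below illustrates that difference-of-means idea only; it is not Heretic's actual implementation, and the data is synthetic.

```python
# Toy sketch of the core abliteration idea: fit a "refusal direction" as a
# difference of means, then project it out of the residual stream.
import torch

def refusal_direction(harmful_acts: torch.Tensor, harmless_acts: torch.Tensor) -> torch.Tensor:
    """Both inputs: (n_prompts, d_model) residual activations at one layer."""
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def ablate(hidden: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Remove the component of each residual vector along the refusal direction."""
    return hidden - (hidden @ direction).unsqueeze(-1) * direction

# Synthetic data: two populations of residual vectors separated along one
# shared direction, like the two clusters in the visualization.
d = 64
harmless = torch.randn(128, d)
harmful = torch.randn(128, d) + 3.0  # offset along a shared direction
r = refusal_direction(harmful, harmless)
print("separation before:", (harmful.mean(0) - harmless.mean(0)).norm().item())
print("separation after: ", (ablate(harmful, r).mean(0) - ablate(harmless, r).mean(0)).norm().item())
```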
@maximelabonne
Maxime Labonne
10 days
@asapzzhou More details in @asapzzhou's thread, please give him a like :)
@asapzzhou
Zhanhui Zhou @ NeurIPS
10 days
(1/n) Tiny-A2D: An Open Recipe to Turn Any AR LM into a Diffusion LM
Code (dLLM): https://t.co/yYNBo4N99B
Checkpoints: https://t.co/fBG4MmoaTZ
With dLLM, you can turn ANY autoregressive LM into a diffusion LM (parallel generation + infilling) with minimal compute. Using this
0
2
13
@maximelabonne
Maxime Labonne
10 days
@asapzzhou Here are checkpoints made with this recipe you can actually try in dLLM
huggingface.co
1
1
26
@maximelabonne
Maxime Labonne
10 days
Open recipe to turn Qwen3 into a diffusion LLM 👀👀
> Swap the causal mask for bidirectional attention
> Source model matters a lot for performance
> Block diffusion (BD3LM) >> masked diffusion (MDLM)
> Light SFT with masking
Great work from @asapzzhou with his dLLM library!
17
122
875
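The first bullet, swapping the causal mask for bidirectional attention, is the mechanical core of the AR-to-diffusion conversion. Below is a minimal sketch of that mask swap; it only illustrates the idea and is not taken from dLLM or Tiny-A2D.

```python
# Sketch of the mask swap: replace causal masking with full bidirectional
# attention so every token can attend to every other token.
import torch
import torch.nn.functional as F

def attention(q, k, v, causal: bool):
    # q, k, v: (batch, heads, seq, head_dim)
    seq = q.size(-2)
    if causal:
        # Autoregressive LM: token i only sees tokens <= i.
        mask = torch.tril(torch.ones(seq, seq, dtype=torch.bool, device=q.device))
    else:
        # Diffusion LM: no causal constraint, full bidirectional attention.
        mask = torch.ones(seq, seq, dtype=torch.bool, device=q.device)
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

q = k = v = torch.randn(1, 8, 16, 64)
ar_out = attention(q, k, v, causal=True)    # original autoregressive behaviour
dlm_out = attention(q, k, v, causal=False)  # after the AR-to-diffusion conversion
```

With the full mask in place, the "light SFT with masking" step then trains the model to predict masked-out tokens rather than only the next token.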
@maximelabonne
Maxime Labonne
13 days
The "commodity AI" thesis is wrong. The API market is splitting into two modalities: - Premium models (Claude) dominate programming and high-stakes work. Users pay $2+/M tokens because correct code > cheap code. - Cheap open models own roleplay and creative tasks. Volume is
11
8
73
@maximelabonne
Maxime Labonne
14 days
Please release the forbidden training dataset.
@AdamEisgrau
Adam Eisgrau
15 days
BREAKING: @OpenAI must turn over 20 million+ chat logs to plaintiffs, Judge Ona Wang has ruled in a 9-pg Order just issued:
3
5
49
@maximelabonne
Maxime Labonne
14 days
Proudly powered by LFM2 for the language backbone
Compared to Qwen2.5-1.5B, it achieves 2.9x higher throughput and 4x larger context length on NPU hardware
@nexa_ai
NEXA AI
15 days
Today we're releasing AutoNeural-VL-1.5B, the world's first real-time multimodal model built for in-car AI. It runs fully local on the @Qualcomm SA8295P NPU with a software–hardware co-designed architecture, setting a new bar for speed and quality. AutoNeural redefines what AI
1
3
51
@liquidai
Liquid AI
14 days
Today we introduce Liquid Labs, our advanced research unit, with the goal of understanding and building efficient and adaptive intelligence systems. Liquid Labs consolidates our existing research efforts at Liquid across architecture of foundation models, multimodality,
18
34
240
@maximelabonne
Maxime Labonne
15 days
You don't understand evals
Everybody in AI should read this 👇
@clefourrier
Clémentine Fourrier 🍊 is off till Dec 2026 hiking
15 days
Hey twitter! I'm releasing the LLM Evaluation Guidebook v2! Updated, nicer to read, interactive graphics, etc! https://t.co/xG4VQOj2wN After this, I'm off: I'm taking a sabbatical to go hike with my dogs :D (back @huggingface in Dec *2026*) See you all next year!
4
47
586
@maximelabonne
Maxime Labonne
17 days
LFM2 Technical Report dropped! 🥳
It provides details about the LFM2 architecture, pre-training, post-training, vision, audio, and ColBERT models
It's 51 pages long, have fun!
7
37
167
@saagnikkk
Sagnik
18 days
🚨 New Blog Alert: Is AdamW overkill for RLVR?
We found that vanilla SGD is
1. as performant as AdamW,
2. 36x more parameter efficient naturally (much more than a rank-1 LoRA) 🤯
Looks like a "free lunch". Maybe it's time to rethink the optimizers for RLVR 🧵
16
57
476
@maximelabonne
Maxime Labonne
18 days
Here's the calculation without FP32 gradient accumulation:
- Model parameters (FP32): 4 bytes per param
- Gradients (FP32): 4 bytes per param
- Adam's optimizer states (momentum + variance): 8 bytes per param
- SGD: 0!
(Still have to add activations on top.)
0
0
2
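The arithmetic above is easy to check. The snippet below just reproduces it under the same assumptions (FP32 weights and gradients, no FP32 gradient-accumulation buffer, activations excluded); the 8B parameter count is only an example.

```python
# Back-of-the-envelope check of the per-parameter training memory above,
# excluding activations and any FP32 gradient-accumulation buffer.
def training_bytes_per_param(optimizer: str) -> int:
    params = 4      # FP32 weights
    grads = 4       # FP32 gradients
    if optimizer == "adamw":
        states = 8  # momentum + variance, FP32 each
    elif optimizer == "sgd":
        states = 0  # no optimizer state without momentum
    else:
        raise ValueError(optimizer)
    return params + grads + states

for opt in ("adamw", "sgd"):
    b = training_bytes_per_param(opt)
    # Example: an 8B-parameter model.
    print(f"{opt}: {b} bytes/param -> {b * 8e9 / 1e9:.0f} GB for an 8B model")
```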
@maximelabonne
Maxime Labonne
18 days
SGD without momentum is beautiful
1
0
3
@maximelabonne
Maxime Labonne
18 days
Does SGD > AdamW for RLVR?
> RLVR updates very few parameters (sparse subnetwork)
> The "active" parameters may share similar properties
> Similar loss curvature → single learning rate sufficient
> SGD (uniform LR) ≈ AdamW (adaptive LR)
It means RLFT on potato GPUs!
2
4
31
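In code, the proposed change is nothing more than swapping the optimizer class. A minimal sketch under placeholder settings (toy module, arbitrary learning rates, dummy loss standing in for the RLVR objective):

```python
# Minimal sketch of the swap: plain SGD (one global LR, no momentum, no
# per-parameter state) in place of AdamW for RLVR-style fine-tuning.
import torch
import torch.nn as nn

policy = nn.Linear(512, 512)  # stand-in for the LLM being fine-tuned

# AdamW stores momentum + variance for every parameter (~8 extra bytes/param).
opt_adamw = torch.optim.AdamW(policy.parameters(), lr=1e-6)

# Vanilla SGD stores nothing beyond the gradients; if the few "active" RLVR
# parameters share similar loss curvature, a single learning rate can match
# AdamW's per-parameter adaptive rates.
opt_sgd = torch.optim.SGD(policy.parameters(), lr=1e-5, momentum=0.0)

# The training step itself is unchanged; only the update rule differs.
loss = policy(torch.randn(8, 512)).pow(2).mean()  # placeholder for the RLVR loss
loss.backward()
opt_sgd.step()
opt_sgd.zero_grad()
```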
@maximelabonne
Maxime Labonne
18 days
ToolOrchestra is such a cool work from @nvidia
Just an 8B model trained on calling tools and other LLMs to answer queries
It's a great demo of what frontier SLMs will be about in 2026
15
79
488