Clémentine Fourrier 🍊 is off till Dec 2026 hiking
@clefourrier
Followers 6K · Following 11K · Media 208 · Statuses 4K
Evals/dogs @HuggingFace ✨ "The future is already here, it’s just not very evenly distributed" (Gibson)
Joined October 2019
Hey twitter! I'm releasing the LLM Evaluation Guidebook v2! Updated, nicer to read, interactive graphics, etc! https://t.co/xG4VQOj2wN After this, I'm off: I'm taking a sabbatical to go hike with my dogs :D (back @huggingface in Dec *2026*) See you all next year!
👀 Introducing a brand new @yupp_ai SVG leaderboard ranking frontier models on the generation of coherent and visually appealing SVGs! Gemini 3 Pro by @GoogleDeepMind takes the crown as the most powerful model! 👏 We’re also releasing a public SVG dataset. Details in 🧵
Either you crack general intelligence -- the ability to efficiently acquire arbitrary skills on your own -- or you don't have AGI. A big pile of task-specific skills memorized from handcrafted/generated environments isn't AGI, no matter how big.
@huggingface cc @maximelabonne since you wanted an update :P
If you see improvements, I'd love to hear them (within the next 2 days) :) Many thanks to @thibaudfrere for his help on the banner and @gui_penedo for his proofreading! If you've got eval needs, your new PoC is @nathanhabib1011 (with a focus on lighteval)!
The guide is very beginner friendly, as we go from the basics of tokenization/inference to the nits and tricks of running evals properly, so it's suitable for all levels. It should contain most of what we've written about evals at HF in a single unified place, with updates ofc :)
as a researcher, it makes no sense to compare reasoning vs non-reasoning models on benches like the ones in Artificial Analysis without normalizing somehow by cost or output tokens. non-reasoning models (base/instruct) are important for the open ecosystem since research teams and
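The normalization asked for above can be as simple as dividing benchmark accuracy by output-token spend. A minimal sketch of the idea — every number, price, and model name below is made up for illustration:

```python
# Hypothetical illustration: normalize benchmark scores by inference cost so
# reasoning and non-reasoning models are compared on efficiency, not just
# raw accuracy. All figures are invented.

def cost_normalized_score(accuracy: float, output_tokens: float,
                          usd_per_million_tokens: float) -> float:
    """Accuracy per dollar of output spend (higher is better)."""
    cost_usd = output_tokens / 1e6 * usd_per_million_tokens
    return accuracy / cost_usd

models = {
    # name: (accuracy, avg output tokens per task, $/M output tokens)
    "reasoning-model": (0.82, 12_000, 10.0),
    "instruct-model":  (0.70, 600, 2.0),
}

for name, (acc, toks, price) in models.items():
    print(name, round(cost_normalized_score(acc, toks, price), 1))
```

Under these toy numbers the instruct model wins by a wide margin per dollar despite its lower raw accuracy, which is exactly the effect raw-leaderboard comparisons hide.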
Mistral has delivered super capable small models but no one is talking about it so here I go
Transformers v5's first release candidate is out 🔥 The biggest release of my life. It's been five years since the last major (v4). From 20 architectures to 400, 20k daily downloads to 3 million. The release is huge, w/ tokenization (no slow tokenizers!), modeling & processing.
stop looking at HLE (with tools): most of those scores just mean "has web access". the answers to HLE are easily accessible in ungated mirrors (and probably a dozen other places). the only question is why those agents don't score 100%
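One crude guard against the mirror problem described above: flag any agent trace whose retrieved pages contain the gold answer verbatim. A sketch only — the function name and toy traces are hypothetical, and real decontamination needs fuzzier matching:

```python
def leaked(answer: str, retrieved_pages: list[str]) -> bool:
    """Crude decontamination check: does the gold answer appear
    verbatim (case-insensitively) in any page the agent retrieved?"""
    needle = answer.strip().lower()
    return any(needle in page.lower() for page in retrieved_pages)

# Entirely invented traces:
pages_with_leak = ["Mirror of benchmark Q&A ... final answer: 7.32 eV ..."]
clean_pages = ["An unrelated physics lecture transcript."]

assert leaked("7.32 eV", pages_with_leak)
assert not leaked("7.32 eV", clean_pages)
```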
Okay, but, wait, what reasoning traces should I train on? Excited to share our latest research paper together with @nvidia: Learning to Reason: Training LLMs with GPT-OSS or DeepSeek R1 Reasoning Traces https://t.co/ddktyLoJt7 🧵
arxiv.org
Test-time scaling, which leverages additional computation during inference to improve model accuracy, has enabled a new class of Large Language Models (LLMs) that are able to reason through...
China just passed the U.S. in open model downloads for the first time 👀 New data from Economies of Open Intelligence, led by the @huggingface policy team & community collaborators, presents some notable observations: ✨ Developer adoption In 2025, Chinese model developers saw
Introducing "The Eiffel Tower Llama"!🗼 Remember Golden Gate Claude? Unfortunately Anthropic's viral demo was shut down after 24h, and key technical details remained hidden. So we recreated it, uncovering key insights on steering LLMs using SAEs⚒️ Full blog post + live demo 👇
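Golden Gate Claude-style steering comes down to adding an SAE feature's decoder direction to the residual stream during the forward pass. A toy sketch of that one step — random stand-in tensors, no real model or SAE weights, all shapes and names illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 64, 512

# Toy stand-in for a trained SAE decoder: each row is one feature's
# direction in the model's residual-stream space, unit-normalized.
decoder = rng.standard_normal((d_sae, d_model))
decoder /= np.linalg.norm(decoder, axis=1, keepdims=True)

def steer(resid: np.ndarray, feature_id: int, strength: float) -> np.ndarray:
    """Add `strength` times one SAE feature direction at every position."""
    return resid + strength * decoder[feature_id]

# One "layer activation" for a 10-token sequence.
resid = rng.standard_normal((10, d_model))
steered = steer(resid, feature_id=42, strength=8.0)

# Steering moves every position along the chosen direction only.
delta = steered - resid
assert np.allclose(delta, 8.0 * np.tile(decoder[42], (10, 1)))
```

The interesting engineering in the blog post is everything around this line: picking the feature, the layer, and a strength that changes behavior without wrecking fluency.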
Non-natural image gen and editing are difficult tasks. We tested the state of the art at the time — including Nano Banana 1.0 & GPT-image — all performed quite poorly on StructBench. Nano Banana 2 (NB2) just dropped, and its improvements strongly validate a direction we studied
"The most underappreciated legend of the tech industry?" I see posts like this one every day 😭 And obviously he is a respected professional, but he is far from underappreciated. Check out Sophie Wilson. Most people haven't heard of her, but she is the primary architect of the
> created Linux kernel at 21 > built Git because nothing else was good enough > becomes backbone of servers, Android, cloud, supercomputers > never chased fame, money, titles, hype > stays private, consistent, brutally honest for decades > still reviews patches, still
"Professors definitely deserve to have their names on the papers." I think this take is completely wrong. Financial support does not warrant co-authorship. Bob Gallager (a legendary information theorist who retired from MIT) did not co-author any papers with many of his
@shaananc @thegautamkamath I agree with most of your statement. However, there’s no “simply” leading a group or advising PhD students. Those activities require tremendous effort, both intellectually and financially. Not to mention that in the US, all of PhD students’ funding comes from professors’ grant money.
Know anyone who needs some help getting started with ML, open source and 🤗? We partnered with @TechToTheRescue, a tech-for-good incubator, & answered all the AI questions their nonprofits had to create an FAQ! https://t.co/psgu8N1cSm Come add your Q/As, it's collaborative! 🔥
incredibly detailed technical blog just dropped on the anatomy of BoltzGen 🧬 made for ML people, but covering everything from molecular representations to diffusion-based generation of protein binders + crazy good interactive visuals 👏👏 @ludocomito