Ferdinand Mom

@FerdinandMom

Followers
3K
Following
2K
Media
35
Statuses
553

Distributed & Decentralized training @HuggingFace

France
Joined October 2013
@FerdinandMom
Ferdinand Mom
1 year
Interested in 4D parallelism but feeling overwhelmed by Megatron-LM codebase? We are currently cooking something with @Haojun_Zhao14 and @xariusrke 😉 In the meantime, here is a self-contained script that implements Pipeline Parallelism (AFAB + 1F1B) in 200 LOC 🧵👇
11
43
234
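The AFAB and 1F1B schedules mentioned above differ only in the order a stage interleaves forward and backward passes over microbatches. Here is a minimal, illustrative sketch of both orderings (my own toy helpers, not the 200-LOC script from the thread): AFAB runs all forwards then all backwards, while 1F1B warms up with a few forwards and then alternates one forward with one backward to bound activation memory.

```python
def one_f_one_b_schedule(rank, num_stages, num_microbatches):
    """Return the op sequence ('F', i) / ('B', i) a pipeline stage
    executes under 1F1B. Illustrative toy helper."""
    # Warmup: earlier stages run extra forwards before their first backward.
    warmup = min(num_stages - rank - 1, num_microbatches)
    ops = [("F", i) for i in range(warmup)]
    fwd, bwd = warmup, 0
    # Steady state: alternate one forward with one backward.
    while fwd < num_microbatches:
        ops.append(("F", fwd)); fwd += 1
        ops.append(("B", bwd)); bwd += 1
    # Cooldown: drain the remaining backwards.
    while bwd < num_microbatches:
        ops.append(("B", bwd)); bwd += 1
    return ops

def afab_schedule(num_microbatches):
    """AFAB: all forwards first, then all backwards."""
    return [("F", i) for i in range(num_microbatches)] + \
           [("B", i) for i in range(num_microbatches)]
```

The memory difference is visible in the schedules: under AFAB every microbatch's activations are live before the first backward, whereas under 1F1B at most `num_stages - rank` forwards are outstanding at any time.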
@humanoidsdaily
Humanoids daily
4 days
🚨 Stealth Startup Alert: A new player has quietly entered the humanoid race in Paris. 🇫🇷 "UMA" is building a founding team for a general-purpose robot with a unique focus: "stylized" movement and end-to-end AI. We dug into the job listings to decode their strategy.
humanoidsdaily.com
Job listings reveal a new French venture building "intelligent humanoid robots" with a focus on stylized movement and learning-based control, adding to a heating-up European robotics landscape.
7
16
100
@dlouapre
David Louapre
3 days
Introducing "The Eiffel Tower Llama"!🗼 Remember Golden Gate Claude? Unfortunately Anthropic's viral demo was shut down after 24h, and key technical details remained hidden. So we recreated it, uncovering key insights on steering LLMs using SAEs⚒️ Full blog post + live demo 👇
9
43
165
@remi_or_
Rémi Ouazan
4 days
Why are vLLM and transformers so damn fast? ⚡ Continuous batching. That's the secret sauce 🔥 Never heard of it? We just dropped a blog post building it up from first principles 🤗 See what happens inside the minds of the engineers pushing inference to the edge 🧠
4
32
190
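Continuous batching, as the post describes, means the engine never waits for a whole batch to finish: each decode step generates one token per running sequence, finished sequences free their slot immediately, and waiting requests join mid-flight. A toy scheduler loop (my own sketch for intuition, not vLLM's or transformers' actual code):

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Toy continuous-batching scheduler. Each request is
    (req_id, tokens_needed); each engine step decodes one token for
    every running sequence. Returns the batch composition per step."""
    waiting = deque(requests)
    running = {}   # req_id -> tokens still to generate
    trace = []     # which requests were batched at each step
    while waiting or running:
        # Admit waiting requests into any free slots (mid-flight joins).
        while waiting and len(running) < max_batch:
            rid, need = waiting.popleft()
            running[rid] = need
        trace.append(sorted(running))
        # One decode step for every running sequence.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]   # slot freed this very step
    return trace
```

With static batching, a short request would sit in the batch as dead weight until the longest request finished; here the short request's slot is reused on the very next step, which is where the throughput win comes from.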
@art_zucker
Arthur Zucker
4 days
Mega cool blog post about attention, masking and continuous batching! I think it's a keystone for people who want to deeply understand current serving frameworks and how flash attention unlocks everything!
6
48
365
@LysandreJik
Lysandre
16 days
We've just started a Diffusers "MVP" program to work more closely with open-source collaborators. We'd love to work with you on making Diffusers a better tool than it is right now; please come and help make diffusion more accessible! Many open issues! https://t.co/5JiLemtLR0
1
4
28
@Thom_Wolf
Thomas Wolf
1 month
We’ve cooked another one of these 200+ page practical books on model training that we love to write. This time it’s on all pretraining and post-training recipes and how to do hyperparameter exploration for a training project. Closing the trilogy of: 1. Building a pretraining
@eliebakouch
elie
1 month
Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably https://t.co/iN2JtWhn23
22
116
1K
@Ar_Douillard
Arthur Douillard
1 month
As a sci-fi nerd, Starcloud is super exciting: https://t.co/Y7dPYz9ls2 but this application sounds like bullshit to me? Latency isn't going to take hours, and wildfire detection can wait 20s
1
3
18
@FerdinandMom
Ferdinand Mom
1 month
Good shit from Monarch. Previously, I'd have to SSH into every node and remotely attach my debugger to each one. That's no longer the case
1
0
7
@Ar_Douillard
Arthur Douillard
1 month
Non-distributed DiLoCo as a super lookahead: Kalluski et al. from @metaai released a study of using Nesterov on outer gradients: https://t.co/cYupiYyQru The algo that they nicknamed SNOO is basically DiLoCo with M=1, meaning that every K steps, a delta is computed between
5
12
79
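The pattern the tweet describes can be sketched in a few lines. This is my reading of it, not the paper's exact algorithm: run K plain inner optimizer steps, treat the resulting weight delta as an "outer gradient" (as in DiLoCo), and apply Nesterov momentum to that outer gradient. All hyperparameter values below are illustrative.

```python
import numpy as np

def snoo_like_training(w0, inner_grad, K=4, steps=12, lr=0.1,
                       outer_lr=1.0, mu=0.9):
    """Sketch of a SNOO/DiLoCo-with-M=1 style loop (illustrative).
    inner_grad(w) returns the gradient for a plain SGD inner step."""
    w = np.asarray(w0, dtype=float)
    w_outer = w.copy()        # "outer" weights, updated every K steps
    m = np.zeros_like(w)      # outer momentum buffer
    for t in range(1, steps + 1):
        w = w - lr * inner_grad(w)                   # inner SGD step
        if t % K == 0:
            delta = w_outer - w                      # outer gradient
            m = mu * m + delta                       # momentum accumulation
            w_outer = w_outer - outer_lr * (mu * m + delta)  # Nesterov-style step
            w = w_outer.copy()                       # restart inner loop
    return w_outer
```

With M=1 there is no averaging across replicas; the outer optimizer acts purely as a lookahead on a single trajectory, which is the "super lookahead" framing in the tweet.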
@gui_penedo
Guilherme Penedo
1 month
New dataset release: 🌐FineWiki This is an updated and better extracted version of Wikipedia, covering 325+ languages. Unlike the old dataset from 2023, we kept all the math content, tables, properly rendered templates, and extracted key facts. Examples and highlights below.
17
76
552
@HKydlicek
Hynek Kydlíček
1 month
We’re releasing the full FinePdfs source code — plus new datasets and models! 🚀 📚 Datasets: • OCR-Annotations — 1.6k PDFs labeled for OCR need • Gemma-LID-Annotation — 20k samples per language (annotated with Gemma3-27B) 🤖 Models: • XGB-OCR — OCR classifier for PDFs
5
73
446
@willccbb
will brown
2 months
we're making our own chips btw. targeting Q1 2026
36
11
697
@m_olbap
Pablo Montalvo
2 months
Super excited to finally post this interactive resource! We maintain 1M+ Python LOC across 400+ model architectures in 🤗 Transformers. How do we keep it controlled and keep shipping models? With @LysandreJik, @pcuenq and @yonigoz we wrote down what makes it possible. Dive here!
5
23
85
@eustachelb
Eustache Le Bihan
2 months
Cool release by @LiquidAI_: LFM2-Audio-1.5B It’s a pretty cool omni-architecture that enables prediction of both text and audio tokens, meaning it can handle multi-turn S2S, ASR, and TTS (with voice description) within a single model. Great to see, once again this year, a model
2
35
164
@thdxr
dax
2 months
prior to cheat engine there was some paid equivalent i found a pirated license and put it in and got an alert that said "illegal key, FBI has been notified" i was like 9 so i slammed the laptop shut and hid in my bed and it took me a month before i stopped being worried
@birdabo
sui dev ☄️
2 months
kids who used this tool back in the day are now game developers lol.
66
76
4K
@Ar_Douillard
Arthur Douillard
2 months
My first paper just reached the 1000 citations bar 🎉 It's about Continual Learning, a topic more relevant than ever. Although that particular paper and setting are of little use nowadays.
12
5
263
@VictorTaelin
Taelin
2 months
be like me, close your ears, resist the temptation, refuse to learn anything ML related, double down on a niche domain that nobody is exploring, and either become a legend or die into total obscurity that's the only way
@predict_addict
Valeriy M., PhD, MBA, CQF
2 months
LeCun is right: when enrolling in a PhD program, don’t work on what is the hype topic of today. It was true in 2015 for reinforcement learning, it is true in 2025 for LLMs. The topic of tomorrow won’t be the hype topic of today; find a promising niche tech and work on it instead.
50
67
2K