Ferdinand Mom

@FerdinandMom

Followers
3K
Following
2K
Media
35
Statuses
553

Distributed & Decentralized training @HuggingFace

France
Joined October 2013
@FerdinandMom
Ferdinand Mom
1 year
Interested in 4D parallelism but feeling overwhelmed by Megatron-LM codebase? We are currently cooking something with @Haojun_Zhao14 and @xariusrke 😉 In the meantime, here is a self-contained script that implements Pipeline Parallelism (AFAB + 1F1B) in 200 LOC 🧵👇
11
43
234
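The AFAB and 1F1B schedules mentioned above differ only in the order a stage interleaves forward and backward passes over microbatches. Here is a minimal, illustrative sketch of both orderings (my own toy helpers, not the 200-LOC script from the thread): AFAB runs all forwards then all backwards, while 1F1B warms up with a few forwards and then alternates one forward with one backward to bound activation memory.

```python
def one_f_one_b_schedule(rank, num_stages, num_microbatches):
    """Return the op sequence ('F', i) / ('B', i) a pipeline stage
    executes under 1F1B. Illustrative toy helper."""
    # Warmup: earlier stages run extra forwards before their first backward.
    warmup = min(num_stages - rank - 1, num_microbatches)
    ops = [("F", i) for i in range(warmup)]
    fwd, bwd = warmup, 0
    # Steady state: alternate one forward with one backward.
    while fwd < num_microbatches:
        ops.append(("F", fwd)); fwd += 1
        ops.append(("B", bwd)); bwd += 1
    # Cooldown: drain the remaining backwards.
    while bwd < num_microbatches:
        ops.append(("B", bwd)); bwd += 1
    return ops

def afab_schedule(num_microbatches):
    """AFAB: all forwards first, then all backwards."""
    return [("F", i) for i in range(num_microbatches)] + \
           [("B", i) for i in range(num_microbatches)]
```

The memory difference is visible in the schedules: under AFAB every microbatch's activations are live before the first backward, whereas under 1F1B at most `num_stages - rank` forwards are outstanding at any time.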
@humanoidsdaily
Humanoids daily
4 days
🚨 Stealth Startup Alert: A new player has quietly entered the humanoid race in Paris. 🇫🇷 "UMA" is building a founding team for a general-purpose robot with a unique focus: "stylized" movement and end-to-end AI. We dug into the job listings to decode their strategy.
humanoidsdaily.com
Job listings reveal a new French venture building "intelligent humanoid robots" with a focus on stylized movement and learning-based control, adding to a heating-up European robotics landscape.
7
16
100
@dlouapre
David Louapre
3 days
Introducing "The Eiffel Tower Llama"!🗼 Remember Golden Gate Claude? Unfortunately Anthropic's viral demo was shut down after 24h, and key technical details remained hidden. So we recreated it, uncovering key insights on steering LLMs using SAEs⚒️ Full blog post + live demo 👇
9
43
165
@remi_or_
Rémi Ouazan
4 days
Why are vLLM and transformers so damn fast? ⚡ Continuous batching. That's the secret sauce 🔥 Never heard of it? We just dropped a blog post building it up from first principles 🤗 See what happens inside the minds of the engineers pushing inference to the edge 🧠
4
32
190
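Continuous batching, as the post describes, means the engine never waits for a whole batch to finish: each decode step generates one token per running sequence, finished sequences free their slot immediately, and waiting requests join mid-flight. A toy scheduler loop (my own sketch for intuition, not vLLM's or transformers' actual code):

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Toy continuous-batching scheduler. Each request is
    (req_id, tokens_needed); each engine step decodes one token for
    every running sequence. Returns the batch composition per step."""
    waiting = deque(requests)
    running = {}   # req_id -> tokens still to generate
    trace = []     # which requests were batched at each step
    while waiting or running:
        # Admit waiting requests into any free slots (mid-flight joins).
        while waiting and len(running) < max_batch:
            rid, need = waiting.popleft()
            running[rid] = need
        trace.append(sorted(running))
        # One decode step for every running sequence.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]   # slot freed this very step
    return trace
```

With static batching, a short request would sit in the batch as dead weight until the longest request finished; here the short request's slot is reused on the very next step, which is where the throughput win comes from.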
@art_zucker
Arthur Zucker
4 days
Mega cool blog post about attention, masking and continuous batching! I think it's a keystone for people who want to deeply understand current serving frameworks and how flash attention unlocks everything!
6
48
365
@LysandreJik
Lysandre
16 days
We've just started a Diffusers "MVP" program to work more closely with open-source collaborators. We'd love to work with you on making Diffusers a better tool than it is right now; please come and help make diffusion more accessible! Many open issues! https://t.co/5JiLemtLR0
1
4
28
@Thom_Wolf
Thomas Wolf
1 month
We’ve cooked another one of these 200+ page practical books on model training that we love to write. This time it’s on all pretraining and post-training recipes and how to do hyperparameter exploration for a training project. Closing the trilogy of: 1. Building a pretraining
@eliebakouch
elie
1 month
Training LLMs end to end is hard. Very excited to share our new blog (book?) that covers the full pipeline: pre-training, post-training and infra. 200+ pages of what worked, what didn’t, and how to make it run reliably https://t.co/iN2JtWhn23
22
116
1K
@Ar_Douillard
Arthur Douillard
1 month
As a sci-fi nerd, Starcloud is super exciting: https://t.co/Y7dPYz9ls2 but this application sounds like bullshit to me? Latency isn't going to take hours, and wildfire detection can wait 20s
1
3
18
@FerdinandMom
Ferdinand Mom
1 month
Good shit from Monarch. Previously, I'd have to SSH into every node and remotely attach my debugger to each one. That's no longer the case
1
0
7
@Ar_Douillard
Arthur Douillard
1 month
Non-distributed DiLoCo as a super lookahead: Kalluski et al. from @metaai released a study of using Nesterov on outer gradients: https://t.co/cYupiYyQru The algo that they nicknamed SNOO is basically DiLoCo with M=1, meaning that every K steps, a delta is computed between
5
12
79
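The pattern the tweet describes can be sketched in a few lines. This is my reading of it, not the paper's exact algorithm: run K plain inner optimizer steps, treat the resulting weight delta as an "outer gradient" (as in DiLoCo), and apply Nesterov momentum to that outer gradient. All hyperparameter values below are illustrative.

```python
import numpy as np

def snoo_like_training(w0, inner_grad, K=4, steps=12, lr=0.1,
                       outer_lr=1.0, mu=0.9):
    """Sketch of a SNOO/DiLoCo-with-M=1 style loop (illustrative).
    inner_grad(w) returns the gradient for a plain SGD inner step."""
    w = np.asarray(w0, dtype=float)
    w_outer = w.copy()        # "outer" weights, updated every K steps
    m = np.zeros_like(w)      # outer momentum buffer
    for t in range(1, steps + 1):
        w = w - lr * inner_grad(w)                   # inner SGD step
        if t % K == 0:
            delta = w_outer - w                      # outer gradient
            m = mu * m + delta                       # momentum accumulation
            w_outer = w_outer - outer_lr * (mu * m + delta)  # Nesterov-style step
            w = w_outer.copy()                       # restart inner loop
    return w_outer
```

With M=1 there is no averaging across replicas; the outer optimizer acts purely as a lookahead on a single trajectory, which is the "super lookahead" framing in the tweet.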
@gui_penedo
Guilherme Penedo
1 month
New dataset release: 🌐FineWiki This is an updated and better extracted version of Wikipedia, covering 325+ languages. Unlike the old dataset from 2023, we kept all the math content, tables, properly rendered templates, and extracted key facts. Examples and highlights below.
17
76
552
@HKydlicek
Hynek Kydlíček
1 month
We’re releasing the full FinePdfs source code — plus new datasets and models! 🚀 📚 Datasets: • OCR-Annotations — 1.6k PDFs labeled for OCR need • Gemma-LID-Annotation — 20k samples per language (annotated with Gemma3-27B) 🤖 Models: • XGB-OCR — OCR classifier for PDFs
5
73
446
@willccbb
will brown
2 months
we're making our own chips btw. targeting Q1 2026
36
11
697
@m_olbap
Pablo Montalvo
2 months
Super excited to finally post this interactive resource! We maintain 1M+ Python LOC across 400+ model architectures in 🤗 Transformers. How do we keep it controlled and keep shipping models? With @LysandreJik, @pcuenq and @yonigoz we wrote down what makes it possible. Dive here!
5
23
85
@eustachelb
Eustache Le Bihan
2 months
Cool release by @LiquidAI_: LFM2-Audio-1.5B It’s a pretty cool omni-architecture that enables prediction of both text and audio tokens, meaning it can handle multi-turn S2S, ASR, and TTS (with voice description) within a single model. Great to see, once again this year, a model
2
35
164
@thdxr
dax
2 months
prior to cheat engine there was some paid equivalent i found a pirated license and put it in and got an alert that said "illegal key, FBI has been notified" i was like 9 so i slammed the laptop shut and hid in my bed and it took me a month before i stopped being worried
@birdabo
sui dev ☄️
2 months
kids who used this tool back in the day are now game developers lol.
66
76
4K
@Ar_Douillard
Arthur Douillard
2 months
My first paper just reached the 1000 citations bar 🎉 It's about Continual Learning, a topic more relevant than ever. Although that particular paper and setting are of little use nowadays.
12
5
263
@VictorTaelin
Taelin
2 months
be like me, close your ears, resist the temptation, refuse to learn anything ML related, double down on a niche domain that nobody is exploring, and either become a legend or die into total obscurity that's the only way
@predict_addict
Valeriy M., PhD, MBA, CQF
2 months
LeCun is right: when enrolling in a PhD program, don’t work on what is the hype topic of today. It was true in 2015 for reinforcement learning, it is true in 2025 for LLMs. The topic of tomorrow won’t be the hype topic of today; find a promising niche tech and work on it instead.
50
67
2K