Zach Mueller

@TheZachMueller

Followers: 12K · Following: 41K · Media: 2K · Statuses: 17K

Let's make billions of parameters go brr together https://t.co/43e2uzaS6x

Joined April 2016
@TheZachMueller
Zach Mueller
7 hours
There are a ton of speakers in the distributed conference. In this thread I'll update individual comment chains with the list of speakers in each of the three tracks: Applied, Pretraining, and Distributed Techniques, to help you find them all more easily.
@TheZachMueller
Zach Mueller
2 minutes
(I do understand this has been wishy-washy. Finding market fit and what's reasonable for entry-level engineers has been a touch-and-go game, and this is where I think it fits best.)
@TheZachMueller
Zach Mueller
9 minutes
What that means is all 35% discount codes are still valid. So if you follow me on socials, you can reserve your spot for <$1k. Come join today:
@TheZachMueller
Zach Mueller
9 minutes
As speakers are coming in, I'm backing down a hair. To make the course more accessible for everyone and ensure there are as few barriers to entry as possible, the sticker price will stay at $1,500 until the course begins. I don't want cost to ever be a barrier to learning.
@TheZachMueller
Zach Mueller
2 hours
Trying to will local K2 into existence
@TheZachMueller
Zach Mueller
6 hours
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ @FerdinandMom @lessw2020 @m_sirovatka Marc Sun of @huggingface accelerate will be chatting about how you can optimize inference for large distributed models efficiently, and what techniques help you get there today
@TheZachMueller
Zach Mueller
6 hours
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ @FerdinandMom @lessw2020 Matej Sirovatka of @huggingface on the accelerate team. Matej is going to introduce you to the concept of Expert Parallelism and how it speeds up training for gigantic mixture-of-experts models
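(The tweet only names the technique, so here's a rough, hypothetical single-process sketch of the idea: a top-1 router sends each token to one expert. Under actual expert parallelism, each expert lives on a different rank and tokens are exchanged via all-to-all; all names below are made up for illustration.)

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy top-1 mixture-of-experts layer, single process.

    Under expert parallelism each expert would sit on a different rank,
    with an all-to-all shuffling tokens to the rank that owns their expert.
    """
    def __init__(self, dim=16, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x).softmax(dim=-1)
        weight, idx = scores.max(dim=-1)        # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                     # tokens routed to expert e
            if mask.any():
                out[mask] = expert(x[mask]) * weight[mask, None]
        return out

moe = TinyMoE()
print(moe(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```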
@TheZachMueller
Zach Mueller
6 hours
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ @FerdinandMom Less Wright of @Meta and @PyTorch. Less is one of the rare few who can answer "you have 1,000 GPUs to use for a day; figure out how to optimize training across them all". He's going to be chatting about one of said techniques, Async Tensor Parallelism
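(Async TP itself lives deep in PyTorch internals, but the baseline it accelerates is easy to sketch. Below is a minimal, hypothetical column-parallel linear layer, the plain synchronous version: each rank holds a slice of the weight and an all-gather stitches the outputs back together. Async TP's trick is overlapping collectives like this with the matmuls instead of waiting on them.)

```python
# column_parallel.py -- run with: torchrun --nproc_per_node=2 column_parallel.py
import torch
import torch.distributed as dist

dist.init_process_group("gloo")
rank, world = dist.get_rank(), dist.get_world_size()

in_dim, out_dim = 8, 8
torch.manual_seed(0)                        # same "full" weight and input on every rank
w_full = torch.randn(out_dim, in_dim)
w_shard = w_full.chunk(world, dim=0)[rank]  # each rank owns a slice of the output dim
x = torch.randn(4, in_dim)

y_shard = x @ w_shard.T                     # local matmul on the owned slice

# Synchronous TP: block on the all-gather of output shards. Async TP would
# overlap this collective with the next layer's compute instead.
shards = [torch.empty_like(y_shard) for _ in range(world)]
dist.all_gather(shards, y_shard)
y = torch.cat(shards, dim=1)
if rank == 0:
    print(torch.allclose(y, x @ w_full.T))  # True
```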
@TheZachMueller
Zach Mueller
6 hours
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ Ferdinand Mom of @huggingface. Ferdinand will be giving a lecture on how multi-dimensional parallelism gives your training the boost it needs, and how to optimize the parallelism topology to get there
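(To make "multi-dimensional parallelism" concrete: recent PyTorch expresses the topology as a device mesh with one axis per parallelism dimension. A minimal sketch; the 2x4 shape and the "dp"/"tp" names are just an example, and this assumes an 8-GPU torchrun launch.)

```python
from torch.distributed.device_mesh import init_device_mesh

# 8 GPUs arranged as 2 data-parallel replicas x 4 tensor-parallel shards.
# Which axis gets which size is exactly the "topology" knob being tuned.
mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

dp_group = mesh["dp"].get_group()  # collectives among replicas
tp_group = mesh["tp"].get_group()  # collectives among shards
```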
@TheZachMueller
Zach Mueller
6 hours
@huggingface @GuggerSylvain Wanchao Liang, formerly @Meta and @PyTorch, creator of DTensors and TorchTitan. Wanchao will help guide you through PyTorch's DTensor and how this abstraction layer makes implementing distributed training paradigms easier and faster
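(For a taste of the abstraction: you describe a mesh and a placement, and DTensor handles the sharded storage and collectives. A minimal sketch using the public torch.distributed.tensor API from recent PyTorch (2.4+); run under torchrun.)

```python
# dtensor_demo.py -- run with: torchrun --nproc_per_node=2 dtensor_demo.py
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

dist.init_process_group("gloo")
mesh = init_device_mesh("cpu", (dist.get_world_size(),))

big = torch.randn(8, 4)
# One logical tensor, physically sharded along dim 0 across the mesh;
# each rank stores only its own rows.
dbig = distribute_tensor(big, mesh, placements=[Shard(0)])
print(dbig.to_local().shape)  # torch.Size([4, 4]) on each of 2 ranks
```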
@TheZachMueller
Zach Mueller
6 hours
Distributed Techniques: Sylvain Gugger, the mind behind @huggingface accelerate. Sylvain will help introduce you to distributed training as a whole, and to what the ZeRO algorithm does
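(In one line, ZeRO shards training state across data-parallel ranks instead of replicating it: optimizer state at stage 1, then gradients, then parameters. A toy stage-1-style sketch of the idea, not the real algorithm, with all names hypothetical.)

```python
# toy_zero1.py -- run with: torchrun --nproc_per_node=2 toy_zero1.py
import torch
import torch.distributed as dist

dist.init_process_group("gloo")
rank, world = dist.get_rank(), dist.get_world_size()

torch.manual_seed(0)                 # identical model and batch on every rank
model = torch.nn.Linear(8, 8)
params = list(model.parameters())

# ZeRO stage 1: each rank keeps Adam moments only for the params it "owns",
# instead of every rank holding optimizer state for the full model.
owned = params[rank::world]
opt = torch.optim.Adam(owned, lr=1e-3)

loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()

for p in params:                     # gradients are still averaged everywhere
    dist.all_reduce(p.grad)
    p.grad /= world

opt.step()                           # update only the owned shard...
for i, p in enumerate(params):       # ...then share updated params with everyone
    dist.broadcast(p.data, src=i % world)
```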
@TheZachMueller
Zach Mueller
7 hours
Daniel Han-Chen of the infamous @UnslothAI. Daniel will help guide you through common low-level speedups that can reduce your training time, such as Triton kernels and more
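(If you've never seen a Triton kernel, the canonical hello-world is a fused vector add: a few lines of Python that compile to a GPU kernel. A minimal sketch; requires a CUDA GPU and the triton package.)

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized chunk of the vectors.
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
print(torch.allclose(out, x + y))  # True
```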
@TheZachMueller
Zach Mueller
7 hours
Elie Bakouch from @huggingface on the pretraining team. Elie will help you understand how modern LLMs like DeepSeek and others are hyper-optimized for efficient training through techniques like MLA, MoE, and more
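(MLA is multi-head latent attention, used in the DeepSeek models. The core trick is easy to sketch: instead of caching full K/V per head, you cache one small latent vector per token and up-project on the fly. A rough simplification below; real MLA also handles RoPE specially, and all dimensions here are made up.)

```python
import torch
import torch.nn as nn

dim, latent, heads, hd = 256, 32, 4, 64

down_kv = nn.Linear(dim, latent, bias=False)      # compress: this is what the KV cache stores
up_k = nn.Linear(latent, heads * hd, bias=False)  # expand latent -> per-head keys
up_v = nn.Linear(latent, heads * hd, bias=False)  # expand latent -> per-head values

h = torch.randn(1, 10, dim)                       # (batch, seq, dim)
c_kv = down_kv(h)                                 # cache (1, 10, 32) instead of (1, 10, 512)
k = up_k(c_kv).view(1, 10, heads, hd)
v = up_v(c_kv).view(1, 10, heads, hd)
print(c_kv.shape, k.shape)  # latent cache is 16x smaller than full K+V here
```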
@TheZachMueller
Zach Mueller
7 hours
Pretraining: Phuc Nguyen (@xariusrke) from @huggingface on the nanotron team. As a world expert in FP8 training, Phuc's Practitioner's Guide to FP8 will help you get the most FLOPs out of expensive hardware like H100s and B200s through low-precision training
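(PyTorch ships hardware FP8 dtypes, so the core mechanic, scaling values into FP8's narrow range and back, fits in a few lines. A minimal sketch of per-tensor E4M3 quantization; the scaling recipe here is a simplification of what real FP8 training frameworks do.)

```python
import torch

x = torch.randn(4, 4)

# E4M3's max representable value is 448; map the observed abs-max onto it.
scale = 448.0 / x.abs().max()
x_fp8 = (x * scale).to(torch.float8_e4m3fn)   # 1 byte/element instead of 4
x_back = x_fp8.to(torch.float32) / scale

print((x - x_back).abs().max())  # small quantization error
```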
@TheZachMueller
Zach Mueller
7 hours
Prince Canuma, ML Research Engineer working with MLX. Prince will be discussing how MLX and Apple Silicon enable cheap clusters of M-series Macs to run ML workloads at a fraction of the cost
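(For the curious, MLX's NumPy-style API makes the pitch easy to see: arrays live in unified memory and computation is lazy until you evaluate. A minimal sketch; requires the mlx package on Apple Silicon.)

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = a @ b       # lazy: builds a compute graph, nothing runs yet
mx.eval(c)      # materialize the result via unified memory
print(c.shape)  # (1024, 1024)
```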
@TheZachMueller
Zach Mueller
7 hours
Tunji Ruwase, Software Engineer at @Snowflake. Tunji will talk about Snowflake's latest accomplishment: Arctic Long Sequence Training, which makes multi-million-token context length training efficient and scalable
@TheZachMueller
Zach Mueller
7 hours
Sami Jaghouar, Research Engineer at @PrimeIntellect. Prime is one of the major players leading the charge in decentralized training. Sami will help teach you how it's possible at a global scale.
@TheZachMueller
Zach Mueller
7 hours
Applied: Robert Nishihara, co-creator of Ray and cofounder of Anyscale. As one of the minds behind Ray, Robert will help you scale training across thousands of GPUs seamlessly through their platform.
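(The shape of Ray's core API, in case you haven't seen it: decorate a function and calls become distributable tasks. A minimal sketch with plain Ray tasks; Ray Train layers training-specific orchestration on top of this.)

```python
import ray

ray.init()  # local cluster here; on a real cluster this attaches to the head node

@ray.remote
def square(x):
    return x * x

# Each call schedules a task somewhere in the cluster; futures come back.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```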
@TheZachMueller
Zach Mueller
13 hours
RT @TheZachMueller: There will be more cohorts. There's simply too much to cover in one go as this industry and research changes. The knowl….
@TheZachMueller
Zach Mueller
21 hours
RT @willccbb: who’s got the best hackable single-file implementation of LoRA training?
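(On that note, the whole trick fits in a handful of lines: freeze the base weight and learn a low-rank update scaled by alpha/r. A minimal single-file-style sketch; class and variable names are hypothetical.)

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():     # base weights stay frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 1024 trainable params vs 4160 frozen in the base layer
```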