Zach Mueller

@TheZachMueller

Followers: 12K · Following: 41K · Media: 2K · Statuses: 17K

Let's make billions of parameters go brr together https://t.co/43e2uzaS6x

Joined April 2016
@TheZachMueller
Zach Mueller
7 hours
There are a ton of speakers in the distributed conference. In this thread I'll update individual comment chains with the list of speakers in each of the three tracks: Applied, Pretraining, and Distributed Techniques, to help you find them all more easily.
@TheZachMueller
Zach Mueller
2 minutes
(I do understand this has been wishy-washy. Finding market fit and what's reasonable for entry-level engineers has been a touch-and-go game, and this is where I think it fits best.)
@TheZachMueller
Zach Mueller
9 minutes
What that means is all 35% discount codes are still valid. So if you follow me on socials, you can reserve your spot for <$1k. Come join today:
@TheZachMueller
Zach Mueller
9 minutes
As speakers are coming in, I'm backing down a hair. To make the course more accessible for everyone and ensure there are as few barriers to entry as possible, the sticker price will stay at $1,500 until the course begins. I don't want cost to ever be a barrier to learning.
@TheZachMueller
Zach Mueller
2 hours
Trying to will local K2 into existence
@TheZachMueller
Zach Mueller
6 hours
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ @FerdinandMom @lessw2020 @m_sirovatka Marc Sun of @huggingface accelerate will be chatting about how you can optimize inference for large distributed models efficiently, and what techniques help you get there today
@TheZachMueller
Zach Mueller
6 hours
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ @FerdinandMom @lessw2020 Matej Sirovatka of @huggingface on the accelerate team. Matej is going to introduce you to the concept of Expert Parallelism and how it speeds up training for gigantic mixture-of-experts models
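(The tweet only names the technique, so here's a rough, hypothetical single-process sketch of the idea: a top-1 router sends each token to one expert. Under actual expert parallelism, each expert lives on a different rank and tokens are exchanged via all-to-all; all names below are made up for illustration.)

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy top-1 mixture-of-experts layer, single process.

    Under expert parallelism each expert would sit on a different rank,
    with an all-to-all shuffling tokens to the rank that owns their expert.
    """
    def __init__(self, dim=16, num_experts=4):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x).softmax(dim=-1)
        weight, idx = scores.max(dim=-1)        # top-1 expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                     # tokens routed to expert e
            if mask.any():
                out[mask] = expert(x[mask]) * weight[mask, None]
        return out

moe = TinyMoE()
print(moe(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```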
@TheZachMueller
Zach Mueller
6 hours
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ @FerdinandMom Less Wright of @Meta and @PyTorch. Less is one of the rare few who can answer "you have 1,000 GPUs to use for a day; figure out how to optimize training across them all". He's going to be chatting about one of said techniques, Async Tensor Parallelism
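(Async TP itself lives deep in PyTorch internals, but the baseline it accelerates is easy to sketch. Below is a minimal, hypothetical column-parallel linear layer, the plain synchronous version: each rank holds a slice of the weight and an all-gather stitches the outputs back together. Async TP's trick is overlapping collectives like this with the matmuls instead of waiting on them.)

```python
# column_parallel.py -- run with: torchrun --nproc_per_node=2 column_parallel.py
import torch
import torch.distributed as dist

dist.init_process_group("gloo")
rank, world = dist.get_rank(), dist.get_world_size()

in_dim, out_dim = 8, 8
torch.manual_seed(0)                        # same "full" weight and input on every rank
w_full = torch.randn(out_dim, in_dim)
w_shard = w_full.chunk(world, dim=0)[rank]  # each rank owns a slice of the output dim
x = torch.randn(4, in_dim)

y_shard = x @ w_shard.T                     # local matmul on the owned slice

# Synchronous TP: block on the all-gather of output shards. Async TP would
# overlap this collective with the next layer's compute instead.
shards = [torch.empty_like(y_shard) for _ in range(world)]
dist.all_gather(shards, y_shard)
y = torch.cat(shards, dim=1)
if rank == 0:
    print(torch.allclose(y, x @ w_full.T))  # True
```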
@TheZachMueller
Zach Mueller
6 hours
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ Ferdinand Mom of @huggingface. Ferdinand will be giving a lecture on how multi-dimensional parallelism gives your training the boost it needs, and how to optimize the parallelism topology to get there
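(To make "multi-dimensional parallelism" concrete: recent PyTorch expresses the topology as a device mesh with one axis per parallelism dimension. A minimal sketch; the 2x4 shape and the "dp"/"tp" names are just an example, and this assumes an 8-GPU torchrun launch.)

```python
from torch.distributed.device_mesh import init_device_mesh

# 8 GPUs arranged as 2 data-parallel replicas x 4 tensor-parallel shards.
# Which axis gets which size is exactly the "topology" knob being tuned.
mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

dp_group = mesh["dp"].get_group()  # collectives among replicas
tp_group = mesh["tp"].get_group()  # collectives among shards
```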
@TheZachMueller
Zach Mueller
6 hours
@huggingface @GuggerSylvain Wanchao Liang, formerly @Meta and @PyTorch, creator of DTensors and TorchTitan. Wanchao will help guide you through PyTorch's DTensor and how this abstraction layer makes implementing distributed training paradigms easier and faster
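(For a taste of the abstraction: you describe a mesh and a placement, and DTensor handles the sharded storage and collectives. A minimal sketch using the public torch.distributed.tensor API from recent PyTorch (2.4+); run under torchrun.)

```python
# dtensor_demo.py -- run with: torchrun --nproc_per_node=2 dtensor_demo.py
import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard

dist.init_process_group("gloo")
mesh = init_device_mesh("cpu", (dist.get_world_size(),))

big = torch.randn(8, 4)
# One logical tensor, physically sharded along dim 0 across the mesh;
# each rank stores only its own rows.
dbig = distribute_tensor(big, mesh, placements=[Shard(0)])
print(dbig.to_local().shape)  # torch.Size([4, 4]) on each of 2 ranks
```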
@TheZachMueller
Zach Mueller
6 hours
Distributed Techniques: Sylvain Gugger, the mind behind @huggingface accelerate. Sylvain will help introduce you to distributed training as a whole, and to what the ZeRO algorithm does
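(In one line, ZeRO shards training state across data-parallel ranks instead of replicating it: optimizer state at stage 1, then gradients, then parameters. A toy stage-1-style sketch of the idea, not the real algorithm, with all names hypothetical.)

```python
# toy_zero1.py -- run with: torchrun --nproc_per_node=2 toy_zero1.py
import torch
import torch.distributed as dist

dist.init_process_group("gloo")
rank, world = dist.get_rank(), dist.get_world_size()

torch.manual_seed(0)                 # identical model and batch on every rank
model = torch.nn.Linear(8, 8)
params = list(model.parameters())

# ZeRO stage 1: each rank keeps Adam moments only for the params it "owns",
# instead of every rank holding optimizer state for the full model.
owned = params[rank::world]
opt = torch.optim.Adam(owned, lr=1e-3)

loss = model(torch.randn(4, 8)).pow(2).mean()
loss.backward()

for p in params:                     # gradients are still averaged everywhere
    dist.all_reduce(p.grad)
    p.grad /= world

opt.step()                           # update only the owned shard...
for i, p in enumerate(params):       # ...then share updated params with everyone
    dist.broadcast(p.data, src=i % world)
```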
@TheZachMueller
Zach Mueller
7 hours
Daniel Han-Chen of the infamous @UnslothAI. Daniel will help guide you through common low-level speedups that can reduce your training time, such as Triton kernels and more
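(If you've never seen a Triton kernel, the canonical hello-world is a fused vector add: a few lines of Python that compile to a GPU kernel. A minimal sketch; requires a CUDA GPU and the triton package.)

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized chunk of the vectors.
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
print(torch.allclose(out, x + y))  # True
```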
@TheZachMueller
Zach Mueller
7 hours
Elie Bakouch from @huggingface on the pretraining team. Elie will help you understand how modern LLMs like DeepSeek and others are hyper-optimized for efficient training through techniques like MLA, MoE, and more
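(MLA is multi-head latent attention, used in the DeepSeek models. The core trick is easy to sketch: instead of caching full K/V per head, you cache one small latent vector per token and up-project on the fly. A rough simplification below; real MLA also handles RoPE specially, and all dimensions here are made up.)

```python
import torch
import torch.nn as nn

dim, latent, heads, hd = 256, 32, 4, 64

down_kv = nn.Linear(dim, latent, bias=False)      # compress: this is what the KV cache stores
up_k = nn.Linear(latent, heads * hd, bias=False)  # expand latent -> per-head keys
up_v = nn.Linear(latent, heads * hd, bias=False)  # expand latent -> per-head values

h = torch.randn(1, 10, dim)                       # (batch, seq, dim)
c_kv = down_kv(h)                                 # cache (1, 10, 32) instead of (1, 10, 512)
k = up_k(c_kv).view(1, 10, heads, hd)
v = up_v(c_kv).view(1, 10, heads, hd)
print(c_kv.shape, k.shape)  # latent cache is 16x smaller than full K+V here
```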
@TheZachMueller
Zach Mueller
7 hours
Pretraining: Phuc Nguyen (@xariusrke) from @huggingface on the nanotron team. As a world expert in FP8 training, Phuc's Practitioner's Guide to FP8 will help you get the most FLOPs out of expensive hardware like H100s and B200s through low-precision training
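(PyTorch ships hardware FP8 dtypes, so the core mechanic, scaling values into FP8's narrow range and back, fits in a few lines. A minimal sketch of per-tensor E4M3 quantization; the scaling recipe here is a simplification of what real FP8 training frameworks do.)

```python
import torch

x = torch.randn(4, 4)

# E4M3's max representable value is 448; map the observed abs-max onto it.
scale = 448.0 / x.abs().max()
x_fp8 = (x * scale).to(torch.float8_e4m3fn)   # 1 byte/element instead of 4
x_back = x_fp8.to(torch.float32) / scale

print((x - x_back).abs().max())  # small quantization error
```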
@TheZachMueller
Zach Mueller
7 hours
Prince Canuma, ML Research Engineer working with MLX. Prince will be discussing how MLX and Apple Silicon enable cheap clusters of M-series Macs to run ML workloads at a fraction of the cost
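(For the curious, MLX's NumPy-style API makes the pitch easy to see: arrays live in unified memory and computation is lazy until you evaluate. A minimal sketch; requires the mlx package on Apple Silicon.)

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = mx.random.normal((1024, 1024))
c = a @ b       # lazy: builds a compute graph, nothing runs yet
mx.eval(c)      # materialize the result via unified memory
print(c.shape)  # (1024, 1024)
```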
@TheZachMueller
Zach Mueller
7 hours
Tunji Ruwase, Software Engineer at @Snowflake. Tunji will talk about Snowflake's latest accomplishment: Arctic Long Sequence Training, which makes multi-million-token context length training efficient and scalable
@TheZachMueller
Zach Mueller
7 hours
Sami Jaghouar, Research Engineer at @PrimeIntellect. Prime is one of the major players leading the charge in decentralized training. Sami will help teach you how it's possible at a global scale.
@TheZachMueller
Zach Mueller
7 hours
Applied: Robert Nishihara, co-creator of Ray and cofounder of Anyscale. As one of the minds behind Ray, Robert will help you scale training across thousands of GPUs seamlessly through their platform.
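(The shape of Ray's core API, in case you haven't seen it: decorate a function and calls become distributable tasks. A minimal sketch with plain Ray tasks; Ray Train layers training-specific orchestration on top of this.)

```python
import ray

ray.init()  # local cluster here; on a real cluster this attaches to the head node

@ray.remote
def square(x):
    return x * x

# Each call schedules a task somewhere in the cluster; futures come back.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```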
@TheZachMueller
Zach Mueller
13 hours
RT @TheZachMueller: There will be more cohorts. There's simply too much to cover in one go as this industry and research changes. The knowl….
@TheZachMueller
Zach Mueller
21 hours
RT @willccbb: who’s got the best hackable single-file implementation of LoRA training?
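(On that note, the whole trick fits in a handful of lines: freeze the base weight and learn a low-rank update scaled by alpha/r. A minimal single-file-style sketch; class and variable names are hypothetical.)

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():     # base weights stay frozen
            p.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(64, 64))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 1024 trainable params vs 4160 frozen in the base layer
```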