
Zach Mueller (@TheZachMueller)
Followers: 12K · Following: 41K · Media: 2K · Statuses: 17K
Let's make billions of parameters go brr together https://t.co/43e2uzaS6x
Joined April 2016
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ @FerdinandMom @lessw2020 @m_sirovatka Marc Sun of @huggingface accelerate will be chatting about how you can efficiently optimize inference for large distributed models, and the techniques that help you get there today
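A minimal sketch of the kind of setup Marc's talk concerns (not his material), assuming the transformers and accelerate libraries and using a placeholder model id: device_map="auto" lets Accelerate shard a checkpoint across whatever GPUs (and CPU offload) are available for inference.

```python
# Hedged sketch: sharded big-model inference via Accelerate's device_map support.
# The model id below is a placeholder, not a recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-70b-hf"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # let Accelerate place layers across GPUs/CPU
    torch_dtype=torch.float16,  # halve memory relative to fp32
)

inputs = tokenizer("Hello, distributed world!", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```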
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ @FerdinandMom @lessw2020 Matej Sirovatka of @huggingface on the accelerate team. Matej is going to introduce you to the concept of Expert Parallelism and how this speeds up training for gigantic mixture-of-experts models
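A highly simplified, hypothetical sketch of the idea behind expert parallelism (not Matej's code): one expert per rank, top-1 routing, and a fixed per-expert capacity so the all_to_all exchanges use equal splits. Production MoE stacks handle top-k routing, variable splits, and load balancing.

```python
# Hedged sketch of expert parallelism: each rank owns one expert, tokens are
# routed to experts with all_to_all, results are returned to their senders.
# Assumes torch.distributed is initialized with NCCL and CUDA tensors.
import torch
import torch.distributed as dist

def expert_parallel_forward(tokens, gate, local_expert, capacity):
    # tokens: [num_tokens, hidden]; gate scores one logit per expert (= per rank)
    world_size = dist.get_world_size()
    expert_idx = gate(tokens).argmax(dim=-1)  # top-1 routing per token

    # Bucket tokens by destination expert, padding/truncating to `capacity`
    send = tokens.new_zeros(world_size, capacity, tokens.size(-1))
    for e in range(world_size):
        chosen = tokens[expert_idx == e][:capacity]
        send[e, : chosen.size(0)] = chosen

    # Each rank receives the tokens routed to its local expert
    recv = torch.empty_like(send)
    dist.all_to_all_single(recv.view(-1, tokens.size(-1)),
                           send.view(-1, tokens.size(-1)))

    # Run the local expert, then return the outputs to the ranks that sent them
    expert_out = local_expert(recv.view(-1, tokens.size(-1)))
    combined = torch.empty_like(expert_out)
    dist.all_to_all_single(combined, expert_out)
    return combined.view(world_size, capacity, -1)  # caller un-buckets to token order
```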
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ @FerdinandMom Less Wright of @Meta and @PyTorch . Less is one of the rare few who can answer the question of "you have 1,000 GPUs to use for a day, figure out how to optimize training across them all". He's going to be chatting about one of said techniques, Async Tensor Parallelism
@huggingface @GuggerSylvain @Meta @PyTorch @wanchao_ Ferdinand Mom of @huggingface . Ferdinand will be giving a lecture on how multi-dimensional parallelism gives your training runs the boost they need, and how to optimize the parallelism topology to get there
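For flavor, a minimal sketch of how a multi-dimensional topology can be described with PyTorch's DeviceMesh, assuming 8 GPUs launched via torchrun and a recent PyTorch; the dimension names and sizes are illustrative only, not a recommendation.

```python
# Hedged sketch: describe a 3-D parallelism layout over 8 GPUs with DeviceMesh.
from torch.distributed.device_mesh import init_device_mesh

# 2-way pipeline x 2-way data x 2-way tensor parallelism over 8 GPUs
mesh_3d = init_device_mesh("cuda", (2, 2, 2), mesh_dim_names=("pp", "dp", "tp"))

# Each parallelism dimension can be sliced out and handed to the relevant
# wrapper (e.g. FSDP on the "dp" sub-mesh, tensor parallel on the "tp" sub-mesh)
dp_mesh = mesh_3d["dp"]
tp_mesh = mesh_3d["tp"]
print(dp_mesh.size(), tp_mesh.size())  # 2 2
```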
@huggingface @GuggerSylvain Wanchao Liang, formerly @Meta and @PyTorch, creator of DTensors and TorchTitan. Wanchao will help guide you through PyTorch's DTensor and how this abstraction layer makes implementing distributed training paradigms easier and faster
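A minimal DTensor sketch, assuming a recent PyTorch where torch.distributed.tensor is public and a torchrun launch so a process group exists: shard a weight across ranks, replicate the activations, and let sharding propagation insert the collectives.

```python
# Hedged sketch of the DTensor abstraction: shard a weight matrix across ranks
# and operate on it as if it were a single tensor.
import torch
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import distribute_tensor, Shard, Replicate

mesh = init_device_mesh("cuda", (torch.cuda.device_count(),))

weight = torch.randn(4096, 4096)
# Shard rows of the weight across the mesh; each rank stores only its slice
dweight = distribute_tensor(weight, mesh, placements=[Shard(0)])

x = torch.randn(8, 4096)
# Replicate the activations so every rank can consume them
dx = distribute_tensor(x, mesh, placements=[Replicate()])

# Ordinary ops work on DTensors; collectives are inserted for you when needed
dy = dx @ dweight.T
print(dy.full_tensor().shape)  # torch.Size([8, 4096]), gathered on every rank
```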
Distributed Techniques: Sylvain Gugger, the mind behind @huggingface accelerate. Sylvain will help introduce you to distributed training as a whole, and to what the ZeRO algorithm does
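A minimal accelerate training-loop sketch with a placeholder model and dataset; ZeRO-style sharding of optimizer states, gradients, and parameters is layered on top of this same loop by selecting the DeepSpeed or FSDP backend when configuring and launching.

```python
# Hedged sketch: the basic accelerate loop that distributed backends plug into.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(512, 512)                       # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(1024, 512), torch.randn(1024, 512)),
    batch_size=32,
)

# prepare() wraps everything for the launched setup (DDP, FSDP, DeepSpeed/ZeRO, ...)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    accelerator.backward(loss)  # handles scaling/sharding per backend
    optimizer.step()
```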
Daniel Han-Chen of the infamous @UnslothAI. Daniel will help guide you through common low-level speedups that can reduce your training time, such as Triton kernels and more
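As a taste of that, here is the standard vector-add kernel from the Triton tutorials (not Unsloth's code), assuming triton is installed and a CUDA GPU is available.

```python
# Hedged sketch: a minimal elementwise Triton kernel plus its Python launcher.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements              # guard the last partial block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)           # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(1 << 20, device="cuda")
y = torch.randn(1 << 20, device="cuda")
assert torch.allclose(add(x, y), x + y)
```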
Elie Bakouch from @huggingface on the pretraining team. Elie will help you understand how modern LLMs like DeepSeek and others are hyper-optimized for efficient training through techniques like MLA, MoE, and more
Pretraining: Phuc Nguyen (@xariusrke) from @huggingface on the nanotron team. Phuc is a world expert in FP8 training, and his Practitioner's Guide to FP8 will help you get the most FLOPs out of expensive hardware like H100s and B200s through low-precision training
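A toy sketch of the core scaling idea behind FP8 training (not nanotron's actual recipe), assuming a PyTorch build with float8 dtypes: scale a tensor so its largest magnitude lands near E4M3's max, store it in float8_e4m3fn, and undo the scale after the compute.

```python
# Hedged sketch: per-tensor dynamic scaling into torch.float8_e4m3fn and back.
import torch

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def to_fp8_e4m3(x: torch.Tensor):
    amax = x.abs().max().clamp(min=1e-12)
    scale = E4M3_MAX / amax
    return (x * scale).to(torch.float8_e4m3fn), scale

def from_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    return x_fp8.to(torch.float32) / scale

w = torch.randn(1024, 1024)
w_fp8, s = to_fp8_e4m3(w)
print(w_fp8.dtype, (from_fp8(w_fp8, s) - w).abs().max())  # small quantization error
```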
Tunji Ruwase, Software Engineer at @Snowflake . Tunji will talk about Snowflake's latest accomplishment: Arctic Long Sequence Training, which makes multi-million context length training efficient and scalable
Sami Jaghouar, Research Engineer at @PrimeIntellect . Prime is one of the major players leading the charge in decentralized training. Sami will help teach you how doing so at a global scale is possible.
RT @TheZachMueller: There will be more cohorts. There's simply too much to cover in one go as this industry and research changes. The knowl….