Matej Sirovatka Profile
Matej Sirovatka

@m_sirovatka

Followers: 714
Following: 550
Media: 38
Statuses: 398

distributed connoisseur

Joined August 2021
@m_sirovatka
Matej Sirovatka
11 days
We're back at it 🚀 After the huge success of the first @AMD x @GPU_MODE competition, we're back with another one! This time, a theme close to my heart: DISTRIBUTED KERNELS 🔥 🧵
Tweet media one
2
8
60
@m_sirovatka
Matej Sirovatka
16 hours
well, I had the loss in bf16 for whatever reason lol.
@m_sirovatka
Matej Sirovatka
17 hours
I think hyper-params won't help me with this loss curve (I hate gradient accumulation).
Tweet media one
0
0
6
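For context on the loss-curve complaints above: the classic gradient-accumulation pitfall is forgetting to scale each micro-batch loss by the number of accumulation steps, and (per the bf16 tweet) accumulating the logged loss in low precision. A minimal sketch of the usual pattern, assuming a plain PyTorch loop and a HF-style model output; all names here are illustrative, not the author's actual code:

```python
import torch

ACCUM_STEPS = 8  # hypothetical accumulation factor

def train_step(model, optimizer, micro_batches):
    """One optimizer step over ACCUM_STEPS micro-batches."""
    optimizer.zero_grad(set_to_none=True)
    logged_loss = 0.0
    for batch in micro_batches:  # len(micro_batches) == ACCUM_STEPS
        # Scale so accumulated gradients match one big batch.
        loss = model(**batch).loss / ACCUM_STEPS
        loss.backward()
        # Accumulate the logged value in fp32, not bf16.
        logged_loss += loss.detach().float().item()
    optimizer.step()
    return logged_loss
```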
@m_sirovatka
Matej Sirovatka
17 hours
What you can't build, you don't understand. Well, apparently I can't build a toy pre-training framework? I need a refresher on current pre-training trends; any good papers? (looking at you @eliebakouch)
1
1
9
@m_sirovatka
Matej Sirovatka
2 days
learning cutedsl before losing your mind over complement in layout algebra.
@aryanvs_
Aryan V S
2 days
learning triton before reading pmpp
Tweet media one
4
0
27
@m_sirovatka
Matej Sirovatka
6 days
Go check out the newest leaderboard at @GPU_MODE
Tweet media one
Tweet media two
1
1
24
@m_sirovatka
Matej Sirovatka
6 days
You can just ask for things. Love Google for this 💜
Tweet media one
@m_sirovatka
Matej Sirovatka
6 days
A question to my hw-rich friends: I'm currently GPU rich-ish, how can I become TPU rich-ish (I hate ssh-ing into a Colab instance)? I just need a little, 4/8 TPUs to run funny stuff on.
6
6
163
@m_sirovatka
Matej Sirovatka
8 days
You have to wait for the best. Incredibly honoured to be a part of this course, together with this bunch of cool people.
@TheZachMueller
Zach Mueller
8 days
Day 14 of 14 Days of Distributed! We've still got a number of cool people talking since we started this list, so today we're going to rapid-fire them all (in no particular order)! Buckle up and let's go! @winglian @FerdinandMom @m_sirovatka @mervenoyann @charles_irl
Tweet media one
1
0
7
@m_sirovatka
Matej Sirovatka
8 days
Holidays going great, exactly 1 day without work. Btw, tune in to @GPU_MODE for a talk about PCCL from @PrimeIntellect in 30 min
Tweet media one
1
1
82
@m_sirovatka
Matej Sirovatka
10 days
RT @AIatAMD: Calling all GPU & AI developers, it’s go time!. Join the AMD Developer Challenge 2025! Optimize multi-GPU kernels, win prizes….
0
10
0
@m_sirovatka
Matej Sirovatka
10 days
RT @_marcsun: Happy to participate in the online course by my mentor @TheZachMueller ! The topic of my talk will be efficient distributed i….
0
5
0
@m_sirovatka
Matej Sirovatka
11 days
something’s cooking.
@sid_srk
Sid
11 days
Oct 17 at Toronto School of Foundation Modelling: @m_sirovatka will talk about model sharding, network topologies of large-scale clusters, and how these pieces connect.
0
2
15
@m_sirovatka
Matej Sirovatka
11 days
The competition runs for 6 weeks, starting August 30th, after which AMD will fly the winners to a celebration in the US! As per usual, the grand prize is $100k 💰, with smaller prizes for other top contestants 👀 Register here rn!
amdchallenge2025.datamonsters.com
In this challenge sponsored by Advanced Micro Devices, Inc. (“AMD”), participants are invited to form up to a 3-member team to develop and optimize low-level kernels and deliver significant perform...
0
1
4
@m_sirovatka
Matej Sirovatka
11 days
After 2 weeks, we're taking a detour to tensor parallelism, optimising GEMM + Reduce Scatter, and finishing up with All-Gather + GEMM, covering the most common parallelisms in large model training and inference 📈 All on a full node of MI300s 🐳.
1
1
5
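For readers unfamiliar with the two collective patterns named above, here is a minimal sketch in raw torch.distributed, assuming a process group is already initialized and all dimensions divide evenly; the shapes and function names are illustrative, not the competition's reference implementation:

```python
import torch
import torch.distributed as dist

def gemm_reduce_scatter(x_shard, w_shard):
    """Row-parallel GEMM: x_shard is [M, K/world], w_shard is [K/world, N].
    Each rank computes a partial [M, N] product; reduce-scatter then sums
    the partials and leaves each rank with an [M/world, N] output shard."""
    world = dist.get_world_size()
    partial = x_shard @ w_shard
    out = partial.new_empty((partial.shape[0] // world, partial.shape[1]))
    dist.reduce_scatter_tensor(out, partial)  # sum across ranks, scatter along M
    return out

def all_gather_gemm(x_shard, w):
    """All-gather + GEMM: gather the M-sharded activations from every
    rank, then run the full matmul locally."""
    world = dist.get_world_size()
    x = x_shard.new_empty((x_shard.shape[0] * world, x_shard.shape[1]))
    dist.all_gather_into_tensor(x, x_shard)
    return x @ w
```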
@m_sirovatka
Matej Sirovatka
11 days
We are gonna give you a FULL 8xMI300 node, all for free, to write the fastest kernels! The competition is gonna last for 6 weeks, with a new problem released every 2 weeks. We're starting August 30th with ALL2ALL Dispatch + Combine across 8 GPUs, to make MoEs go brrr ⚡️
1
0
3
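A toy sketch of the dispatch + combine pattern using torch.distributed's all_to_all_single, assuming one expert per rank and a precomputed per-token routing; this is an assumption-laden illustration, not the competition problem's reference kernel:

```python
import torch
import torch.distributed as dist

def moe_dispatch_combine(tokens, dest_rank, expert_fn):
    """tokens: [T, D] local tokens; dest_rank: [T] id of the rank
    hosting each token's expert (one expert per rank in this toy).
    Dispatch -> local expert -> combine, restoring token order."""
    world = dist.get_world_size()

    # Group tokens by destination so each rank's slice is contiguous.
    order = torch.argsort(dest_rank)
    send = tokens[order].contiguous()
    send_counts = torch.bincount(dest_rank, minlength=world)

    # Exchange per-rank counts to size the receive buffers.
    recv_counts = torch.empty_like(send_counts)
    dist.all_to_all_single(recv_counts, send_counts)
    in_splits, out_splits = send_counts.tolist(), recv_counts.tolist()

    # Dispatch: each rank receives the tokens routed to its expert.
    recv = send.new_empty((sum(out_splits), tokens.shape[1]))
    dist.all_to_all_single(recv, send, out_splits, in_splits)

    # Combine: run the expert, then reverse the exchange.
    back = send.new_empty(send.shape)
    dist.all_to_all_single(back, expert_fn(recv), in_splits, out_splits)

    # Undo the destination sort to restore the original order.
    out = torch.empty_like(tokens)
    out[order] = back
    return out
```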
@m_sirovatka
Matej Sirovatka
11 days
RT @a1zhang: Excited to announce the SECOND @GPU_MODE x @AMD $100K kernel competition: ⚡️DISTRIBUTED KERNELS!!. You now get free access to….
0
26
0
@m_sirovatka
Matej Sirovatka
12 days
We fully integrated N-D parallelism into Trainer, supporting any configuration you might like, including FSDP, tensor parallel, and so on 📈 You can find a full example of how to use this in the accelerate repository.
github.com
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed...
1
0
10
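The linked accelerate example covers the actual Trainer wiring; as a rough mental model, N-D parallelism lays the devices out on a named mesh and gives each parallelism dimension its own process groups. A sketch using PyTorch's DeviceMesh (accelerate's own config API differs, and the 2x4 layout is just an example):

```python
from torch.distributed.device_mesh import init_device_mesh

# Hypothetical 8-GPU layout: 2-way data parallel x 4-way tensor parallel.
# Each named mesh dimension gets its own set of process groups.
mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))

dp_group = mesh.get_group("dp")  # gradient sync / FSDP sharding happens here
tp_group = mesh.get_group("tp")  # sharded-matmul collectives happen here
```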
@m_sirovatka
Matej Sirovatka
12 days
Context parallelism in the 🤗 transformers Trainer? Training models on 100k+ sequence lengths has never been easier 🚀
Tweet media one
3
17
129
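For intuition on what context parallelism buys you: the sequence dimension is sharded across ranks, so each GPU holds only seq_len / world_size of the activations, at the cost of communicating K/V for attention. A deliberately naive sketch (full K/V gather, causal mask omitted); real implementations such as ring attention stream the shards instead, and the Trainer handles all of this internally:

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

def context_parallel_attention(q_shard, k_shard, v_shard):
    """q/k/v shards: [B, H, S/world, D], sharded along the sequence dim.
    Naive baseline: gather the full K and V, then let the local query
    shard attend over the whole sequence."""
    world = dist.get_world_size()
    k_parts = [torch.empty_like(k_shard) for _ in range(world)]
    v_parts = [torch.empty_like(v_shard) for _ in range(world)]
    dist.all_gather(k_parts, k_shard)
    dist.all_gather(v_parts, v_shard)
    k = torch.cat(k_parts, dim=2)  # [B, H, S, D]
    v = torch.cat(v_parts, dim=2)
    return F.scaled_dot_product_attention(q_shard, k, v)
```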