Arjun Devraj Profile
Arjun Devraj

@arjun_devraj_

Followers: 26 · Following: 221 · Media: 0 · Statuses: 8

PhD student @cornell_cs. Previously: SWE @meta, undergrad @princeton

Joined September 2023
@arjun_devraj_
Arjun Devraj
7 months
⭐️ On an 8-GPU NVSwitched server, StragglAR speeds up AllReduce for larger buffers by 22% over Ring, a result that should only improve with more GPUs. StragglAR also reliably reduces end-to-end iteration time during data-parallel finetuning of Llama-3.2-3B (see our paper)!
0
0
2
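The 22% speedup above is reported in the paper's experiments. For context only, below is a minimal sketch of how one might time NCCL's default Ring-based AllReduce for a large buffer on a single multi-GPU node with torch.distributed. This is not the StragglAR implementation; the buffer size, iteration counts, and launch command are illustrative assumptions.

```python
# Minimal sketch (not StragglAR): timing NCCL AllReduce on one multi-GPU node.
# Launch with, e.g.: torchrun --nproc_per_node=8 bench_allreduce.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # "Larger buffer" regime is bandwidth-bound; 256 MB of fp32 is an assumed size.
    buf = torch.randn(64 * 1024 * 1024, device="cuda")

    # Warm up, then time the default (Ring-based) NCCL AllReduce.
    for _ in range(5):
        dist.all_reduce(buf)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(20):
        dist.all_reduce(buf)
    end.record()
    torch.cuda.synchronize()
    if rank == 0:
        print(f"mean AllReduce time: {start.elapsed_time(end) / 20:.2f} ms")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```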
@arjun_devraj_
Arjun Devraj
7 months
📈 StragglAR’s performance advantage *increases* as the GPU cluster size scales, and it asymptotically achieves the lowest known bandwidth cost among all algorithms under straggler conditions! We do this by reformulating AllReduce as an efficient broadcast with n-2+log n rounds.
1
0
2
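A back-of-the-envelope comparison of the round count quoted above against Ring AllReduce's standard 2(n-1) communication rounds. The n-2+log n figure is taken from the tweet; reading log as log2 is an assumption on my part.

```python
# Round-count comparison: Ring AllReduce (2*(n-1)) vs the n - 2 + log n
# figure quoted in the tweet above (log2 assumed).
import math

for n in [8, 16, 32, 64, 128]:
    ring = 2 * (n - 1)
    stragglar = n - 2 + math.ceil(math.log2(n))
    print(f"n={n:4d}  Ring={ring:4d}  StragglAR={stragglar:4d}  "
          f"ratio={ring / stragglar:.2f}x")
```

Under this reading, the ratio of round counts grows toward 2x as n increases, which is consistent with the claim that the advantage increases with cluster size.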
@arjun_devraj_
Arjun Devraj
7 months
💡Once the straggler reaches the synchronization barrier, StragglAR implements a fast, novel collective algorithm to complete the AllReduce. When the initial ReduceScatter is fully overlapped with the straggler delay, this results in (provably) 2x lower communication cost!
1
0
2
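The 2x claim above can be seen with simple accounting under the standard bandwidth-optimal decomposition AllReduce = ReduceScatter + AllGather, each moving (n-1)/n of the buffer per GPU. This models the generic decomposition, not StragglAR's specific collective, and the buffer size is an assumed example.

```python
# Rough per-GPU communicated-bytes accounting, assuming the standard
# bandwidth-optimal decomposition AllReduce = ReduceScatter + AllGather,
# each moving (n-1)/n * M bytes per GPU. Numbers are illustrative.
n = 8                    # GPUs
M = 256 * 2**20          # buffer size in bytes (assumed)

reduce_scatter = (n - 1) / n * M
all_gather = (n - 1) / n * M

full_allreduce = reduce_scatter + all_gather   # nothing overlapped
after_straggler = all_gather                   # ReduceScatter hidden by the delay

print(f"critical-path bytes, no overlap:       {full_allreduce / 2**20:.0f} MiB")
print(f"critical-path bytes, RS overlapped:    {after_straggler / 2**20:.0f} MiB")
print(f"reduction: {full_allreduce / after_straggler:.1f}x")
```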
@arjun_devraj_
Arjun Devraj
7 months
🏁 Instead of allowing other GPUs to idle (while waiting for the straggler) before starting the AllReduce, our algorithm—StragglAR—uses the delay to perform useful communication. With StragglAR, non-straggler GPUs complete a ReduceScatter while waiting for the straggler.
1
0
2
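A minimal sketch of the overlap idea described in the tweet above: while a known straggler rank is still busy, the remaining ranks run a ReduceScatter among themselves so the wait is not idle. The fixed straggler rank, buffer sizes, and the simple sleep standing in for the straggler's delayed compute are all assumptions; the actual StragglAR collective that finishes the AllReduce afterward is not shown.

```python
# Illustrative overlap sketch (not StragglAR's actual collective).
# Launch with, e.g.: torchrun --nproc_per_node=8 overlap_sketch.py
import time
import torch
import torch.distributed as dist

STRAGGLER_RANK = 0          # assumed known, persistent straggler

def main():
    dist.init_process_group(backend="nccl")
    rank, world = dist.get_rank(), dist.get_world_size()
    torch.cuda.set_device(rank)

    non_stragglers = [r for r in range(world) if r != STRAGGLER_RANK]
    # Every rank must call new_group, even ranks not in the group.
    fast_group = dist.new_group(ranks=non_stragglers)

    grad = torch.randn(len(non_stragglers) * 1024 * 1024, device="cuda")

    if rank == STRAGGLER_RANK:
        time.sleep(1.0)     # stand-in for the straggler's delayed compute
    else:
        # Useful work during the wait: partial reduction among the fast ranks.
        shard = torch.empty(grad.numel() // len(non_stragglers), device="cuda")
        dist.reduce_scatter_tensor(shard, grad, group=fast_group)

    # In StragglAR, a custom collective would now complete the AllReduce once
    # the straggler arrives; here we only synchronize everyone.
    dist.barrier()
    if rank == 0:
        print("straggler arrived; fast ranks already hold reduced shards")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```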
@arjun_devraj_
Arjun Devraj
7 months
🐌 Persistent straggler GPUs delay AllReduce. Distributed ML jobs that use data or tensor parallelism are bottlenecked by AllReduce, which communicates gradients/activations in training and inference. In our multi-GPU training experiments, we find that a *persistent* straggler delays AllReduce.
1
0
2
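For context on why AllReduce sits on the critical path in the data-parallel case described above, here is a minimal hand-rolled gradient synchronization. Real jobs typically use DistributedDataParallel; the model, batch size, and launch command are assumptions, not the paper's setup.

```python
# Data parallelism in miniature: each rank computes gradients on its own
# batch, then an AllReduce averages them, so one straggler delays all ranks.
# Launch with, e.g.: torchrun --nproc_per_node=8 dp_allreduce.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")
    rank, world = dist.get_rank(), dist.get_world_size()
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(1024, 1024).cuda()
    x = torch.randn(32, 1024, device="cuda")
    loss = model(x).pow(2).mean()
    loss.backward()

    # The AllReduce on the critical path: every rank blocks here, so a single
    # persistent straggler delays the whole iteration.
    for p in model.parameters():
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```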
@arjun_devraj_
Arjun Devraj
7 months
Excited to share our preprint: Accelerating AllReduce with a Persistent Straggler 🚀 w/ Eric Ding, @nth_abhishek, Robert Kleinberg, @rachee_singh We design a new algorithm to speed up AllReduce in distributed ML jobs with a persistent straggler GPU 🧵⬇️ https://t.co/VaGaLDCDvY
1
3
12
@rachee_singh
Rachee Singh
1 year
@nth_abhishek presents our work on server-scale photonic interconnects at @ACMSIGCOMM HotNets! Thanks Sujata for chairing the session!
0
4
16
@rachee_singh
Rachee Singh
1 year
I am hiring a postdoc at Cornell for systems research on next-generation multi-GPU interconnects. If you are about to graduate with a PhD in CS or a related field, email me at rachee@cs.cornell.edu with your CV and a representative publication.
1
35
113