
William Brandon
@exists_forall
659 Followers • 7K Following • 44 Media • 569 Statuses
he/him • Trying to become compute-bound • PhD student at MIT CSAIL • Prev: CS & Math at UC Berkeley; ML Compilers at NVIDIA • Opinions my own
Cambridge, MA
Joined March 2019
Getting this implemented, tested, and written up quickly was a great team effort with Ani Nrusimha (@Ani_nlp), Kevin Qian (@skeqiqevian), Zack Ankner (@ZackAnkner), Tian Jin (@tjingrant), Zoey (Zhiye) Song, and my advisor Jonathan Ragan-Kelley (@jrk). Go follow them! 9/9
Additionally, thanks to @mcarbin for sponsoring the reading group where we came up with this idea in the first place! 8/
We hope anyone training long-context language models with dense attention considers using Striped Attention to speed up their system! Big thanks to @haoliuhl for the original Ring Attention technique + code, and to @morph_labs, @mosaicml, and the Google TRC for compute. 7/
We build on top of the very cool Ring Attention technique introduced last month by @haoliuhl et al. Ring Attention is an efficient algorithm for computing self-attention for huge sequences (>100k tokens) distributed across multiple GPUs (or TPUs). 2/
arxiv.org
Transformers have emerged as the architecture of choice for many state-of-the-art AI models, showcasing exceptional performance across a wide range of AI applications. However, the memory demands...
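The thread only describes Ring Attention at a high level, so here is a rough single-host illustration of the blockwise, ring-ordered attention loop it refers to: each simulated "device" holds one query block, K/V blocks are visited in ring order, and a numerically stable online softmax accumulates the result so the full seq_len × seq_len score matrix is never materialized. This is a hedged sketch under my own assumptions (function name, shapes, and the single-array block simulation are illustrative); the real algorithm shards blocks across GPUs/TPUs and rotates K/V between ring neighbors (e.g. with collective permutes), and this is not the authors' implementation.

```python
# Minimal single-host sketch of the blockwise, ring-ordered attention
# computation that Ring Attention distributes across devices. In the real
# algorithm each block lives on its own GPU/TPU and K/V blocks are passed
# between ring neighbors; here the "devices" are simulated as slices of one
# array. All names are illustrative, not from the official Ring Attention code.
import jax
import jax.numpy as jnp

def ring_attention_sketch(q, k, v, num_blocks):
    """q, k, v: [seq_len, head_dim]; seq_len must be divisible by num_blocks."""
    seq_len, head_dim = q.shape
    block = seq_len // num_blocks
    scale = 1.0 / jnp.sqrt(head_dim)

    q_blocks = q.reshape(num_blocks, block, head_dim)
    k_blocks = k.reshape(num_blocks, block, head_dim)
    v_blocks = v.reshape(num_blocks, block, head_dim)

    outputs = []
    for i in range(num_blocks):                    # each iteration = one "device"
        qi = q_blocks[i] * scale
        # Online-softmax accumulators (log-sum-exp style), so the full
        # seq_len x seq_len attention matrix is never materialized.
        m = jnp.full((block, 1), -jnp.inf)         # running row max
        l = jnp.zeros((block, 1))                  # running normalizer
        acc = jnp.zeros((block, head_dim))         # running weighted sum
        for step in range(num_blocks):
            j = (i + step) % num_blocks            # ring order of K/V blocks
            s = qi @ k_blocks[j].T                 # [block, block] scores
            m_new = jnp.maximum(m, s.max(axis=-1, keepdims=True))
            p = jnp.exp(s - m_new)
            correction = jnp.exp(m - m_new)        # rescale old accumulators
            l = l * correction + p.sum(axis=-1, keepdims=True)
            acc = acc * correction + p @ v_blocks[j]
            m = m_new
        outputs.append(acc / l)
    return jnp.concatenate(outputs, axis=0)

# Sanity check against ordinary full attention on a small example.
key = jax.random.PRNGKey(0)
q, k, v = (jax.random.normal(k_i, (128, 64)) for k_i in jax.random.split(key, 3))
ref = jax.nn.softmax((q @ k.T) / jnp.sqrt(64.0), axis=-1) @ v
out = ring_attention_sketch(q, k, v, num_blocks=4)
print(jnp.allclose(out, ref, atol=1e-4))
```

The point of the blockwise structure is that each device only ever needs one query block plus whichever K/V block is currently resident, which is what lets Ring Attention scale to >100k-token sequences across multiple GPUs or TPUs.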
RT @plt_amy: If any of this sounds like someone you'd want to have as a student, despite the cons, and getting around the lack of an underg…