
Songlin Yang
@SonglinYang4
Followers 12K · Following 5K · Media 70 · Statuses 2K
Ph.D. student @MIT_CSAIL. Working on scalable and principled methods in #ML & #LLM. In open-sourcing I trust 🐳. she/her/hers
Cambridge, MA
Joined January 2021
To those in charge at Meta: give Allen more GPUs, and he will change the world.
No matter how AI evolves overnight (tech, career, how it may impact me), I remain committed to using the "physics of language models" approach to predict next-gen AI. Due to my limited GPU access at Meta, Part 4.1 (and the new 4.2) are still in progress, but the results on Canon layers are shining.
0
13
196
RT @lmxyy1999: 🚀 Meet #RadialAttention — a static sparse attention mechanism with O(n log n) complexity for long video generation! ✅ Plug-and…
0
29
0
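A hedged illustration of the claim in the retweet above: one way a static, causal sparse-attention mask can have only O(n log n) nonzero entries is to give each query a small local band plus keys at power-of-two distances. This is a generic sketch of that complexity class, not necessarily the pattern RadialAttention actually uses, and the function name below is made up for illustration.

import torch

def static_log_sparse_mask(n: int, local_window: int = 16) -> torch.Tensor:
    # Boolean (n, n) causal mask: each query attends to a dense local band
    # plus keys at power-of-two distances, i.e. O(log n) keys per query.
    i = torch.arange(n).unsqueeze(1)                 # query positions, shape (n, 1)
    j = torch.arange(n).unsqueeze(0)                 # key positions,   shape (1, n)
    dist = i - j                                     # >= 0 wherever j <= i
    causal = dist >= 0
    local = causal & (dist < local_window)           # dense local band
    pow2 = (dist > 0) & ((dist & (dist - 1)) == 0)   # distances 1, 2, 4, 8, ...
    return local | pow2

mask = static_log_sparse_mask(1024)
print(mask.float().mean().item())  # fraction of allowed entries shrinks like log(n)/n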
RT @gui_penedo: We have finally released the 📝paper for 🥂FineWeb2, our large multilingual pre-training dataset. Along with general (and ex…
0
92
0
RT @gallabytes: as test time training mechanisms mature, we're going to need continual learning benchmarks. I think the most obvious one is…
0
2
0
RT @teortaxesTex: Incredible research, not only do they show mechanistic reasons for Whale's NSA and Kimi's MoBA's greater capacity for len…
0
23
0
RT @arankomatsuzaki: I'd like to see Meta building a lean LLM team around Narang, Allen-Zhu, Mike Lewis, Zettlemoyer and Sukhbaatar and giv…
0
9
0
RT @tilderesearch: Sparse attention (MoBA/NSA) trains faster & beats full attention in key tasks. But we’ve had no idea how they truly work…
0
80
0
RT @nsaphra: 🚨 New preprint! 🚨 Phase transitions! We love to see them during LM training. Syntactic attention structure, induction heads, g…
0
43
0
Recordings:
@oswaldjoh and @ninoscherrer will present MesaNet at the ASAP seminar on Tuesday, June 24 at 2 PM ET! MesaNet is a locally optimal test-time training (TTT) layer that optimizes the key-value reconstruction objective over the entire history. If you're into TTT, don't miss it!
1
12
65
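To unpack the quoted description, here is a hedged sketch of what "optimizes the key-value reconstruction objective over the entire history" can look like; the ridge term \lambda, the symbols, and the linear readout are my illustrative assumptions rather than MesaNet's exact parameterization:

\[
W_t \;=\; \arg\min_{W}\; \sum_{i=1}^{t} \lVert W k_i - v_i \rVert_2^2 \;+\; \lambda \lVert W \rVert_F^2,
\qquad
o_t \;=\; W_t\, q_t,
\]
\[
W_t \;=\; \Big(\sum_{i=1}^{t} v_i k_i^{\top}\Big)\Big(\sum_{i=1}^{t} k_i k_i^{\top} + \lambda I\Big)^{-1}.
\]

Because both sums update by a rank-one term per token, the minimizer over the whole history can be maintained at test time from constant-size statistics, which is what makes this a test-time-training layer rather than a cache over past tokens.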
RT @realDanFu: What a throwback to weak supervision! Great work @JonSaadFalcon @ekellbuch @MayeeChen!
0
7
0
RT @ziqiao_ma: Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at any t…
0
37
0
starting now!!
@oswaldjoh and @ninoscherrer will present MesaNet at the ASAP seminar on Tuesday, June 24 at 2 PM ET! MesaNet is a locally optimal test-time training (TTT) layer that optimizes the key-value reconstruction objective over the entire history. If you're into TTT, don't miss it!
0
1
11
Thanks for the contribution! The CI server makes it much easier for us to catch bugs quickly.
🎉 More exciting news! We now back the Flash Linear Attention (#FLA) kernels with CI server resources. FLA is the backbone now powering next-generation long-context LLMs without a KV cache. 📈 Community traction: 2.7k ⭐ | 196 forks | 48 contributors on GitHub.
1
0
32
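On the "without a KV cache" point above, a minimal sketch of the idea: causal linear attention folds past keys and values into a fixed-size running state instead of storing them per token. This is a plain PyTorch reference for intuition only, not FLA's API or its fused chunked Triton kernels; the feature map and normalizer are illustrative choices.

import torch

def linear_attention_recurrent(q, k, v):
    # q, k, v: (T, d). The running state S (d x d_v) and normalizer z (d,)
    # replace the per-token KV cache, so memory stays O(d * d_v) regardless of T.
    q = torch.nn.functional.elu(q) + 1            # positive feature map (illustrative)
    k = torch.nn.functional.elu(k) + 1
    S = torch.zeros(k.shape[-1], v.shape[-1])     # running sum of k_t v_t^T
    z = torch.zeros(k.shape[-1])                  # running sum of k_t
    outs = []
    for t in range(q.shape[0]):
        S = S + torch.outer(k[t], v[t])
        z = z + k[t]
        outs.append((q[t] @ S) / (q[t] @ z + 1e-6))
    return torch.stack(outs)

o = linear_attention_recurrent(torch.randn(8, 4), torch.randn(8, 4), torch.randn(8, 4))
print(o.shape)  # torch.Size([8, 4])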
RT @_marcsun: 🚀 SGLang now supports Hugging Face Transformers as a backend! Run any transformers-compatible model with fast, production-gr…
0
24
0
RT @henryHM_ko: I wrote a new blog on TPUs -- it's been fun seeing how different they are from GPUs and also drawing things on excalidraw a…
0
184
0