Songlin Yang Profile
Songlin Yang

@SonglinYang4

Followers: 12K · Following: 5K · Media: 70 · Statuses: 2K

Ph.D. student @MIT_CSAIL. Working on scalable and principled methods in #ML & #LLM. In open-sourcing I trust 🐳. she/her/hers

Cambridge, MA
Joined January 2021
@SonglinYang4
Songlin Yang
1 month
📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks.
9 replies · 88 retweets · 495 likes
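A minimal sketch of the idea the thread announces, assuming PaTH's framing as accumulated Householder-like transformations: where RoPE rotates queries and keys by fixed, position-only angles, a PaTH-style score makes the relative transform between positions data-dependent. Names, shapes, and the exact product range below are my reading of the announcement, not the released code:

```python
# Illustrative sketch only (PyTorch); not the official PaTH implementation.
import torch

def path_scores(q, k, w, beta):
    """s[i, j] = q_i^T (H_i ... H_{j+1}) k_j for j <= i, where each
    H_t = I - beta_t * w_t w_t^T is an input-dependent Householder-like map
    (vs. RoPE's fixed rotation). Naive O(T^2 d^2) reference loop; the point
    of the paper is a hardware-efficient algorithm for this same quantity."""
    T, d = q.shape
    H = torch.eye(d) - beta[:, None, None] * torch.einsum('ti,tj->tij', w, w)
    s = torch.full((T, T), float('-inf'))  # j > i stays masked (causal)
    for i in range(T):
        P = torch.eye(d)                   # running product H_i ... H_{j+1}
        for j in range(i, -1, -1):
            s[i, j] = q[i] @ P @ k[j]
            P = P @ H[j]
    return s
```

Because the transform depends on the content at intervening positions rather than only on distance, it can in principle track state across the sequence in a way fixed rotations cannot, which is consistent with the state-tracking claim in the tweet.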
@SonglinYang4
Songlin Yang
3 days
To those in charge at Meta: give Allen more GPUs, and he will change the world.
@ZeyuanAllenZhu
Zeyuan Allen-Zhu, Sc.D.
3 days
No matter how AI evolves overnight—tech, career, how it may impact me—I remain committed to using the "physics of language models" approach to predict next-gen AI. Due to my limited GPU access at Meta, Parts 4.1 (and the new 4.2) are still in progress, but the results on Canon layers are shining
[image]
0 replies · 13 retweets · 196 likes
@SonglinYang4
Songlin Yang
3 days
RT @lmxyy1999: 🚀 Meet #RadialAttention — a static sparse attention mechanism with O(n log n) complexity for long video generation! ✅ Plug-and…
0 replies · 29 retweets · 0 likes
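For intuition on the O(n log n) claim, here is one generic way to build a static causal mask whose density decays with token distance. This is a hand-rolled illustration of the complexity argument, not the mask construction from the RadialAttention paper:

```python
# Illustrative sketch only; the paper's actual mask differs in detail.
import math
import torch

def radial_mask(n, local=64):
    """Static causal mask: dense inside a local window, then exponentially
    sparser strides at larger distances. Each row keeps O(log n) bands of
    O(local) entries, so the whole mask has O(n log n) nonzeros."""
    i = torch.arange(n)[:, None]
    j = torch.arange(n)[None, :]
    dist = i - j
    causal = dist >= 0
    near = dist < local
    # band b covers distances in [local*2^(b-1), local*2^b); keep stride 2^b
    band = (dist.clamp(min=1).float().log2() - math.log2(local)).ceil().clamp(min=0).long()
    sparse = (dist % (2 ** band)) == 0
    return causal & (near | sparse)

print(radial_mask(4096).float().mean())  # density shrinks as n grows
```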
@SonglinYang4
Songlin Yang
3 days
[image]
4 replies · 10 retweets · 103 likes
@SonglinYang4
Songlin Yang
7 days
RT @gui_penedo: We have finally released the 📝 paper for 🥂 FineWeb2, our large multilingual pre-training dataset. Along with general (and ex…
0 replies · 92 retweets · 0 likes
@SonglinYang4
Songlin Yang
8 days
RT @YouJiacheng: I hope he can read papers published by facebookresearch 😂
0 replies · 1 retweet · 0 likes
@SonglinYang4
Songlin Yang
8 days
RT @gallabytes: as test-time training mechanisms mature, we're going to need continual learning benchmarks. I think the most obvious one is…
0 replies · 2 retweets · 0 likes
@SonglinYang4
Songlin Yang
8 days
RT @teortaxesTex: Incredible research, not only do they show mechanistic reasons for Whale's NSA and Kimi's MoBA's greater capacity for len…
0 replies · 23 retweets · 0 likes
@SonglinYang4
Songlin Yang
8 days
true.
@YouJiacheng
You Jiacheng
9 days
Dudes, it seems that Zuck didn't and doesn't know they have talents like Allen-Zhu.
0 replies · 0 retweets · 30 likes
@SonglinYang4
Songlin Yang
9 days
RT @arankomatsuzaki: I'd like to see Meta building a lean LLM team around Narang, Allen-Zhu, Mike Lewis, Zettlemoyer and Sukhbaatar and giv…
0 replies · 9 retweets · 0 likes
@SonglinYang4
Songlin Yang
9 days
RT @tilderesearch: Sparse attention (MoBA/NSA) trains faster & beats full attention in key tasks. But we’ve had no idea how they truly work…
0 replies · 80 retweets · 0 likes
@SonglinYang4
Songlin Yang
10 days
RT @nsaphra: 🚨 New preprint! 🚨 Phase transitions! We love to see them during LM training. Syntactic attention structure, induction heads, g…
0 replies · 43 retweets · 0 likes
@SonglinYang4
Songlin Yang
10 days
Recordings:
@SonglinYang4
Songlin Yang
11 days
@oswaldjoh and @ninoscherrer will present MesaNet at the ASAP seminar on Tuesday, June 24 at 2 PM ET! MesaNet is a locally optimal test-time training (TTT) layer that optimizes the key-value reconstruction objective over the entire history. If you're into TTT, don't miss it!
[image]
1 reply · 12 retweets · 65 likes
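To make "optimizes the key-value reconstruction objective over the entire history" concrete, here is a naive closed-form sketch of that read-out. MesaNet itself computes the same quantity far more efficiently, so treat this loop purely as a definition, with names and details as my assumptions:

```python
# Reference sketch only; not the MesaNet implementation.
import torch

def mesa_layer(q, k, v, lam=1.0):
    """o_t = W_t q_t, where W_t = argmin_W sum_{s<=t} ||W k_s - v_s||^2 + lam*||W||^2.
    Closed form: W_t = S_t R_t^{-1}, with S_t = sum_s v_s k_s^T and
    R_t = lam*I + sum_s k_s k_s^T. Naive O(T d^3) reference loop."""
    T, d = k.shape
    S = torch.zeros(v.shape[-1], d)   # running sum of v_s k_s^T
    R = lam * torch.eye(d)            # running regularized Gram matrix
    out = []
    for t in range(T):
        S = S + torch.outer(v[t], k[t])
        R = R + torch.outer(k[t], k[t])
        out.append(S @ torch.linalg.solve(R, q[t]))  # W_t q_t without forming R^{-1}
    return torch.stack(out)
```

"Locally optimal" here roughly means the read-out at step t is the exact minimizer of the reconstruction loss over everything seen so far, rather than a single gradient step toward it.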
@SonglinYang4
Songlin Yang
10 days
RT @realDanFu: What a throwback to weak supervision! Great work @JonSaadFalcon @ekellbuch @MayeeChen!
0 replies · 7 retweets · 0 likes
@SonglinYang4
Songlin Yang
10 days
RT @ziqiao_ma: Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at any t…
0 replies · 37 retweets · 0 likes
@SonglinYang4
Songlin Yang
10 days
starting now!!
@SonglinYang4
Songlin Yang
11 days
@oswaldjoh and @ninoscherrer will present MesaNet at the ASAP seminar on Tuesday, June 24 at 2 PM ET! MesaNet is a locally optimal test-time training (TTT) layer that optimizes the key-value reconstruction objective over the entire history. If you're into TTT, don't miss it!
[image]
0 replies · 1 retweet · 11 likes
@SonglinYang4
Songlin Yang
11 days
@oswaldjoh
Johannes Oswald
18 days
Super happy and proud to share our novel scalable RNN model, the MesaNet! This work builds upon beautiful ideas of locally optimal test-time training (TTT), and combines ideas of in-context learning, test-time training, and mesa-optimization.
[image]
0 replies · 0 retweets · 2 likes
@SonglinYang4
Songlin Yang
11 days
Thanks for the contribution! The CI server makes it much easier for us to catch bugs quickly.
@Bitdeer_AI
Bitdeer AI
12 days
🎉 More exciting news! We are providing CI server resources for the Flash Linear Attention (#FLA) kernel library, the backbone now powering next-generation long-context LLMs without a KV cache. 📈 Community traction: 2.7k ⭐ | 196 forks | 48 contributors on GitHub.
[image]
1 reply · 0 retweets · 32 likes
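On the "without KV cache" point: the linear-attention family that FLA provides kernels for replaces softmax attention with a form whose decode step carries only a fixed-size recurrent state. A generic sketch of that step follows; the elu+1 feature map and the normalization are common textbook choices, not FLA's specific kernels:

```python
import torch
import torch.nn.functional as F

def linear_attn_step(state, z, q_t, k_t, v_t):
    """One decode step of normalized linear attention. `state` is d_k x d_v
    and `z` is d_k, so memory stays constant in sequence length, unlike a
    softmax KV cache that grows by one (k, v) pair per generated token."""
    fq, fk = F.elu(q_t) + 1, F.elu(k_t) + 1   # positive feature map phi
    state = state + torch.outer(fk, v_t)       # S_t = S_{t-1} + phi(k_t) v_t^T
    z = z + fk                                 # running normalizer
    o_t = (state.T @ fq) / (z @ fq + 1e-6)     # phi(q_t)^T S_t / phi(q_t)^T z_t
    return state, z, o_t
```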
@SonglinYang4
Songlin Yang
11 days
RT @_marcsun: 🚀 SGLang now supports Hugging Face Transformers as a backend! Run any transformers-compatible model with fast, production-gr…
0 replies · 24 retweets · 0 likes
@SonglinYang4
Songlin Yang
12 days
RT @justintchiu: can't miss any interviews with danny tarlow:
0 replies · 2 retweets · 0 likes
@SonglinYang4
Songlin Yang
13 days
RT @henryHM_ko: I wrote a new blog on TPUs -- it's been fun seeing how different they are from GPUs and also drawing things on excalidraw a…
0 replies · 184 retweets · 0 likes