
Songlin Yang
@SonglinYang4
12K Followers · 5K Following · 75 Media · 2K Statuses
Ph.D. student @MIT_CSAIL. Working on scalable and principled algorithms in #LLM and #MLSys. In open-sourcing I trust 🐳. she/her/hers
Cambridge, MA
Joined January 2021
📢 (1/16) Introducing PaTH 🛣️ — a RoPE-free contextualized position encoding scheme, built for stronger state tracking, better extrapolation, and hardware-efficient training. PaTH outperforms RoPE across short and long language modeling benchmarks.
arxiv.org
The attention mechanism is a core primitive in modern large language models (LLMs) and AI more broadly. Since attention by itself is permutation-invariant, position encoding is essential for...
9 · 91 · 538
RT @Guangxuan_Xiao: Just wrote a post on my understanding of the statistics behind block sparse attention. My take is that it works by us….
guangxuanx.com
How can a language model comprehend a million-token document without drowning in O(N²) attention cost? A statistical model revealing the success of block sparse attention through learned similarity...
0 · 15 · 0
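The post linked above is truncated here, so for orientation only: below is a minimal, generic sketch of what block sparse attention does (pool keys per block, keep only the top-k most similar key blocks per query block, and attend within the selection). This is not the statistical model from the post; block size, top-k, and all names are illustrative assumptions, and causal masking is omitted.

```python
import torch
import torch.nn.functional as F

def block_sparse_attention(q, k, v, block_size=64, topk=8):
    """Generic block sparse attention sketch (single head, no causal mask).

    Each query block attends only to the top-k key blocks whose mean-pooled
    key is most similar to the query block's mean-pooled query.
    """
    T, d = q.shape
    nb = T // block_size                      # assumes T is divisible by block_size
    qb = q.view(nb, block_size, d)
    kb = k.view(nb, block_size, d)
    vb = v.view(nb, block_size, d)

    # Block-level similarity from mean-pooled queries and keys.
    q_mean = qb.mean(dim=1)                   # [nb, d]
    k_mean = kb.mean(dim=1)                   # [nb, d]
    scores = q_mean @ k_mean.T                # [nb, nb]
    sel = scores.topk(min(topk, nb), dim=-1).indices   # [nb, topk]

    out = torch.empty(nb, block_size, d)
    for i in range(nb):
        ks = kb[sel[i]].reshape(-1, d)        # gather selected key blocks
        vs = vb[sel[i]].reshape(-1, d)
        attn = F.softmax(qb[i] @ ks.T / d**0.5, dim=-1)
        out[i] = attn @ vs
    return out.view(T, d)
```

The linked post's contribution is a statistical argument for why this kind of block-level selection captures most of the attention mass; the sketch only shows the mechanics.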
RT @cHHillee: @JingyuanLiu123 This is the advantage of large nvlink domains or TPUs topology - the main reason to do PP is that you are bot….
0 · 5 · 0
Got something cool on reasoning? Submit to the Efficient Reasoning Workshop 🤗.
🌟 Reminder: Submission Deadline Approaching! 🌟 The 1st Workshop on Efficient Reasoning (ER) @ NeurIPS 2025 — happening Dec 6 or 7 in San Diego — is fast approaching, and we’d love to see your work there! 📌 Submission Deadline: September 1, 2025 (AoE). 🔗 Submit here:
0 · 3 · 30
RT @stuart_sul: MoE layers can be really slow. When training our coding models @cursor_ai, they ate up 27–53% of training time. So we comp….
0 · 95 · 0
RT @arcprize: Analyzing the Hierarchical Reasoning Model by @makingAGI. We verified scores on hidden tasks, ran ablations, and found that p….
0 · 199 · 0
RT @ChangJonathanC: while we wait for gpt-5 to drop. Here is a flex attention tutorial for building a < 1000 LoC vllm from scratch. https://….
jonathanc.net
PyTorch FlexAttention tutorial: Building a minimal vLLM-style inference engine from scratch with paged attention
0 · 37 · 0
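The tutorial above builds a minimal vLLM-style engine around paged attention. As a rough, hypothetical sketch of the core idea (not the tutorial's code): KV entries live in fixed-size physical blocks, and a per-sequence block table maps logical token positions to those blocks, so cache memory is allocated block by block rather than as one contiguous buffer per sequence.

```python
import torch

class PagedKVCache:
    """Sketch of paged KV storage (hypothetical names; one layer, all heads together)."""

    def __init__(self, num_blocks, block_size, num_heads, head_dim):
        self.block_size = block_size
        # Physical KV pool; blocks are handed out to sequences on demand.
        self.k = torch.zeros(num_blocks, block_size, num_heads, head_dim)
        self.v = torch.zeros_like(self.k)
        self.free = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids

    def append(self, seq_id, pos, k, v):
        """Write the KV for token `pos` of sequence `seq_id` into its paged slot."""
        table = self.block_tables.setdefault(seq_id, [])
        if pos % self.block_size == 0:        # first token of a new logical block
            table.append(self.free.pop())     # grab a free physical block
        blk = table[pos // self.block_size]
        self.k[blk, pos % self.block_size] = k
        self.v[blk, pos % self.block_size] = v

    def gather(self, seq_id, length):
        """Reassemble the first `length` KV entries of a sequence for attention."""
        table = self.block_tables[seq_id]
        k = torch.cat([self.k[b] for b in table])[:length]
        v = torch.cat([self.v[b] for b in table])[:length]
        return k, v
```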
RT @gu_xiangming: I noticed that @OpenAI added learnable bias to attention logits before softmax. After softmax, they deleted the bias. Thi….
0 · 174 · 0
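Reading the truncated post above, the trick seems to be: append a learnable per-head "sink" logit to the attention scores, take the softmax over the extended scores, then drop the sink column so it absorbs probability mass without contributing any value. A minimal sketch under that reading (names and shapes are assumptions, causal masking omitted):

```python
import torch
import torch.nn.functional as F

def attention_with_sink_logit(q, k, v, sink_logit):
    """Softmax over scores plus a learnable per-head sink logit that is
    discarded after normalization (sketch of the trick described above).

    q, k, v: [num_heads, seq_len, head_dim]; sink_logit: [num_heads]
    """
    d = q.shape[-1]
    scores = q @ k.transpose(-1, -2) / d**0.5                  # [H, T, T]
    sink = sink_logit.view(-1, 1, 1).expand(-1, scores.shape[1], 1)
    probs = F.softmax(torch.cat([scores, sink], dim=-1), dim=-1)
    probs = probs[..., :-1]                                    # drop the sink column
    return probs @ v                                           # rows may sum to < 1
```

The effect is that rows of the attention matrix can sum to less than one, which gives a head a cheap way to attend to "nothing" on a given query.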
RT @abk_tau: Had a great time presenting OPRM at ASAP!. We talked about recurrent memory overflows, Long Context vs. RAG, and possible scal….
0 · 3 · 0
RT @SimonXinDong: Here is an explanation and implementation of the possibly OpenAI-used Sliding Window 128 + Sink Tokens with Flex Attent….
github.com
XinDongol/SWA-SinkMeta
0 · 14 · 0
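The repo above combines a 128-token sliding window with a few always-visible sink tokens using FlexAttention. A minimal mask_mod along those lines (this is not the repo's code; the sink-token count is a guess):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

WINDOW = 128           # sliding-window size mentioned in the post
NUM_SINK_TOKENS = 4    # assumption; the repo may use a different count

def swa_sink_mask(b, h, q_idx, kv_idx):
    # Causal sliding window, plus sink tokens every query is always allowed to see.
    causal = q_idx >= kv_idx
    in_window = (q_idx - kv_idx) < WINDOW
    is_sink = kv_idx < NUM_SINK_TOKENS
    return causal & (in_window | is_sink)

# Usage sketch (GPU assumed): build the block mask once, then call flex_attention.
B, H, T, D = 1, 8, 1024, 64
device = "cuda"
block_mask = create_block_mask(swa_sink_mask, B=B, H=H, Q_LEN=T, KV_LEN=T, device=device)
q, k, v = torch.randn(3, B, H, T, D, device=device)
out = flex_attention(q, k, v, block_mask=block_mask)
```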
Falcon-H1 is very rich in content — highly recommended.
Falcon-H1 is a very dense research paper exploring the space of hybrid attention designs and tuning *every* hyperparameter there is. It's more interesting than the models themselves. If you were intrigued by that «AlphaGo move» slop, this is the real thing.
2 · 1 · 81
RT @yzhang_cs: Huge congrats to the NSA team for their ACL2025 Best Paper win! 🏆🏆🏆 We've open-sourced a 3rd-party impl to help you integrate….
github.com
🚀 Efficient implementations of state-of-the-art linear attention models - fla-org/flash-linear-attention
0 · 18 · 0
RT @aryaman2020: nerdsniped while reading the DeltaNet paper: the main representation rewrite function we propose in ReFT is super similar….
0 · 2 · 0
RT @jacobmbuckman: New post: "Context Is More Than A Length-Measuring Contest". It is a mistake to take the context lengths reported by mod….
0 · 2 · 0