Yash Sarrof Profile
Yash Sarrof

@yashYRS

Followers
137
Following
356
Media
6
Statuses
117

PhD student, Saarland University, advised by Michael Hahn, lab website: https://t.co/iSGrDEoxG6

Saarbrücken, Germany
Joined January 2015
@yashYRS
Yash Sarrof
1 year
First-time NeurIPS attendee here! Super excited to talk about our paper with @yveitsman, @mhahn29 and to discover the amazing work by everyone else :D https://t.co/rt0peAA0SR
@yashYRS
Yash Sarrof
2 years
We are excited to share our work on characterizing the expressivity of State Space Models (SSMs) through a theoretical lens, using a formal language framework, backed up by empirical findings. w/ Yana Veitsman, Michael Hahn. Paper link: https://t.co/PIoQpODS3Q
0
3
13
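As a rough illustration of what a state-space recurrence looks like (this is a generic scalar sketch, not the models analyzed in the paper; see the linked preprint for the actual setup):

```python
# Illustrative scalar state-space recurrence (hypothetical sketch, not the
# paper's model): h_t = a * h_{t-1} + b * x_t, with readout y_t = c * h_t.
def ssm_scan(xs, a, b, c):
    h = 0.0
    ys = []
    for x_t in xs:
        h = a * h + b * x_t  # linear state update
        ys.append(c * h)     # linear readout
    return ys

# With a = b = c = 1 the recurrence computes a running sum -- a simple
# counting behavior of the kind formal-language analyses examine.
out = ssm_scan([1, 1, 1, 1], a=1.0, b=1.0, c=1.0)
print(out)  # [1.0, 2.0, 3.0, 4.0]
```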
@megamor2
Mor Geva
3 days
Had great fun at @dagstuhl this week! 😍 Thanks for the invite and interesting discussions! @JiUngLee1 @WolfStammer @SoyoungOh5 @SwetaMahajan1 Aleksandra Bakalova I talked about parameter-grounded units for mech interp, covering our recent work on MAPS and SNMF:
1
2
34
@mhahn29
Michael Hahn
9 days
We’re hiring PhD students and postdocs on LLM theory and interpretability! Topics: 1️⃣ abilities & limitations of transformers and other architectures; 2️⃣ LLM interpretability; 3️⃣ foundations of LLM reasoning; 4️⃣ foundations of AI safety.
13
92
617
@frisbeemortel
Michael Rizvi-Martel
1 month
Is there such a thing as too many agents in multi-agent systems? It depends! 🧵 Our work reveals 3 distinct regimes where communication patterns differ dramatically. More on our findings below 👇 (1/7)
1
11
28
@yashYRS
Yash Sarrof
1 month
Got awarded Top Reviewer at NeurIPS 2025 @NeurIPSConf. Really gratifying to get this on my first attempt at reviewing 😃
0
1
7
@canondetortugas
Dylan Foster 🐢
4 months
Announcing the first workshop on Foundations of Language Model Reasoning (FoRLM) at NeurIPS 2025! 📝Soliciting abstracts that advance foundational understanding of reasoning in language models, from theoretical analyses to rigorous empirical studies. 📆 Deadline: Sept 3, 2025
1
27
162
@huangxt233
Xinting Huang
4 months
Do LLMs store information in interpretable subspaces -- similar to variables in a program? In our new paper, we decompose representation space into smaller, interpretable, non-basis-aligned subspaces with unsupervised learning.
2
46
348
@davidweichiang
David Chiang
5 months
New on arXiv: Knee-Deep in C-RASP, by @pentagonalize, Michael Cadilhac and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.
1
11
39
@yveitsman
Yana Veitsman
6 months
How do architectural limitations of Transformers manifest after pretraining?
2
5
13
@yuekun_yao
Yuekun Yao
6 months
Can language models learn implicit reasoning without chain-of-thought? Our new paper shows: Yes, LMs can learn k-hop reasoning; however, it comes at the cost of an exponential increase in training data and linear growth in model depth as k increases. https://t.co/1CGTRJGnGQ
1
2
7
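For intuition, a k-hop reasoning query of the kind described above can be sketched as repeatedly following a relation (a hypothetical toy example, not the paper's dataset or method):

```python
# Illustrative k-hop reasoning task (hypothetical toy example): given a
# relation like {'a': 'b', 'b': 'c', ...}, a k-hop query asks which entity
# is reached after following the relation k times from a start entity.
def k_hop(mapping, start, k):
    cur = start
    for _ in range(k):
        cur = mapping[cur]  # one "hop" of implicit reasoning
    return cur

links = {'a': 'b', 'b': 'c', 'c': 'd', 'd': 'e'}
print(k_hop(links, 'a', 3))  # 'd'
```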
@mhahn29
Michael Hahn
7 months
1/11 Solving tasks with Chain-of-Thought reasoning is incredibly effective, but generating long CoTs is pricey. So we want them to be as short as possible to solve the task. But wait – how short is that? In our new ICML paper, we aim to answer that question.
1
6
26
@yashYRS
Yash Sarrof
8 months
Fantastic thread by my colleague & dear friend @abakalova13175 diving into how LLMs perform in-context learning, based on her new preprint By tracing information flow, she reveals how LLMs dynamically handle examples via a "Contextualize, then Aggregate" process. Worth the read!
@abakalova13175
abakalova
8 months
How do LLMs perform in-context learning? LLMs can infer tasks from just a few examples in prompts - but how? Our new preprint proposes a two-step mechanism: first contextualize, then aggregate. 🧵
0
0
3
@riccardograzzi
Riccardo Grazzi
1 year
LLMs can now track states, finally matching this cat! And we prove it. But how? 🧵👇 1/ Paper: https://t.co/aKvrqYtkWh with @julien_siems @jkhfranke @ZelaArber  @FrankRHutter   @MPontil
2
17
57
@mhahn29
Michael Hahn
1 year
When do transformers length-generalize? Generalizing to sequences longer than seen during training is a key challenge for transformers. Some tasks see success, others fail — but *why*? We introduce a theoretical framework to understand and predict length generalization.
3
27
185
@mhahn29
Michael Hahn
1 year
Excited and honored to receive a Best Paper Award for our work on the inductive bias of the transformer architecture with @broccolitwit 🌟🎉 #ACL2024NLP
@broccolitwit
Mark Rofin
2 years
New paper accepted at ACL 2024 main track! Together with @mhahn29, we address the question: “Why are Sensitive Functions Hard for Transformers?” 1/n
16
18
197
@yashYRS
Yash Sarrof
2 years
For a given finite-state problem, one can decidably check whether it falls into this class. Such decidable characterizations may make it easier to theoretically predict abilities and anticipate failures of LLMs.
1
0
4
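The specific class is not named in this tweet; as a generic illustration of deciding a property of a finite-state problem (not the paper's characterization), here is a standard emptiness check for a DFA via reachability:

```python
from collections import deque

# Illustrative decidable check on a finite-state problem (generic example,
# not the paper's characterization): decide whether a DFA accepts any
# string at all, via BFS reachability from the start state to an accepting
# state. delta maps (state, symbol) pairs to successor states.
def dfa_language_nonempty(states, delta, start, accepting):
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in accepting:
            return True
        for (src, _sym), dst in delta.items():
            if src == q and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return False

# DFA over {a, b} accepting strings that end in 'ab'
delta = {(0, 'a'): 1, (0, 'b'): 0,
         (1, 'a'): 1, (1, 'b'): 2,
         (2, 'a'): 1, (2, 'b'): 0}
print(dfa_language_nonempty({0, 1, 2}, delta, 0, {2}))  # True
```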