Yash Sarrof
@yashYRS
Followers: 137 · Following: 356 · Media: 6 · Statuses: 117
PhD student, Saarland University, advised by Michael Hahn, lab website: https://t.co/iSGrDEoxG6
Saarbrücken, Germany
Joined January 2015
First-time NeurIPS attendee here! Super excited to talk about our paper with @yveitsman, @mhahn29 and to discover the amazing work by everyone else :D https://t.co/rt0peAA0SR
We are excited to share our work characterizing the expressivity of State Space Models (SSMs) through a theoretical lens, using a formal language framework backed by empirical findings. w/ Yana Veitsman, Dr. Michael Hahn. Paper link: https://t.co/PIoQpODS3Q
0
3
13
Had great fun at @dagstuhl this week! 😍 Thanks for the invite and interesting discussions! @JiUngLee1
@WolfStammer @SoyoungOh5 @SwetaMahajan1 and Aleksandra Bakalova. I talked about parameter-grounded units for mech interp, covering our recent work on MAPS and SNMF:
1
2
34
We’re hiring PhD students and postdocs on LLM theory and interpretability! Topics: 1️⃣ abilities & limitations of transformers and other architectures; 2️⃣ LLM interpretability; 3️⃣ foundations of LLM reasoning; 4️⃣ foundations of AI safety.
13
92
617
Is there such a thing as too many agents in multi-agent systems? It depends! 🧵 Our work reveals 3 distinct regimes where communication patterns differ dramatically. More on our findings below 👇 (1/7)
1
11
28
Got awarded Top Reviewer at NeurIPS 2025 @NeurIPSConf. Really gratifying to get this on my first attempt at reviewing 😃
0
1
7
Read the cookbook: https://t.co/ymBPgfwGxa Join us for weekly seminars on formal language theory, ML, NLP, and more:
arxiv.org
We present the transformer cookbook: a collection of techniques for directly encoding algorithms into a transformer's parameters. This work addresses the steep learning curve of such endeavors, a...
0
5
24
Announcing the first workshop on Foundations of Language Model Reasoning (FoRLM) at NeurIPS 2025! 📝Soliciting abstracts that advance foundational understanding of reasoning in language models, from theoretical analyses to rigorous empirical studies. 📆 Deadline: Sept 3, 2025
1
27
162
Do LLMs store information in interpretable subspaces -- similar to variables in a program? In our new paper, we decompose representation space into smaller, interpretable, non-basis-aligned subspaces with unsupervised learning.
2
46
348
New on arXiv: Knee-Deep in C-RASP, by @pentagonalize, Michael Cadilhac and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.
1
11
39
How do architectural limitations of Transformers manifest after pretraining?
2
5
13
Can language models learn implicit reasoning without chain-of-thought? Our new paper shows: Yes, LMs can learn k-hop reasoning; however, it comes at the cost of an exponential increase in training data and linear growth in model depth as k increases. https://t.co/1CGTRJGnGQ
1
2
7
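To make the k-hop setup concrete, here is a toy sketch of what such a task can look like (a hypothetical construction for illustration only; `khop_example` and its prompt format are my assumptions, not the paper's exact setup): shuffled single-hop facts followed by a query that can only be answered by chaining k of them.

```python
import random

def khop_example(k, entities, rng):
    """Build one toy k-hop instance: shuffled facts 'x -> y' plus a
    query asking where you land after following k hops from the start."""
    chain = rng.sample(entities, k + 1)           # k+1 distinct entities
    facts = [f"{chain[i]} -> {chain[i + 1]}" for i in range(k)]
    rng.shuffle(facts)                            # fact order gives no hints
    prompt = "; ".join(facts) + f". Start at {chain[0]}, follow {k} hops:"
    return prompt, chain[-1]                      # (model input, gold answer)

rng = random.Random(0)
prompt, answer = khop_example(3, list("abcdefgh"), rng)
print(prompt)
print(answer)
```

Because each hop is an independent fact, solving the query without chain-of-thought forces the model to compose all k lookups internally, which is where the data/depth costs described above come in.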
1/11 Solving tasks with Chain-of-Thought reasoning is incredibly effective, but generating long CoTs is pricey. So we want them to be as short as possible while still solving the task. But wait – how short is that? In our new ICML paper, we aim to answer that question.
1
6
26
Fantastic thread by my colleague & dear friend @abakalova13175 diving into how LLMs perform in-context learning, based on her new preprint. By tracing information flow, she reveals how LLMs dynamically handle examples via a "Contextualize, then Aggregate" process. Worth the read!
How do LLMs perform in-context learning? LLMs can infer tasks from just a few examples in prompts - but how? Our new preprint proposes a two-step mechanism: first contextualize, then aggregate. 🧵
0
0
3
Today we are launching a server dedicated to Tokenization research! Come join us! https://t.co/Jor3G6VFcC
2
9
16
LLMs can now track states, finally matching this cat! And we prove it. But how? 🧵👇 1/ Paper: https://t.co/aKvrqYtkWh with @julien_siems @jkhfranke @ZelaArber @FrankRHutter @MPontil
2
17
57
When do transformers length-generalize? Generalizing to sequences longer than seen during training is a key challenge for transformers. Some tasks see success, others fail — but *why*? We introduce a theoretical framework to understand and predict length generalization.
3
27
185
Excited and honored to receive a Best Paper Award for our work on the inductive bias of the transformer architecture with @broccolitwit 🌟🎉 #ACL2024NLP
New paper accepted at ACL 2024 main track! Together with @mhahn29, we address the question: “Why are Sensitive Functions Hard for Transformers?” 1/n
16
18
197
Discover more such exciting projects at our lab, LaCoCo: https://t.co/3utwc5fDh5.
0
0
6
For a given finite-state problem, one can decidably check whether it falls into this class. Such decidable characterizations may make it easier to theoretically predict the abilities and anticipate the failures of LLMs.
1
0
4
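As a toy illustration of what a decidable check on a finite-state problem can look like (the specific class and algorithm here are my own illustrative choice, not necessarily the one from the paper): star-freeness of a regular language is decidable by computing the DFA's transition monoid and testing that every element m is aperiodic, i.e. m^k = m^(k+1) for some k.

```python
def transition_monoid(delta, states, alphabet):
    """All state transformations induced by input words: close the
    single-letter transformations under composition (plus identity).
    Assumes states are numbered 0..n-1 and delta maps (state, symbol)
    pairs to states."""
    gens = [tuple(delta[(q, a)] for q in states) for a in alphabet]
    identity = tuple(states)
    monoid, frontier = {identity}, [identity]
    while frontier:
        f = frontier.pop()
        for g in gens:
            h = tuple(g[f[q]] for q in states)  # effect of reading one more letter
            if h not in monoid:
                monoid.add(h)
                frontier.append(h)
    return monoid

def is_aperiodic(delta, states, alphabet):
    """Decidable check: the DFA's language is star-free iff every element
    m of its transition monoid satisfies m^k = m^(k+1) for some k."""
    for m in transition_monoid(delta, states, alphabet):
        seen, power = [], m
        while power not in seen:
            seen.append(power)
            power = tuple(m[power[q]] for q in states)  # next power of m
        # powers of m cycle from seen.index(power); aperiodic iff cycle length 1
        if seen.index(power) != len(seen) - 1:
            return False
    return True
```

For example, the DFA for "contains an a" passes the check (star-free), while the DFA for "even number of a's" fails it, since its transition monoid contains a transformation of period 2.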