Yash Sarrof Profile
Yash Sarrof

@yashYRS

Followers
137
Following
356
Media
6
Statuses
117

PhD student, Saarland University, advised by Michael Hahn, lab website: https://t.co/iSGrDEoxG6

Saarbrücken, Germany
Joined January 2015
@yashYRS
Yash Sarrof
1 year
First-time NeurIPS attendee here! Super excited to talk about our paper with @yveitsman, @mhahn29 and to discover the amazing work by everyone else :D https://t.co/rt0peAA0SR
@yashYRS
Yash Sarrof
2 years
We are excited to share our work on characterizing the expressivity of State Space Models (SSMs) through a theoretical lens, using a formal language framework, backed up by empirical findings. w/ Yana Veitsman, Michael Hahn. Paper link: https://t.co/PIoQpODS3Q
0
3
13
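As a rough illustration of what a state-space recurrence looks like (this is a generic scalar sketch, not the models analyzed in the paper; see the linked preprint for the actual setup):

```python
# Illustrative scalar state-space recurrence (hypothetical sketch, not the
# paper's model): h_t = a * h_{t-1} + b * x_t, with readout y_t = c * h_t.
def ssm_scan(xs, a, b, c):
    h = 0.0
    ys = []
    for x_t in xs:
        h = a * h + b * x_t  # linear state update
        ys.append(c * h)     # linear readout
    return ys

# With a = b = c = 1 the recurrence computes a running sum -- a simple
# counting behavior of the kind formal-language analyses examine.
out = ssm_scan([1, 1, 1, 1], a=1.0, b=1.0, c=1.0)
print(out)  # [1.0, 2.0, 3.0, 4.0]
```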
@megamor2
Mor Geva
3 days
Had great fun at @dagstuhl this week! 😍 Thanks for the invite and interesting discussions! @JiUngLee1 @WolfStammer @SoyoungOh5 @SwetaMahajan1 Aleksandra Bakalova I talked about parameter-grounded units for mech interp, covering our recent work on MAPS and SNMF:
1
2
34
@mhahn29
Michael Hahn
9 days
We’re hiring PhD students and postdocs on LLM theory and interpretability! Topics: 1️⃣ abilities & limitations of transformers and other architectures; 2️⃣ LLM interpretability; 3️⃣ foundations of LLM reasoning; 4️⃣ foundations of AI safety.
13
92
617
@frisbeemortel
Michael Rizvi-Martel
1 month
Is there such a thing as too many agents in multi-agent systems? It depends! 🧵 Our work reveals 3 distinct regimes where communication patterns differ dramatically. More on our findings below 👇 (1/7)
1
11
28
@yashYRS
Yash Sarrof
1 month
Got awarded Top Reviewer at NeurIPS 2025 @NeurIPSConf. Really gratifying to get this on my first attempt at reviewing 😃
0
1
7
@canondetortugas
Dylan Foster 🐢
4 months
Announcing the first workshop on Foundations of Language Model Reasoning (FoRLM) at NeurIPS 2025! 📝Soliciting abstracts that advance foundational understanding of reasoning in language models, from theoretical analyses to rigorous empirical studies. 📆 Deadline: Sept 3, 2025
1
27
162
@huangxt233
Xinting Huang
4 months
Do LLMs store information in interpretable subspaces -- similar to variables in a program? In our new paper, we decompose representation space into smaller, interpretable, non-basis-aligned subspaces with unsupervised learning.
2
46
348
@davidweichiang
David Chiang
5 months
New on arXiv: Knee-Deep in C-RASP, by @pentagonalize, Michael Cadilhac and me. The solid stepped line is our theoretical prediction based on what problems C-RASP can solve, and the numbers/colors are what transformers (no position embedding) can learn.
1
11
39
@yveitsman
Yana Veitsman
6 months
How do architectural limitations of Transformers manifest after pretraining?
2
5
13
@yuekun_yao
Yuekun Yao
6 months
Can language models learn implicit reasoning without chain-of-thought? Our new paper shows: Yes, LMs can learn k-hop reasoning; however, it comes at the cost of an exponential increase in training data and linear growth in model depth as k increases. https://t.co/1CGTRJGnGQ
1
2
7
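For intuition, a k-hop reasoning query of the kind described above can be sketched as repeatedly following a relation (a hypothetical toy example, not the paper's dataset or method):

```python
# Illustrative k-hop reasoning task (hypothetical toy example): given a
# relation like {'a': 'b', 'b': 'c', ...}, a k-hop query asks which entity
# is reached after following the relation k times from a start entity.
def k_hop(mapping, start, k):
    cur = start
    for _ in range(k):
        cur = mapping[cur]  # one "hop" of implicit reasoning
    return cur

links = {'a': 'b', 'b': 'c', 'c': 'd', 'd': 'e'}
print(k_hop(links, 'a', 3))  # 'd'
```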
@mhahn29
Michael Hahn
7 months
1/11 Solving tasks with Chain-of-Thought reasoning is incredibly effective, but generating long CoTs is pricey. So we want them to be as short as possible to solve the task. But wait – how short is that? In our new ICML paper, we aim to answer that question.
1
6
26
@yashYRS
Yash Sarrof
8 months
Fantastic thread by my colleague & dear friend @abakalova13175 diving into how LLMs perform in-context learning, based on her new preprint By tracing information flow, she reveals how LLMs dynamically handle examples via a "Contextualize, then Aggregate" process. Worth the read!
@abakalova13175
abakalova
8 months
How do LLMs perform in-context learning? LLMs can infer tasks from just a few examples in prompts - but how? Our new preprint proposes a two-step mechanism: first contextualize, then aggregate. 🧵
0
0
3
@riccardograzzi
Riccardo Grazzi
1 year
LLMs can now track states, finally matching this cat! And we prove it. But how? 🧵👇 1/ Paper: https://t.co/aKvrqYtkWh with @julien_siems @jkhfranke @ZelaArber  @FrankRHutter   @MPontil
2
17
57
@mhahn29
Michael Hahn
1 year
When do transformers length-generalize? Generalizing to sequences longer than seen during training is a key challenge for transformers. Some tasks see success, others fail — but *why*? We introduce a theoretical framework to understand and predict length generalization.
3
27
185
@mhahn29
Michael Hahn
1 year
Excited and honored to receive a Best Paper Award for our work on the inductive bias of the transformer architecture with @broccolitwit 🌟🎉 #ACL2024NLP
@broccolitwit
Mark Rofin
2 years
New paper accepted at ACL 2024 main track! Together with @mhahn29, we address the question: “Why are Sensitive Functions Hard for Transformers?” 1/n
16
18
197
@yashYRS
Yash Sarrof
2 years
For a given finite-state problem, one can decidably check whether it falls into this class. Such decidable characterizations may make it easier to theoretically predict abilities and anticipate failures of LLMs.
1
0
4
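The specific class is not named in this tweet; as a generic illustration of deciding a property of a finite-state problem (not the paper's characterization), here is a standard emptiness check for a DFA via reachability:

```python
from collections import deque

# Illustrative decidable check on a finite-state problem (generic example,
# not the paper's characterization): decide whether a DFA accepts any
# string at all, via BFS reachability from the start state to an accepting
# state. delta maps (state, symbol) pairs to successor states.
def dfa_language_nonempty(states, delta, start, accepting):
    seen, queue = {start}, deque([start])
    while queue:
        q = queue.popleft()
        if q in accepting:
            return True
        for (src, _sym), dst in delta.items():
            if src == q and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return False

# DFA over {a, b} accepting strings that end in 'ab'
delta = {(0, 'a'): 1, (0, 'b'): 0,
         (1, 'a'): 1, (1, 'b'): 2,
         (2, 'a'): 1, (2, 'b'): 0}
print(dfa_language_nonempty({0, 1, 2}, delta, 0, {2}))  # True
```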