Siddarth Mamidanna
@siddarthpm1
60 Followers · 2K Following · 9 Media · 35 Statuses
LLM and Mechanistic Interpretability Research @ UCSC | CS undergrad @BaskinEng | Applying to interpretability/safety PhD programs for Fall 2026
Santa Cruz
Joined September 2023
How could I possibly forget the game where we opted not to hand Florida their 4th consecutive double-digit loss to Georgia? With or without a coach, you can’t beat us.
@2PeatAx not with our talent lmao u forgot bout the game a few weeks ago with no head coach ?
Excited to be in Suzhou for #EMNLP2025! I’m presenting our main conference paper showing how LLMs push computation into the last token’s residual stream ( https://t.co/rv2R6nr6cN). If you work on interpretability/alignment and want to chat (I’m applying for PhD positions starting
arxiv.org: Large language models (LLMs) demonstrate proficiency across numerous computational tasks, yet their inner workings remain unclear. In theory, the combination of causal self-attention and...
🎉Check out our recent papers accepted to #NeurIPS and #EMNLP on #MechInterp of LLMs (I'm hiring Fall'26 PhDs on this topic) #NeurIPS2025 Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones
How do LLMs perform direct math calculations? Check out our new #EMNLP2025 mechanistic interpretability paper led by @siddarthpm1 where we propose and validate a novel transformer circuit that captures the essence of this operation (spoiler alert: it works nothing like a human).
🚨New EMNLP 2025 Paper: When a human does mental math like 12+45-8, we tend to do it stepwise: first compute 12+45=57, then 57-8=49. Does an LLM do the same? Turns out it doesn’t. But how does it work? Our paper investigates exactly this! 🧵(1/10) Paper: https://t.co/cvLr8Z2Oew
Thanks for reading this far! If you found this interesting, be sure to check out the full paper and the code, and feel free to contact me with any questions or clarifications. A huge thanks to @YilunZhou, @ZiyuYao, and @DakingRai for the extensive guidance and helping me to my
We also performed a series of further experiments investigating the exact responsible attention heads in the key layers. The figure below shows attention patterns in 3 of the 5 key attention heads we identified in the transfer layers, each of which allows the last token to attend
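(Not from the paper or the thread, just a rough sketch of how one could inspect this kind of attention pattern with HuggingFace transformers. The model name and the (layer, head) pairs below are illustrative placeholders, not the five transfer heads identified in the paper.)

```python
# Illustrative sketch: which earlier positions does the last token attend to
# in a few candidate heads? The (layer, head) pairs here are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # assumed; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

inputs = tok("12+45-8=", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)  # attentions: tuple of (batch, heads, seq, seq)

last = inputs["input_ids"].shape[1] - 1            # index of the last token
candidate_heads = [(15, 3), (16, 7)]               # hypothetical transfer-layer heads

for layer, head in candidate_heads:
    row = out.attentions[layer][0, head, last]     # attention paid by the last token
    top = torch.topk(row, k=3)
    srcs = [tok.decode(inputs["input_ids"][0, i].item()) for i in top.indices]
    print(f"layer {layer}, head {head}: last token attends most to {srcs}")
```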
We verify this result across a number of arithmetic tasks as well as other models (Pythia, GPT-J) and find it generalizes. The clean AF1 circuit found in Llama models attains high faithfulness on arithmetic tasks, while the weaker Pythia and GPT-J models need a longer information
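(A minimal sketch of one way to score faithfulness, assuming the simple reading "fraction of prompts where the isolated circuit reproduces the full model's answer"; the paper's exact metric may be defined differently.)

```python
# Toy faithfulness score: how often does the circuit-only run agree with the
# full model? The predictions below are made-up placeholder strings.
def faithfulness(circuit_preds, full_preds):
    assert len(circuit_preds) == len(full_preds) and full_preds
    agree = sum(c == f for c, f in zip(circuit_preds, full_preds))
    return agree / len(full_preds)

full_preds    = ["49", "17", "103", "-6"]   # full model's answers
circuit_preds = ["49", "17", "103", "-5"]   # answers with only the circuit active
print(faithfulness(circuit_preds, full_preds))   # 0.75
```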
We call this circuit AF1 (“All for One”), because after a long “wait”, all tokens pass their info for one final token to do the entire computation. And while (15, 2) is the minimal configuration, any L_wait and L_transfer fulfilling the above conditions work! The below grid
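(To make the (L_wait, L_transfer) bookkeeping concrete, here is a tiny sketch of how such a pair splits a decoder's layers into wait / transfer / compute phases. The 32-layer depth matches Llama-3-8B; the sketch does not encode the validity conditions referenced in the thread.)

```python
# Sketch: partition a decoder's layers according to an (L_wait, L_transfer) choice.
# 32 layers matches Llama-3-8B; (15, 2) is the minimal configuration from the thread.
from dataclasses import dataclass

@dataclass
class AF1Config:
    n_layers: int    # total decoder layers
    l_wait: int      # "waiting" layers (CAMA-ablated in the experiments)
    l_transfer: int  # layers where the last token pulls in information

    def phases(self):
        wait = range(0, self.l_wait)
        transfer = range(self.l_wait, self.l_wait + self.l_transfer)
        compute = range(self.l_wait + self.l_transfer, self.n_layers)
        return list(wait), list(transfer), list(compute)

wait, transfer, compute = AF1Config(n_layers=32, l_wait=15, l_transfer=2).phases()
print("wait:", wait)          # layers 0-14
print("transfer:", transfer)  # layers 15-16
print("compute:", compute)    # layers 17-31
```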
Finally, we progressively replace the full-peeking on the last token with self-peeking, constraining where information can flow into the last token and leaving only the L_transfer layers to carry out that transfer. In Llama-3-8B and Llama-3.1-8B, we find L_wait
We replace the first L_wait layers with their CAMA representations, and find the model still computes well. After that, we let the last token attend to all previous tokens (full-peeking) but only allow the earlier tokens to attend to themselves in L_transfer layers. This forces
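(To make "full-peeking" vs. "self-peeking" concrete: a small, self-contained sketch of the attention pattern a transfer layer gets under this intervention, with earlier tokens attending only to themselves and the last token attending to everything. This is my reading of the thread, not the paper's code.)

```python
# Sketch of the transfer-layer attention pattern described above:
# earlier tokens self-peek (attend only to themselves), while the last token
# full-peeks (attends to every position). True = attention allowed.
import torch

def transfer_layer_mask(seq_len: int) -> torch.Tensor:
    mask = torch.eye(seq_len, dtype=torch.bool)  # self-peeking for every token
    mask[-1, :] = True                           # full-peeking for the last token
    return mask

print(transfer_layer_mask(5).int())
# tensor([[1, 0, 0, 0, 0],
#         [0, 1, 0, 0, 0],
#         [0, 0, 1, 0, 0],
#         [0, 0, 0, 1, 0],
#         [1, 1, 1, 1, 1]])
```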
To define "wait" in this context, we introduce CAMA (Context-Aware Mean Ablation). CAMA replaces a token’s (in this example, token “7”) hidden state with the average representation it would have if that token remains the same but the rest of the context varies. This preserves
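(A toy sketch of the CAMA bookkeeping as described in the tweet above: the hidden state of a fixed token at a fixed position and layer is replaced by its mean over many prompts that keep that token but vary the rest of the context. Random tensors stand in for real hidden states here.)

```python
# Toy sketch of Context-Aware Mean Ablation (CAMA): average the hidden states a
# fixed token (e.g. "7") receives across many different surrounding contexts,
# then substitute that mean for the prompt-specific state in the "wait" layers.
import torch

d_model, n_contexts = 8, 100

# Hidden states of the SAME token at the same position/layer, collected from
# n_contexts prompts that vary everything else (random placeholders here).
states_over_contexts = torch.randn(n_contexts, d_model)

# The CAMA representation: token identity is kept, prompt-specific context is not.
cama_state = states_over_contexts.mean(dim=0)
print(cama_state.shape)  # torch.Size([8])
```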
The AF1 subgraph is responsible for the vast majority of mental math computation in LLMs, where "mental math" is defined as direct math calculation via next-token prediction without explicit reasoning. This subgraph is characterized by the model first "waiting" without performing
Counter-intuitively, rather than doing step-wise compositional calculation through the layers (e.g., the first ten layers handling 12+45 and the remaining layers doing 57-8), the model transfers the information from all tokens to the last token at a select few layers, and carries out the
🚨New EMNLP 2025 Paper: When a human does mental math like 12+45-8, we tend to do it stepwise: first compute 12+45=57, then 57-8=49. Does an LLM do the same? Turns out it doesn’t. But how does it work? Our paper investigates exactly this! 🧵(1/10) Paper: https://t.co/cvLr8Z2Oew
Thanks @rohanpaul_ai for featuring our EMNLP 2025 paper! Super-proud of the work, led by @siddarthpm1, undergrad (read: PhD applicant very soon) from UCSC! In short, we uncovered a quite surprising mechanism of LLM solving arithmetic, but stay tuned for our own explainer thread!
When a language model solves a math problem in its head, where in the network is the real calculation happening? This paper finds that almost all the actual math gets done right at the very last token of the sequence, not spread out across all the tokens. The earlier tokens
This is me! Our own tweet is coming out in a couple of days, stay tuned🙂
When a language model solves a math problem in its head, where in the network is the real calculation happening? This paper finds that almost all the actual math gets done right at the very last token of the sequence, not spread out across all the tokens. The earlier tokens
I'll be here to present a poster this Friday. Please feel free to reach out; I would love to connect or just chat about interpretability!
This Friday NEMI 2025 is at Northeastern in Boston, 8 talks, 24 roundtables, 90 posters; 200+ attendees. Thanks to @GoodfireAI for sponsoring! https://t.co/gfSHY9qamy If you can't make it in person, the livestream will be here: https://t.co/bMBjuG6DTe
Today, we're releasing The Circuit Analysis Research Landscape: an interpretability post extending & open sourcing Anthropic's circuit tracing work, co-authored by @Anthropic, @GoogleDeepMind, @GoodfireAI, @AiEleuther, and @decode_research. Here's a quick demo, details follow: ⤵️
In a couple of years, no one will say they seriously thought you could get to AGI just by scaling up 2023 LLMs, but that was basically the consensus view for a certain crowd for about a year and a half.
GPT-5 is good. But model performance gains are still slower than in past years and this year has been a technically challenging one for OpenAI researchers. The inside story here...