Siddarth Mamidanna
@siddarthpm1
60 Followers · 2K Following · 9 Media · 35 Statuses
LLM and Mechanistic Interpretability Research @ UCSC | CS undergrad @BaskinEng | Applying to interpretability/safety PhD programs for Fall 2026
Santa Cruz
Joined September 2023
How could I possibly forget the game where we opted not to hand Florida their 4th consecutive double-digit loss to Georgia? With or without a coach, you can’t beat us.
@2PeatAx not with our talent lmao u forgot bout the game a few weeks ago with no head coach ?
Excited to be in Suzhou for #EMNLP2025! I’m presenting our main conference paper showing how LLMs push computation into the last token’s residual stream ( https://t.co/rv2R6nr6cN). If you work on interpretability/alignment and want to chat (I’m applying for PhD positions starting
arxiv.org: Large language models (LLMs) demonstrate proficiency across numerous computational tasks, yet their inner workings remain unclear. In theory, the combination of causal self-attention and...
🎉Check out our recent papers accepted to #NeurIPS and #EMNLP on #MechInterp of LLMs (I'm hiring Fall'26 PhDs on this topic) #NeurIPS2025 Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones
How do LLMs perform direct math calculations? Check out our new #EMNLP2025 mechanistic interpretability paper led by @siddarthpm1 where we propose and validate a novel transformer circuit that captures the essence of this operation (spoiler alert: it works nothing like a human).
🚨New EMNLP 2025 Paper: When a human does mental math like 12+45-8, we tend to do it stepwise: first compute 12+45=57, then 57-8=49. Does an LLM do the same? Turns out it doesn’t. But how does it work? Our paper investigates exactly this! 🧵(1/10) Paper: https://t.co/cvLr8Z2Oew
Thanks for reading this far! If you found this interesting, be sure to check out the full paper and the code, and feel free to contact me with any questions or clarifications. A huge thanks to @YilunZhou, @ZiyuYao, and @DakingRai for the extensive guidance and helping me to my
We also performed a series of further experiments investigating the exact responsible attention heads in the key layers. The figure below shows attention patterns in 3 of the 5 key attention heads we identified in the transfer layers, each of which allows the last token to attend
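(Not from the paper or the thread, just a rough sketch of how one could inspect this kind of attention pattern with HuggingFace transformers. The model name and the (layer, head) pairs below are illustrative placeholders, not the five transfer heads identified in the paper.)

```python
# Illustrative sketch: which earlier positions does the last token attend to
# in a few candidate heads? The (layer, head) pairs here are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3-8B"  # assumed; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, attn_implementation="eager")
model.eval()

inputs = tok("12+45-8=", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)  # attentions: tuple of (batch, heads, seq, seq)

last = inputs["input_ids"].shape[1] - 1            # index of the last token
candidate_heads = [(15, 3), (16, 7)]               # hypothetical transfer-layer heads

for layer, head in candidate_heads:
    row = out.attentions[layer][0, head, last]     # attention paid by the last token
    top = torch.topk(row, k=3)
    srcs = [tok.decode(inputs["input_ids"][0, i].item()) for i in top.indices]
    print(f"layer {layer}, head {head}: last token attends most to {srcs}")
```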
We verify this result across a number of arithmetic tasks as well as other models (Pythia, GPT-J) and find it generalizes. The clean AF1 circuit found in Llama models attains high faithfulness on arithmetic tasks, while the weaker Pythia and GPT-J models need a longer information
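(A minimal sketch of one way to score faithfulness, assuming the simple reading "fraction of prompts where the isolated circuit reproduces the full model's answer"; the paper's exact metric may be defined differently.)

```python
# Toy faithfulness score: how often does the circuit-only run agree with the
# full model? The predictions below are made-up placeholder strings.
def faithfulness(circuit_preds, full_preds):
    assert len(circuit_preds) == len(full_preds) and full_preds
    agree = sum(c == f for c, f in zip(circuit_preds, full_preds))
    return agree / len(full_preds)

full_preds    = ["49", "17", "103", "-6"]   # full model's answers
circuit_preds = ["49", "17", "103", "-5"]   # answers with only the circuit active
print(faithfulness(circuit_preds, full_preds))   # 0.75
```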
We call this circuit AF1 (“All for One”), because after a long “wait”, all tokens pass their info for one final token to do the entire computation. And while (15, 2) is the minimal configuration, any L_wait and L_transfer fulfilling the above conditions work! The below grid
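(To make the (L_wait, L_transfer) bookkeeping concrete, here is a tiny sketch of how such a pair splits a decoder's layers into wait / transfer / compute phases. The 32-layer depth matches Llama-3-8B; the sketch does not encode the validity conditions referenced in the thread.)

```python
# Sketch: partition a decoder's layers according to an (L_wait, L_transfer) choice.
# 32 layers matches Llama-3-8B; (15, 2) is the minimal configuration from the thread.
from dataclasses import dataclass

@dataclass
class AF1Config:
    n_layers: int    # total decoder layers
    l_wait: int      # "waiting" layers (CAMA-ablated in the experiments)
    l_transfer: int  # layers where the last token pulls in information

    def phases(self):
        wait = range(0, self.l_wait)
        transfer = range(self.l_wait, self.l_wait + self.l_transfer)
        compute = range(self.l_wait + self.l_transfer, self.n_layers)
        return list(wait), list(transfer), list(compute)

wait, transfer, compute = AF1Config(n_layers=32, l_wait=15, l_transfer=2).phases()
print("wait:", wait)          # layers 0-14
print("transfer:", transfer)  # layers 15-16
print("compute:", compute)    # layers 17-31
```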
Finally, we progressively replace the full-peeking on the last token with self-peeking, constraining where information can flow into the last token and leaving only the L_transfer layers to carry out that transfer. In Llama-3-8B and Llama-3.1-8B, we find L_wait
We replace the first L_wait layers with their CAMA representations, and find the model still computes well. After that, we let the last token attend to all previous tokens (full-peeking) but only allow the earlier tokens to attend to themselves in L_transfer layers. This forces
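(To make "full-peeking" vs. "self-peeking" concrete: a small, self-contained sketch of the attention pattern a transfer layer gets under this intervention, with earlier tokens attending only to themselves and the last token attending to everything. This is my reading of the thread, not the paper's code.)

```python
# Sketch of the transfer-layer attention pattern described above:
# earlier tokens self-peek (attend only to themselves), while the last token
# full-peeks (attends to every position). True = attention allowed.
import torch

def transfer_layer_mask(seq_len: int) -> torch.Tensor:
    mask = torch.eye(seq_len, dtype=torch.bool)  # self-peeking for every token
    mask[-1, :] = True                           # full-peeking for the last token
    return mask

print(transfer_layer_mask(5).int())
# tensor([[1, 0, 0, 0, 0],
#         [0, 1, 0, 0, 0],
#         [0, 0, 1, 0, 0],
#         [0, 0, 0, 1, 0],
#         [1, 1, 1, 1, 1]])
```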
To define "wait" in this context, we introduce CAMA (Context-Aware Mean Ablation). CAMA replaces a token’s (in this example, token “7”) hidden state with the average representation it would have if that token remains the same but the rest of the context varies. This preserves
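(A toy sketch of the CAMA bookkeeping as described in the tweet above: the hidden state of a fixed token at a fixed position and layer is replaced by its mean over many prompts that keep that token but vary the rest of the context. Random tensors stand in for real hidden states here.)

```python
# Toy sketch of Context-Aware Mean Ablation (CAMA): average the hidden states a
# fixed token (e.g. "7") receives across many different surrounding contexts,
# then substitute that mean for the prompt-specific state in the "wait" layers.
import torch

d_model, n_contexts = 8, 100

# Hidden states of the SAME token at the same position/layer, collected from
# n_contexts prompts that vary everything else (random placeholders here).
states_over_contexts = torch.randn(n_contexts, d_model)

# The CAMA representation: token identity is kept, prompt-specific context is not.
cama_state = states_over_contexts.mean(dim=0)
print(cama_state.shape)  # torch.Size([8])
```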
The AF1 subgraph is responsible for the vast majority of mental math computation in LLMs, where "mental math" is defined as direct math calculation via next-token prediction without explicit reasoning. This subgraph is characterized by the model first "waiting" without performing
Counter-intuitively, rather than doing step-wise compositional calculation through the layers (e.g., the first ten layers handling 12+45 and the remaining layers doing 57-8), the model transfers the information from all tokens to the last token at a select few layers, and carries out the
🚨New EMNLP 2025 Paper: When a human does mental math like 12+45-8, we tend to do it stepwise: first compute 12+45=57, then 57-8=49. Does an LLM do the same? Turns out it doesn’t. But how does it work? Our paper investigates exactly this! 🧵(1/10) Paper: https://t.co/cvLr8Z2Oew
Thanks @rohanpaul_ai for featuring our EMNLP 2025 paper! Super-proud of the work, led by @siddarthpm1, undergrad (read: PhD applicant very soon) from UCSC! In short, we uncovered a quite surprising mechanism of LLM solving arithmetic, but stay tuned for our own explainer thread!
When a language model solves a math problem in its head, where in the network is the real calculation happening? This paper finds that almost all the actual math gets done right at the very last token of the sequence, not spread out across all the tokens. The earlier tokens
This is me! Our own tweet is coming out in a couple of days, stay tuned🙂
When a language model solves a math problem in its head, where in the network is the real calculation happening? This paper finds that almost all the actual math gets done right at the very last token of the sequence, not spread out across all the tokens. The earlier tokens
I'll be here to present a poster this Friday. Please feel free to reach out; I would love to connect or just chat about interpretability!
This Friday NEMI 2025 is at Northeastern in Boston, 8 talks, 24 roundtables, 90 posters; 200+ attendees. Thanks to @GoodfireAI for sponsoring! https://t.co/gfSHY9qamy If you can't make it in person, the livestream will be here: https://t.co/bMBjuG6DTe
Today, we're releasing The Circuit Analysis Research Landscape: an interpretability post extending & open sourcing Anthropic's circuit tracing work, co-authored by @Anthropic, @GoogleDeepMind, @GoodfireAI, @AiEleuther, and @decode_research. Here's a quick demo, details follow: ⤵️
In a couple of years, no one will say they seriously thought you could get to AGI just by scaling up 2023 LLMs, but that was basically the consensus view for a certain crowd for about a year and a half.
GPT-5 is good. But model performance gains are still slower than in past years and this year has been a technically challenging one for OpenAI researchers. The inside story here...