Daking Rai

@DakingRai

Followers: 224 · Following: 390 · Media: 26 · Statuses: 56

CS PhD Student @GeorgeMasonU

Fairfax, Virginia
Joined September 2014
@DakingRai
Daking Rai
21 days
I’m actively looking for Summer 2026 internships focused on language model interpretability and methods to improve model reasoning and controllability. I’m also attending @NeurIPSConf — would love to connect! Resume & details:
0
1
6
@XllmReasonPlan
XLLM-Reason-Plan
2 months
@COLM_conf #COLM2025 Prof. Greg Durrett presenting "LLM Reasoning Beyond Scaling" @gregd_nlp
1
3
5
@XllmReasonPlan
XLLM-Reason-Plan
2 months
@COLM_conf #COLM2025 Prof. Yonatan Belinkov talking about "Toward Scalable and Actionable Interpretability"! @belinkov
0
6
17
@XllmReasonPlan
XLLM-Reason-Plan
2 months
🚨XLLM-Reason-Plan Workshop is happening right now at @COLM_conf! Join us at 520F 🙌
8
5
8
@XllmReasonPlan
XLLM-Reason-Plan
3 months
⏰ Only 9 days away! Join us at @COLM_conf on October 10 for the first workshop on the application of LLM explainability to reasoning and planning. Featuring: 📑 20 poster presentations 🎤 9 distinguished speakers View our schedule at https://t.co/E7ml8QKIc5.
0
4
9
@DakingRai
Daking Rai
3 months
(9/9) 🙏 Thanks for reading this far! If you found this interesting, be sure to check out the full paper, and feel free to contact me with any questions or clarifications. A huge thanks to my advisor @ziyuyao and collaborators Samuel Miller & @kevpmo — this work wouldn’t have
0
0
1
@DakingRai
Daking Rai
3 months
(8/9) RaSTEER generalizes to arithmetic reasoning. We consider a two-operand arithmetic reasoning task (+, -, %, x) using three models: GPT-2 XL, Pythia-6.9b, and GPT-3 8b. RaSTEER yields performance improvements across most arithmetic operations for all three models, with the
1
0
0
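A sketch of how the two-operand arithmetic evaluation in (8/9) above could be constructed. The prompt template, operand ranges, and the reading of "%" as modulo and "x" as multiplication are my assumptions, not details from the paper:

```python
import operator
import random

# Operator symbols as listed in the tweet; mapping "%" to modulo and
# "x" to multiplication is an assumption on my part.
OPS = {"+": operator.add, "-": operator.sub, "%": operator.mod, "x": operator.mul}

def make_example(rng):
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    sym = rng.choice(list(OPS))
    prompt = f"{a} {sym} {b} ="  # illustrative prompt template
    return prompt, str(OPS[sym](a, b))

rng = random.Random(0)
for _ in range(3):
    print(make_example(rng))
```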
@DakingRai
Daking Rai
3 months
(7/9) RaSTEER boosts task accuracy without hurting overall code generation. To test this, we evaluated post-steering performance on HumanEval. Key insights: ➡️Steering the top-20 attention heads (from the parentheses task) did not degrade HumanEval: CodeLlama (30.48% → 29.87%),
1
0
0
@DakingRai
Daking Rai
3 months
(6/9) Additional insights from the RaSTEER experiment: ➡️Ranking components by F1-score delivers the best performance boost (vs. precision or recall alone). ➡️Promoting attention heads yielded better performance than promoting FF neurons. ➡️RaSTEER outperforms the circuit
1
0
0
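For concreteness, the F1-based ranking mentioned in (6/9) above amounts to sorting components by the harmonic mean of their promotion precision and recall. A minimal sketch with hypothetical component names and scores:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

# Hypothetical (precision, recall) per component.
scores = {"L11.H3": (0.90, 0.75), "L08.H0": (0.95, 0.10), "L05.H7": (0.40, 0.85)}

# Components to promote first: best F1 on top.
ranked = sorted(scores, key=lambda c: f1(*scores[c]), reverse=True)
print(ranked)
```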
@DakingRai
Daking Rai
3 months
(5/9) RaSTEER provides dramatic improvement on the three-paren and four-paren sub-tasks. We hypothesize that balanced parentheses errors in LMs aren’t due to the absence of sound mechanisms but to those mechanisms being overshadowed by faulty ones. To fix this, we propose RaSTEER, which
1
0
0
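The (5/9) tweet above is cut off, but (6/9) and (7/9) describe RaSTEER as promoting top-ranked attention heads (e.g., "steering the top-20 attention heads"). A rough PyTorch sketch of one way such a promotion could be implemented, scaling a single GPT-2 head's pre-projection output with a forward pre-hook; the layer, head, scale factor, and hook target are illustrative assumptions, not the paper's exact procedure:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative "promotion" of one attention head: scale its contribution
# before the output projection. Layer, head, and scale are placeholders.
LAYER, HEAD, SCALE = 11, 3, 2.0

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")
head_dim = model.config.n_embd // model.config.n_head

def promote_head(module, args):
    (x,) = args  # concatenated per-head outputs, shape (batch, seq, n_embd)
    x = x.clone()
    x[..., HEAD * head_dim:(HEAD + 1) * head_dim] *= SCALE
    return (x,)

# Pre-hook on the chosen layer's attention output projection.
handle = model.transformer.h[LAYER].attn.c_proj.register_forward_pre_hook(promote_head)

inputs = tok("def f(x): return (x * (x + 1", return_tensors="pt")
with torch.no_grad():
    next_token = model(**inputs).logits[0, -1].argmax().item()
print(repr(tok.decode(next_token)))

handle.remove()
```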
@DakingRai
Daking Rai
3 months
(4/9) LMs predict via noisy promotion and low selectivity. We measured recall (how often correct tokens are promoted) and precision (how exclusively correct tokens are promoted) across all components. Analyzing the precision-recall plot, we made the following observations:
1
0
0
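A sketch of the recall/precision bookkeeping from (4/9) above, assuming that for each component we know which token it promotes on every example; the data layout and names are illustrative, not the paper's code:

```python
def promotion_precision_recall(promotions, gold):
    """
    promotions: component -> list of tokens it promotes, one per example
                (None when the component promotes nothing on that example).
    gold:       list of correct tokens, aligned with the examples.

    Recall    = fraction of all examples where the gold token is promoted.
    Precision = fraction of the component's promotions that are the gold token.
    """
    stats = {}
    for comp, promoted in promotions.items():
        hits = sum(p == g for p, g in zip(promoted, gold) if p is not None)
        fired = sum(p is not None for p in promoted)
        stats[comp] = {
            "recall": hits / len(gold),
            "precision": hits / fired if fired else 0.0,
        }
    return stats

# Tiny illustrative run (tokens and component names are made up).
gold = ["))", ")))", ")"]
promotions = {"L11.H3": ["))", ")))", ")"], "L08.H0": ["))", None, "))"]}
print(promotion_precision_recall(promotions, gold))
```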
@DakingRai
Daking Rai
3 months
(3/9) LMs develop mechanisms of varying levels of generalizability. We evaluated the accuracy of all attention heads & FF neurons across all sub-tasks, labeling those with accuracy ≥70% as “sound mechanisms”. Our key findings: ➡️Even models with 0% accuracy have sound
1
0
0
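The ≥70% "sound mechanism" criterion in (3/9) above can be read as a simple filter over per-component, per-sub-task accuracies. A minimal sketch; the accuracy table and component names are hypothetical:

```python
SOUND_THRESHOLD = 0.70  # accuracy cutoff from the tweet above

# Hypothetical per-component accuracies: component -> {sub-task: accuracy}.
component_acc = {
    "L11.H3":     {"one_paren": 0.92, "two_paren": 0.81, "three_paren": 0.74, "four_paren": 0.71},
    "L08.H0":     {"one_paren": 0.55, "two_paren": 0.40, "three_paren": 0.12, "four_paren": 0.05},
    "FF.L9.N512": {"one_paren": 0.88, "two_paren": 0.73, "three_paren": 0.65, "four_paren": 0.30},
}

def sound_components(acc_table, subtask):
    """Components whose accuracy on `subtask` clears the soundness threshold."""
    return [c for c, accs in acc_table.items() if accs[subtask] >= SOUND_THRESHOLD]

for task in ("one_paren", "two_paren", "three_paren", "four_paren"):
    print(task, "->", sound_components(component_acc, task))
```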
@DakingRai
Daking Rai
3 months
(2/9) Task Setup. We divide the balanced parentheses task into four sub-tasks — one-paren, two-paren, three-paren, and four-paren — determined by how an LM tokenizer processes sequences of N closing parentheses. Even in this simple setting, models (124M–7B) struggle—especially
1
0
0
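The sub-task split in (2/9) above is defined by how the tokenizer chunks a run of N closing parentheses. A quick way to see this for one tokenizer (GPT-2 here is my choice for illustration, via Hugging Face transformers):

```python
from transformers import AutoTokenizer

# How many tokens does a run of N closing parentheses become?
# The answer depends on the tokenizer's BPE merges, which is what
# defines the one-/two-/three-/four-paren sub-tasks.
tok = AutoTokenizer.from_pretrained("gpt2")

for n in range(1, 5):
    pieces = tok.tokenize(")" * n)
    print(f"{n} closing paren(s) -> {len(pieces)} token(s): {pieces}")
```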
@DakingRai
Daking Rai
3 months
🚨 New NeurIPS 2025 Paper 🚨 Does 0% accuracy mean a language model (LM) has no correct mechanism for the task? 🤔 We investigate this question on the balanced parentheses task and uncover surprising insights: 1️⃣ Even at 0%, models can contain mechanisms that solve the task
1
3
10
@ZiyuYao
Ziyu Yao (Hiring Fall'26 PhDs)
3 months
🎉Check out our recent papers accepted to #NeurIPS and #EMNLP on #MechInterp of LLMs (I'm hiring Fall'26 PhDs on this topic) #NeurIPS2025 Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones
3
13
115
@DakingRai
Daking Rai
3 months
Had a great time collaborating on this paper led by @siddarthpm1, an undergrad (read: PhD applicant very soon) from UCSC. We study how LMs handle two- and three-operand arithmetic and discover a highly faithful AF1 circuit that shows: 1️⃣ Early layers don’t do instance-specific
arxiv.org
Large language models (LLMs) demonstrate proficiency across numerous computational tasks, yet their inner workings remain unclear. In theory, the combination of causal self-attention and...
@siddarthpm1
Siddarth Mamidanna
3 months
🚨New EMNLP 2025 Paper: When a human does mental math like 12+45-8, we tend to do it stepwise: first compute 12+45=57, then 57-8=49. Does an LLM do the same? Turns out it doesn’t. But how does it work? Our paper investigates exactly this! 🧵(1/10) Paper: https://t.co/cvLr8Z2Oew
0
0
1
@YilunZhou
Yilun Zhou
3 months
Thanks @rohanpaul_ai for featuring our EMNLP 2025 paper! Super-proud of the work, led by @siddarthpm1, undergrad (read: PhD applicant very soon) from UCSC! In short, we uncovered a quite surprising mechanism of LLM solving arithmetic, but stay tuned for our own explainer thread!
@rohanpaul_ai
Rohan Paul
3 months
When a language model solves a math problem in its head, where in the network is the real calculation happening? This paper finds that almost all the actual math gets done right at the very last token of the sequence, not spread out across all the tokens. The earlier tokens
0
4
9