Daking Rai

@DakingRai

Followers: 224 · Following: 390 · Media: 26 · Statuses: 56

CS PhD Student @GeorgeMasonU

Fairfax, Virginia
Joined September 2014
@DakingRai
Daking Rai
21 days
I’m actively looking for Summer 2026 internships focused on language model interpretability and methods to improve model reasoning and controllability. I’m also attending @NeurIPSConf — would love to connect! Resume & details:
0
1
6
@XllmReasonPlan
XLLM-Reason-Plan
2 months
@COLM_conf #COLM2025 Prof. Greg Durrett presenting "LLM Reasoning Beyond Scaling" @gregd_nlp
1
3
5
@XllmReasonPlan
XLLM-Reason-Plan
2 months
@COLM_conf #COLM2025 Prof. Yonatan Belinkov talking about "Toward Scalable and Actionable Interpretability"! @belinkov
0
6
17
@XllmReasonPlan
XLLM-Reason-Plan
2 months
🚨XLLM-Reason-Plan Workshop is happening right now at @COLM_conf! Join us at 520F 🙌
8
5
8
@XllmReasonPlan
XLLM-Reason-Plan
3 months
⏰ Only 9 days away! Join us at @COLM_conf on October 10 for the first workshop on the application of LLM explainability to reasoning and planning. Featuring: 📑 20 poster presentations 🎤 9 distinguished speakers View our schedule at https://t.co/E7ml8QKIc5.
0
4
9
@DakingRai
Daking Rai
3 months
(9/9) 🙏 Thanks for reading this far! If you found this interesting, be sure to check out the full paper, and feel free to contact me with any questions or clarifications. A huge thanks to my advisor @ziyuyao and collaborators Samuel Miller & @kevpmo — this work wouldn’t have
0
0
1
@DakingRai
Daking Rai
3 months
(8/9) RaSTEER generalizes to arithmetic reasoning. We consider a two-operand arithmetic reasoning task (+, -, %, x) using three models: GPT-2 XL, Pythia-6.9b, and GPT-3 8b. RaSTEER yields performance improvements across most arithmetic operations for all three models, with the
1
0
0
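A sketch of how the two-operand arithmetic evaluation in (8/9) above could be constructed. The prompt template, operand ranges, and the reading of "%" as modulo and "x" as multiplication are my assumptions, not details from the paper:

```python
import operator
import random

# Operator symbols as listed in the tweet; mapping "%" to modulo and
# "x" to multiplication is an assumption on my part.
OPS = {"+": operator.add, "-": operator.sub, "%": operator.mod, "x": operator.mul}

def make_example(rng):
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    sym = rng.choice(list(OPS))
    prompt = f"{a} {sym} {b} ="  # illustrative prompt template
    return prompt, str(OPS[sym](a, b))

rng = random.Random(0)
for _ in range(3):
    print(make_example(rng))
```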
@DakingRai
Daking Rai
3 months
(7/9) RaSTEER boosts task accuracy without hurting overall code generation. To test this, we evaluated post-steering performance on HumanEval. Key insights: ➡️Steering the top-20 attention heads (from the parentheses task) did not degrade HumanEval: CodeLlama (30.48% → 29.87%),
1
0
0
@DakingRai
Daking Rai
3 months
(6/9) Additional insights from the RaSTEER experiment: ➡️Ranking components by F1-score delivers the best performance boost (vs. precision or recall alone). ➡️Promoting attention heads yielded better performance than promoting FF neurons. ➡️RaSTEER outperforms the circuit
1
0
0
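For concreteness, the F1-based ranking mentioned in (6/9) above amounts to sorting components by the harmonic mean of their promotion precision and recall. A minimal sketch with hypothetical component names and scores:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)

# Hypothetical (precision, recall) per component.
scores = {"L11.H3": (0.90, 0.75), "L08.H0": (0.95, 0.10), "L05.H7": (0.40, 0.85)}

# Components to promote first: best F1 on top.
ranked = sorted(scores, key=lambda c: f1(*scores[c]), reverse=True)
print(ranked)
```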
@DakingRai
Daking Rai
3 months
(5/9) RaSTEER provides dramatic improvement on the three-paren and four-paren sub-tasks. We hypothesize that balanced parentheses errors in LMs aren’t due to the absence of sound mechanisms but to those mechanisms being overshadowed by faulty ones. To fix this, we propose RaSTEER, which
1
0
0
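The (5/9) tweet above is cut off, but (6/9) and (7/9) describe RaSTEER as promoting top-ranked attention heads (e.g., "steering the top-20 attention heads"). A rough PyTorch sketch of one way such a promotion could be implemented, scaling a single GPT-2 head's pre-projection output with a forward pre-hook; the layer, head, scale factor, and hook target are illustrative assumptions, not the paper's exact procedure:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Illustrative "promotion" of one attention head: scale its contribution
# before the output projection. Layer, head, and scale are placeholders.
LAYER, HEAD, SCALE = 11, 3, 2.0

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")
head_dim = model.config.n_embd // model.config.n_head

def promote_head(module, args):
    (x,) = args  # concatenated per-head outputs, shape (batch, seq, n_embd)
    x = x.clone()
    x[..., HEAD * head_dim:(HEAD + 1) * head_dim] *= SCALE
    return (x,)

# Pre-hook on the chosen layer's attention output projection.
handle = model.transformer.h[LAYER].attn.c_proj.register_forward_pre_hook(promote_head)

inputs = tok("def f(x): return (x * (x + 1", return_tensors="pt")
with torch.no_grad():
    next_token = model(**inputs).logits[0, -1].argmax().item()
print(repr(tok.decode(next_token)))

handle.remove()
```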
@DakingRai
Daking Rai
3 months
(4/9) LMs predict via noisy promotion and low selectivity. We measured recall (how often correct tokens are promoted) and precision (how exclusively correct tokens are promoted) across all components. Analyzing the precision-recall plot, we made the following observations:
1
0
0
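A sketch of the recall/precision bookkeeping from (4/9) above, assuming that for each component we know which token it promotes on every example; the data layout and names are illustrative, not the paper's code:

```python
def promotion_precision_recall(promotions, gold):
    """
    promotions: component -> list of tokens it promotes, one per example
                (None when the component promotes nothing on that example).
    gold:       list of correct tokens, aligned with the examples.

    Recall    = fraction of all examples where the gold token is promoted.
    Precision = fraction of the component's promotions that are the gold token.
    """
    stats = {}
    for comp, promoted in promotions.items():
        hits = sum(p == g for p, g in zip(promoted, gold) if p is not None)
        fired = sum(p is not None for p in promoted)
        stats[comp] = {
            "recall": hits / len(gold),
            "precision": hits / fired if fired else 0.0,
        }
    return stats

# Tiny illustrative run (tokens and component names are made up).
gold = ["))", ")))", ")"]
promotions = {"L11.H3": ["))", ")))", ")"], "L08.H0": ["))", None, "))"]}
print(promotion_precision_recall(promotions, gold))
```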
@DakingRai
Daking Rai
3 months
(3/9) LMs develop mechanisms of varying levels of generalizability. We evaluated the accuracy of all attention heads & FF neurons across all sub-tasks, labeling those with accuracy ≥70% as “sound mechanisms”. Our key findings: ➡️Even models with 0% accuracy have sound
1
0
0
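The ≥70% "sound mechanism" criterion in (3/9) above can be read as a simple filter over per-component, per-sub-task accuracies. A minimal sketch; the accuracy table and component names are hypothetical:

```python
SOUND_THRESHOLD = 0.70  # accuracy cutoff from the tweet above

# Hypothetical per-component accuracies: component -> {sub-task: accuracy}.
component_acc = {
    "L11.H3":     {"one_paren": 0.92, "two_paren": 0.81, "three_paren": 0.74, "four_paren": 0.71},
    "L08.H0":     {"one_paren": 0.55, "two_paren": 0.40, "three_paren": 0.12, "four_paren": 0.05},
    "FF.L9.N512": {"one_paren": 0.88, "two_paren": 0.73, "three_paren": 0.65, "four_paren": 0.30},
}

def sound_components(acc_table, subtask):
    """Components whose accuracy on `subtask` clears the soundness threshold."""
    return [c for c, accs in acc_table.items() if accs[subtask] >= SOUND_THRESHOLD]

for task in ("one_paren", "two_paren", "three_paren", "four_paren"):
    print(task, "->", sound_components(component_acc, task))
```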
@DakingRai
Daking Rai
3 months
(2/9) Task Setup. We divide the balanced parentheses task into four sub-tasks — one-paren, two-paren, three-paren, and four-paren — determined by how an LM tokenizer processes sequences of N closing parentheses. Even in this simple setting, models (124M–7B) struggle—especially
1
0
0
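The sub-task split in (2/9) above is defined by how the tokenizer chunks a run of N closing parentheses. A quick way to see this for one tokenizer (GPT-2 here is my choice for illustration, via Hugging Face transformers):

```python
from transformers import AutoTokenizer

# How many tokens does a run of N closing parentheses become?
# The answer depends on the tokenizer's BPE merges, which is what
# defines the one-/two-/three-/four-paren sub-tasks.
tok = AutoTokenizer.from_pretrained("gpt2")

for n in range(1, 5):
    pieces = tok.tokenize(")" * n)
    print(f"{n} closing paren(s) -> {len(pieces)} token(s): {pieces}")
```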
@DakingRai
Daking Rai
3 months
🚨 New NeurIPS 2025 Paper 🚨 Does 0% accuracy mean a language model (LM) has no correct mechanism for the task? 🤔 We investigate this question on the balanced parentheses task and uncover surprising insights: 1️⃣ Even at 0%, models can contain mechanisms that solve the task
1
3
10
@ZiyuYao
Ziyu Yao (Hiring Fall'26 PhDs)
3 months
🎉Check out our recent papers accepted to #NeurIPS and #EMNLP on #MechInterp of LLMs (I'm hiring Fall'26 PhDs on this topic) #NeurIPS2025 Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones
3
13
115
@DakingRai
Daking Rai
3 months
Had a great time collaborating on this paper led by @siddarthpm1, an undergrad (read: PhD applicant very soon) from UCSC. We study how LMs handle two- and three-operand arithmetic and discover a highly faithful AF1 circuit that shows: 1️⃣ Early layers don’t do instance-specific
arxiv.org
Large language models (LLMs) demonstrate proficiency across numerous computational tasks, yet their inner workings remain unclear. In theory, the combination of causal self-attention and...
@siddarthpm1
Siddarth Mamidanna
3 months
🚨New EMNLP 2025 Paper: When a human does mental math like 12+45-8, we tend to do it stepwise: first compute 12+45=57, then 57-8=49. Does an LLM do the same? Turns out it doesn’t. But how does it work? Our paper investigates exactly this! 🧵(1/10) Paper: https://t.co/cvLr8Z2Oew
0
0
1
@YilunZhou
Yilun Zhou
3 months
Thanks @rohanpaul_ai for featuring our EMNLP 2025 paper! Super-proud of the work, led by @siddarthpm1, undergrad (read: PhD applicant very soon) from UCSC! In short, we uncovered a quite surprising mechanism of LLM solving arithmetic, but stay tuned for our own explainer thread!
@rohanpaul_ai
Rohan Paul
3 months
When a language model solves a math problem in its head, where in the network is the real calculation happening? This paper finds that almost all the actual math gets done right at the very last token of the sequence, not spread out across all the tokens. The earlier tokens
0
4
9