Neel Nanda
@NeelNanda5
Followers
34K
Following
33K
Media
397
Statuses
5K
Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
London, UK
Joined June 2022
My Summer MATS applications are open! You'll do full-time research on a mech interp paper supervised by me. Due Dec 23. All backgrounds welcome! I've supervised 40+ papers (17 at top conferences), but projects still get better each time. I'm excited for what's next! Highlights:
10
56
586
I've been impressed with the work of past Anthropic Fellows, seems worth applying to if you want to do safety/interpretability research! Seems to basically be Anthropic's equivalent of an internship, with a good conversion rate to full-time
We’re opening applications for the next two rounds of the Anthropic Fellows Program, beginning in May and July 2026. We provide funding, compute, and direct mentorship to researchers and engineers to work on real safety and security projects for four months.
2
10
308
New paper: You can train an LLM only on good behavior and implant a backdoor for turning it evil. How? 1. The Terminator is bad in the original film but good in the sequels. 2. Train an LLM to act well in the sequels. It'll be evil if told it's 1984. More weird experiments 🧵
38
247
2K
Go read David's great post on "In Defence of Curiosity"! I'm appreciating the different visions being shared for how to think about interpretability research, it's great to have many perspectives
At the #Neurips2025 mechanistic interpretability workshop I gave a brief talk about Venetian glassmaking, since I think we face a similar moment in AI research today. Here is a blog post summarizing the talk: https://t.co/LSwBf9XQzE
4
10
190
Apollo does great work, seems like an impactful role
We are hiring for Backend and Full-Stack SWEs. Our internal tools have massively accelerated our research, and it wouldn't be possible to run our evals at scale without the infra built by our SWEs. Deadline 15 Jan 2026, but we'll start interviewing people earlier!
0
4
65
Great research from the UK government's AI Security Institute. A promising pragmatic way to tell how well interp works is by having a red team make a model difficult to interpret and blue team then compete to try interpreting it. The team scaled this up with fascinating results
NEW PAPER from UK AISI Model Transparency team: Could we catch AI models that hide their capabilities? We ran an auditing game to find out. The red team built sandbagging models. The blue team tried to catch them. The red team won. Why? 🧵 1/17
3
9
106
So after all these hours talking about AI, in these last five minutes I am going to talk about: Horses. Engines, steam engines, were invented in 1700. And what followed was 200 years of steady improvement, with engines getting 20% better a decade. For the first 120 years of
181
711
4K
Problems I'm excited about that I do not think standard post training researchers work on: - Why did the model take this action? (Eg trying to escape the data center, or blackmailing) - Efficient inference time monitors - Extracting secret knowledge - Which sentence in the CoT
4
2
63
Pragmatic interp is a research philosophy, not an agenda. It's about how to avoid common mistakes and produce true and impactful insights. You can apply it to whatever problems you want. Also, the posts are not just on post training problems! There's lots of other examples
the major flaw of “pragmatic interpretability” imo: the problems that this approach wants to work on are the same problems posttraining researchers work on, except posttraining researchers don’t have to restrict themselves to specific research methodologies (e.g. interp)
4
1
55
My summer MATS applications are due in 2 weeks! If you want to do mech interp research supervised by me, please apply!
2
3
49
An unexpected highlight of the mech interp workshop yesterday - @davidbau presenting a pragmatic vision for Venetian glassmaking
Thanks so much to everyone who came to the NeurIPS mech interp workshop yesterday. I'm really excited there's so much energy and life in the field, nearly filling an 800 person room! See our website for the feedback form, papers, and mailing list Reply with your highlights!
3
4
91
Join our mailing list to receive speaker slides, recordings (might be a while...), resources for jobs funding and learning more, and to hear about future workshops! https://t.co/LSmb2VFRzB
https://t.co/9wlkbd0zCG
buttondown.com
A low volume mailing list with updates about the Mechanistic Interpretability workshop at NeurIPS 2025
0
2
9
Thanks so much to everyone who came to the NeurIPS mech interp workshop yesterday. I'm really excited there's so much energy and life in the field, nearly filling an 800 person room! See our website for the feedback form, papers, and mailing list Reply with your highlights!
5
6
126
Chris Olah's talk is happening right now at the NeurIPS mech interp workshop, room 30, top floor. Called "reflections on interpretability"! Followed by invited lightning talks at 16:00
0
5
117
Come check out the NeurIPS mech interp workshop poster session, happening now until 15:20-ish in room 30! We have free T Shirts
2
3
60
Come check out the afternoon spotlight talks at the NeurIPS mech interp workshop, starting in 10 mins in room 30 - these ones and many more! Done as blitz 1 min pitches
1
2
31
I’m insanely jealous of folks who can make it to these. Lightning talks are peak information density and this lineup is fire
I'm excited for the invited lightning talks at our mech interp workshop this Sunday! There's a lot of exciting new ideas/areas in interp, I've curated a lineup on my favourites! With 3 visions of interp: pragmatic @JoshAEngels, ambitious @nabla_theta, curiosity-driven @davidbau
2
1
27
Come see Been Kim's talk at the NeurIPS mech interp workshop, happening now in room 30 upper floor! On 15 years of interpretability in 15 minutes, Including giving us perspective on current dramas by talking about *past* dramas like the fall of saliency maps
2
6
78
The opening remarks of the NeurIPS mech interp workshop are starting now! In Room 30, on the upper floor
2
1
19