Michael Pearce (@_MichaelPearce)

171 Followers · 619 Following · 7 Media · 87 Statuses

Mechanistic Interpretability @ Goodfire | Physics | Evolution

Joined September 2015
Goodfire (@GoodfireAI) · 1 month
Why use LLM-as-a-judge when you can get the same performance for 15–500x cheaper? Our new research with @RakutenGroup on PII detection finds that SAE probes:
- transfer from synthetic to real data better than normal probes
- match GPT-5 Mini performance at 1/15 the cost
(1/6)
12 replies · 48 reposts · 327 likes
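A minimal sketch of the probe-vs-judge idea above, under stated assumptions: `sae_encode`, the encoder weights, and the labeled activations are all placeholders, not the actual Goodfire/Rakuten pipeline. The point is only that classification happens with a cheap linear probe on sparse SAE latents rather than an LLM call.

```python
# Hypothetical sketch: classify PII with a linear probe on SAE latents
# instead of an LLM judge. All data, shapes, and weights are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sae_encode(acts, W_enc, b_enc):
    """ReLU encoder of a (pretrained) sparse autoencoder."""
    return np.maximum(acts @ W_enc + b_enc, 0.0)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 256))        # fake residual-stream acts
y_train = rng.integers(0, 2, size=1000)       # fake "contains PII" labels
W_enc = rng.normal(size=(256, 4096))          # fake SAE encoder weights
b_enc = np.zeros(4096)

# An L1 penalty keeps the probe itself sparse, so it reads off a handful
# of (hopefully interpretable) SAE features - and inference is a matmul.
latents = sae_encode(X_train, W_enc, b_enc)
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
probe.fit(latents, y_train)
print("probe uses", np.count_nonzero(probe.coef_), "of 4096 SAE features")
```

The cost claim in the tweet then follows from the design: after the one-time SAE and probe training, each prediction is a single forward pass plus a sparse dot product, with no judge-model tokens to pay for.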
Wes Gurnee (@wesg52) · 2 months
New paper! We reverse engineered the mechanisms underlying Claude Haiku’s ability to perform a simple “perceptual” task. We discover beautiful feature families and manifolds, clean geometric transformations, and distributed attention algorithms!
44 replies · 315 reposts · 2K likes
Goodfire (@GoodfireAI) · 2 months
Agents for experimental research != agents for software development. This is a key lesson we've learned after several months refining agentic workflows! More takeaways on effectively using experimenter agents + a key tool we're open-sourcing to enable them: 🧵
3 replies · 30 reposts · 221 likes
Goodfire (@GoodfireAI) · 3 months
We're excited to announce a collaboration with @MayoClinic! We're working to improve personalized patient outcomes by extracting richer, more reliable signals from genomic & digital pathology models. That could mean novel biomarkers, personalized diagnostics, & more.
3 replies · 10 reposts · 73 likes
Goodfire (@GoodfireAI) · 3 months
Does making an SAE bigger let you explain more of your model's features? New research from @ericjmichaud_ models SAE scaling dynamics, and explores whether SAEs will pack increasingly many latents onto a few multidimensional features, rather than learning more features.
3 replies · 19 reposts · 153 likes
Michael Pearce (@_MichaelPearce) · 4 months
The structure seems consistent with the manifold with “ripples” picture seen in LLMs. Finding similar patterns across diverse models hints at a general organizing principle behind feature geometry. Looking forward to characterizing more biological structures in genomic models!
0 replies · 0 reposts · 7 likes
Michael Pearce (@_MichaelPearce) · 4 months
Excited to share our work digging into how Evo 2 represents species relatedness or phylogeny. Genetics provides a good quantitative measure of relatedness, so we could use it to probe the model and see if its internal geometry reflects it.
Quoting Goodfire (@GoodfireAI) · 4 months
Arc Institute trained their foundation model Evo 2 on DNA from all domains of life. What has it learned about the natural world? Our new research finds that it represents the tree of life, spanning thousands of species, as a curved manifold in its neuronal activations. (1/8)
1 reply · 8 reposts · 46 likes
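A sketch of the probing logic described above, assuming you have one mean activation vector per species and a phylogenetic distance matrix; both are random placeholders here, not Evo 2 data. The test is simply whether distances in activation space rank-correlate with genetic relatedness.

```python
# Hypothetical sketch: compare pairwise distances in activation space
# against phylogenetic distances. Inputs are placeholders, not Evo 2 data.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_species, d_model = 200, 512
species_acts = rng.normal(size=(n_species, d_model))   # fake per-species acts
phylo = np.abs(rng.normal(size=(n_species, n_species)))
phylo_dist = (phylo + phylo.T) / 2                     # fake symmetric distances

# Condensed upper-triangle distances in activation space vs. phylogeny.
act_dist = pdist(species_acts)                         # euclidean by default
iu = np.triu_indices(n_species, k=1)
rho, _ = spearmanr(act_dist, phylo_dist[iu])
print(f"geometry vs. phylogeny rank correlation: {rho:.3f}")
```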
Goodfire (@GoodfireAI) · 4 months
Adversarial examples - a vulnerability of every AI model, and a “mystery” of deep learning - may simply come from models cramming many features into the same neurons! Less feature interference → more robust models. New research from @livgorton 🧵 (1/4)
4 replies · 25 reposts · 251 likes
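A toy numerical illustration of the interference intuition (not the method in the linked research): pack many random unit-norm feature directions into few neurons and measure how much the directions overlap. Cramming 8x more features into the same neurons makes the total cross-talk per feature roughly 8x larger.

```python
# Toy illustration: m feature directions crammed into n < m neurons
# interfere with each other; total interference per feature grows with m.
import numpy as np

rng = np.random.default_rng(0)

def total_interference(W):
    """Mean over features i of sum_{j != i} (w_i . w_j)^2."""
    G = W @ W.T
    np.fill_diagonal(G, 0.0)
    return float((G ** 2).sum(axis=1).mean())

def random_features(m, n):
    W = rng.normal(size=(m, n))
    return W / np.linalg.norm(W, axis=1, keepdims=True)

n_neurons = 64
print("512 features:", total_interference(random_features(512, n_neurons)))
print(" 64 features:", total_interference(random_features(64, n_neurons)))
# Expect roughly (m - 1) / n_neurons per feature: ~8.0 vs. ~1.0. Nudging an
# input along one feature's direction then perturbs many others - the
# cross-talk an adversarial example can exploit.
```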
Goodfire (@GoodfireAI) · 4 months
New research! Post-training often causes weird, unwanted behaviors that are hard to catch before deployment because they only crop up rarely - then are found by bewildered users. How can we find these efficiently? (1/7)
10 replies · 40 reposts · 372 likes
Jack Merullo (@jack_merullo_) · 4 months
Could we tell if gpt-oss was memorizing its training data? I.e., points where it’s reasoning vs reciting? We took a quick look at the curvature of the loss landscape of the 20B model to understand memorization and what’s happening internally during reasoning
14 replies · 53 reposts · 518 likes
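One standard way to get at "curvature of the loss landscape" is the top Hessian eigenvalue of the per-example loss, estimated with Hessian-vector products and power iteration. The sketch below does this for a tiny toy model; whether this matches the estimator actually used on gpt-oss-20b is an assumption.

```python
# Sketch: estimate the sharpest curvature direction of the loss at one
# example via Hessian-vector products + power iteration. Toy model/data.
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.Tanh(),
                            torch.nn.Linear(32, 1))
x, y = torch.randn(4, 16), torch.randn(4, 1)
params = [p for p in model.parameters() if p.requires_grad]

def loss_fn():
    return torch.nn.functional.mse_loss(model(x), y)

def hvp(vec):
    """Hessian-vector product via double backward."""
    grads = torch.autograd.grad(loss_fn(), params, create_graph=True)
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)

# Power iteration: the Rayleigh quotient converges to the dominant
# Hessian eigenvalue, i.e. the curvature along the sharpest direction.
v = [torch.randn_like(p) for p in params]
for _ in range(20):
    hv = hvp(v)
    norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]
top_eig = sum((h * u).sum() for h, u in zip(hvp(v), v))
print("estimated top Hessian eigenvalue:", float(top_eig))
```

High per-example curvature is the kind of signal one might read as "this point is memorized" rather than computed by a shared, flat mechanism.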
Goodfire (@GoodfireAI) · 4 months
New research with coauthors at @Anthropic, @GoogleDeepMind, @AiEleuther, and @decode_research! We expand on and open-source Anthropic’s foundational circuit-tracing work. Brief highlights in thread: (1/7)
3 replies · 22 reposts · 248 likes
Eric Ho (@ericho_goodfire) · 5 months
Just wrote a piece on why I believe interpretability is AI’s most important frontier - we're building the most powerful technology in history, but still can't reliably engineer or understand our models. With rapidly improving model capabilities, interpretability is more urgent, …
1 reply · 17 reposts · 138 likes
Goodfire (@GoodfireAI) · 6 months
(1/7) New research: how can we understand how an AI model actually works? Our method, SPD, decomposes the *parameters* of neural networks, rather than their activations - akin to understanding a program by reverse-engineering the source code vs. inspecting runtime behavior.
13 replies · 86 reposts · 790 likes
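To make the parameters-vs-activations contrast concrete, here is a deliberately crude stand-in (not SPD itself): split a single weight matrix into rank-one components via SVD and attribute each component's contribution on a given input. SPD's actual decomposition and training objective differ; this only shows what "inspecting components of the parameters" can look like.

```python
# Crude stand-in for parameter decomposition (NOT SPD itself): split one
# weight matrix into rank-one parts, then ask which parts did the work
# for a given input.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))           # placeholder layer weights
x = rng.normal(size=128)                 # placeholder input

U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Per-component contribution: W @ x = sum_k s_k * (v_k . x) * u_k
contribs = S * (Vt @ x)                  # coefficient on each u_k
order = np.argsort(-np.abs(contribs))
print("top parameter components for this input:", order[:5])

# The components really do sum back to the layer's computation.
recon = (U * contribs).sum(axis=1)
assert np.allclose(recon, W @ x)
```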
Goodfire (@GoodfireAI) · 6 months
New research update! We replicated @AnthropicAI's circuit tracing methods to test if they can recover a known, simple transformer mechanism.
2 replies · 53 reposts · 503 likes
Curt Jaimungal (@TOEwithCurt) · 11 months
“There is no wave function...” This claim by Jacob Barandes sounds outlandish, but allow me to justify it with a blend of intuition regarding physics and rigor regarding math. We'll dispel some quantum woo myths along the way. (1/13)
60 replies · 146 reposts · 1K likes
Goodfire (@GoodfireAI) · 7 months
We created a canvas that plugs into an image model’s brain. You can use it to generate images in real-time by painting with the latent concepts the model has learned. Try out Paint with Ember for yourself 👇
39 replies · 99 reposts · 927 likes
Adam Karvonen (@a_karvonen) · 9 months
We're excited to announce the release of SAE Bench 1.0, our suite of Sparse Autoencoder (SAE) evaluations! We have also trained and evaluated a suite of open-source SAEs across 7 architectures. This has led to exciting new qualitative findings, shared in the 🧵 below 👇
4 replies · 35 reposts · 193 likes
Andy Keller (@t_andy_keller) · 9 months
In the physical world, almost all information is transmitted through traveling waves -- why should it be any different in your neural network? Super excited to share recent work with the brilliant @mozesjacobs: "Traveling Waves Integrate Spatial Information Through Time" 1/14
145 replies · 908 reposts · 7K likes
Bart Bussmann (@BartBussmann) · 10 months
Do SAEs find the ‘true’ features in LLMs? In our ICLR paper w/ @neelnanda5 we argue no. The issue: we must choose the number of concepts learned. Small SAEs miss low-level concepts, but large SAEs miss high-level concepts - it’s sparser to decompose them into low-level concepts.
3 replies · 38 reposts · 272 likes
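The "we must choose the number of concepts" point is visible in even the most minimal SAE implementation: the dictionary size is a fixed hyperparameter, set before training. Toy sketch below; the architecture and penalty are simplified assumptions, not the paper's setup.

```python
# Minimal SAE sketch: `dict_size` is the forced choice the tweet is about.
# Too small and low-level concepts get merged; too large and high-level
# concepts get split into compositions of lower-level latents.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, dict_size: int):
        super().__init__()
        self.enc = nn.Linear(d_model, dict_size)
        self.dec = nn.Linear(dict_size, d_model, bias=False)

    def forward(self, acts):
        latents = torch.relu(self.enc(acts))   # sparse codes
        return self.dec(latents), latents

d_model = 512
acts = torch.randn(256, d_model)               # placeholder activations
for dict_size in (1024, 16384):                # the choice in question
    sae = SparseAutoencoder(d_model, dict_size)
    recon, latents = sae(acts)
    l2 = (recon - acts).pow(2).mean()          # reconstruction loss
    l1 = latents.abs().mean()                  # sparsity penalty term
    print(dict_size, float(l2), float(l1))
```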
Nora Belrose (@norabelrose) · 10 months
MLPs and GLUs are hard to interpret, but they make up most transformer parameters. Linear and quadratic functions are easier to interpret. We show how to convert MLPs & GLUs into polynomials in closed form, allowing you to use SVD and direct inspection for interpretability 🧵
5 replies · 32 reposts · 301 likes
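The easiest case in which to see the closed form is a bilinear layer, i.e. a GLU with the gating nonlinearity dropped: each output unit is exactly a quadratic form x^T B_k x, and eigendecomposing the symmetric B_k exposes the input directions that unit responds to. The sketch below checks that identity on toy weights; extending to general MLPs and GLUs is what the paper does, not this snippet.

```python
# Bilinear special case: with the nonlinearity dropped, a GLU computes
# y_k = (w1_k . x)(w2_k . x) = x^T B_k x for a symmetric matrix B_k.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 32, 8
W1 = rng.normal(size=(d_out, d_in))
W2 = rng.normal(size=(d_out, d_in))
x = rng.normal(size=d_in)

k = 0  # inspect output unit 0
B_k = 0.5 * (np.outer(W1[k], W2[k]) + np.outer(W2[k], W1[k]))

# The closed form agrees with the layer's forward pass ...
assert np.isclose(x @ B_k @ x, (W1 @ x)[k] * (W2 @ x)[k])

# ... and B_k's eigenvectors are the input directions unit k cares about,
# directly inspectable without running the model.
eigvals, eigvecs = np.linalg.eigh(B_k)
print("top |eigenvalues| of B_0:", eigvals[np.argsort(-np.abs(eigvals))][:3])
```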