Lee Sharkey

@leedsharkey

Followers
2K
Following
6K
Media
46
Statuses
672

Scruting matrices @ Goodfire | Previously: cofounded Apollo Research

London, UK
Joined March 2015
@davidmanheim
David Manheim
1 day
I will again state my view that condemning bad things is great, but condemning others for failing to condemn bad things (much less boycotting them and similar glorious loyalty oath crusades) is building toxic community incentives and attempting to force conformity.
1
4
34
@leedsharkey
Lee Sharkey
9 days
Yeup!
@yonashav
Yo Shavit
22 days
@sebkrier and I are pretty floored by the quality of MATS applicants
0
0
3
@GoodfireAI
Goodfire
19 days
Why use LLM-as-a-judge when you can get the same performance for 15–500x cheaper? Our new research with @RakutenGroup on PII detection finds that SAE probes:
- transfer from synthetic to real data better than normal probes
- match GPT-5 Mini performance at 1/15 the cost
(1/6)
12
49
327
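A minimal sketch of what an "SAE probe" pipeline can look like: fit a lightweight linear classifier on sparse-autoencoder feature activations instead of calling an LLM judge per example. The data, shapes, and names here (sae_features, labels) are illustrative stand-ins, not the Goodfire/Rakuten setup.

```python
# Sketch of an "SAE probe": a linear classifier fit on SAE feature activations,
# replacing a per-example LLM-judge call. All data below is a random stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tokens, n_sae_features = 2000, 1024

# Stand-in for sparse SAE activations per token and a binary PII label per token.
sae_features = rng.random((n_tokens, n_sae_features)) * (rng.random((n_tokens, n_sae_features)) < 0.02)
labels = rng.integers(0, 2, size=n_tokens)

X_train, X_test, y_train, y_test = train_test_split(sae_features, labels, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)  # one cheap fit, then near-free inference
probe.fit(X_train, y_train)
print("held-out accuracy:", probe.score(X_test, y_test))
```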
@GoodfireAI
Goodfire
1 month
Are you a high-agency, early- to mid-career researcher or engineer who wants to work on AI interpretability? We're looking for several Research Fellows and Research Engineering Fellows to start this fall.
7
17
154
@GoodfireAI
Goodfire
2 months
We're excited to announce a collaboration with @MayoClinic! We're working to improve personalized patient outcomes by extracting richer, more reliable signals from genomic & digital pathology models. That could mean novel biomarkers, personalized diagnostics, & more.
3
10
73
@demishassabis
Demis Hassabis
4 months
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team!
deepmind.google
The International Mathematical Olympiad (“IMO”) is the world’s most prestigious competition for young mathematicians, and has been held annually since 1959. Each country taking part is represented by…
202
760
6K
@liron
Liron Shapira
4 months
Who knew you could win gold in the International Math Olympiad without truly reasoning?
36
24
535
@ericho_goodfire
Eric Ho
4 months
Just wrote a piece on why I believe interpretability is AI’s most important frontier - we're building the most powerful technology in history, but still can't reliably engineer or understand our models. With rapidly improving model capabilities, interpretability is more urgent,
1
17
138
@leedsharkey
Lee Sharkey
5 months
We and collaborators have already begun scaling to much larger models, and see some very early signs of life! We think now is a great time for new people to jump on and improve on this method! Work by @BushnaqLucius @danbraunai and me! Links to paper & code below!
1
0
18
@leedsharkey
Lee Sharkey
5 months
While this method overcomes most of the barriers to scaling, we think a few more tweaks will be necessary before we can trust results on large models. But at least now we can indeed scale up to those models and start exploring, even if we're not sure about the results!
1
0
14
@leedsharkey
Lee Sharkey
5 months
We demonstrate this improved stability by replicating (and improving on) the decompositions of our previous paper. We also decompose models that the previous method failed to decompose correctly, such as a TMS model with a hidden identity matrix and a 3-layer residual MLP.
2
0
13
@leedsharkey
Lee Sharkey
5 months
Overall, this is much more stable to train than the top-k approach in the old algorithm, probably because we no longer use gradients to estimate attributions (which are often inaccurate), and because top-k introduced troublesome discontinuities (plus other reasons).
1
0
11
@leedsharkey
Lee Sharkey
5 months
We train the output of the ablated network to match the output of the original network. And, as before, we train the subcomponents to sum to the parameters of the original network.
1
0
11
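A minimal sketch of the two training targets described above, for a single linear layer: the ablated forward pass should reproduce the original output, and the subcomponents should sum to the original weights. The shapes and names (W_orig, U, V, mask) are illustrative, and the random mask is a stand-in for the sampled ablation mask, not the paper's exact code.

```python
# (1) ablated output should match the original output
# (2) subcomponents should sum to the original parameters
import torch

d_in, d_out, n_sub, batch = 16, 16, 32, 8
W_orig = torch.randn(d_out, d_in)                     # frozen original weights
U = torch.randn(n_sub, d_out, requires_grad=True)     # each subcomponent is u_c v_c^T
V = torch.randn(n_sub, d_in, requires_grad=True)

x = torch.randn(batch, d_in)
mask = torch.rand(batch, n_sub)                       # stand-in for the sampled ablation mask in [0, 1]

subcomponents = U[:, :, None] * V[:, None, :]         # (n_sub, d_out, d_in) rank-one matrices
W_sum = subcomponents.sum(0)

# Ablated forward pass: scale each subcomponent's contribution by its mask value.
W_ablated = torch.einsum('bc,cij->bij', mask, subcomponents)
y_orig = x @ W_orig.T
y_ablated = torch.einsum('bij,bj->bi', W_ablated, x)

output_loss = ((y_ablated - y_orig) ** 2).mean()      # (1) match the original output
faithfulness_loss = ((W_sum - W_orig) ** 2).mean()    # (2) subcomponents sum to W_orig
loss = output_loss + faithfulness_loss
loss.backward()
```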
@leedsharkey
Lee Sharkey
5 months
But we ablate each subcomponent by some random amount: causally unimportant subcomponents can be fully, partially, or not at all ablated (because they shouldn't matter, so ablating them shouldn't make a difference!), whereas unablatable (causally important) subcomponents don't get ablated at all.
1
0
14
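One way to realise the "random amount" of ablation described above: sample each subcomponent's retention factor uniformly between its predicted causal importance and 1. This exact parameterisation is an assumption made for illustration, not a quote from the paper.

```python
# Sample a per-datapoint ablation mask between predicted causal importance g and 1.
# g ~ 1 -> mask ~ 1 (never ablated);  g ~ 0 -> mask can land anywhere in [0, 1].
import torch

batch, n_sub = 8, 32
causal_importance = torch.rand(batch, n_sub)   # stand-in for the predictor's output g in [0, 1]

u = torch.rand(batch, n_sub)                   # uniform noise
mask = causal_importance + (1.0 - causal_importance) * u

# Subcomponents with g close to 1 keep mask close to 1; those with g close to 0
# can end up fully ablated, partially ablated, or untouched.
```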
@leedsharkey
Lee Sharkey
5 months
The way it works: For a given datapoint, we predict the 'causal importance' of each subcomponent. This estimates how 'ablatable' they are on that datapoint. Then we do another forward pass where we actually ablate each subcomponent!...
1
0
13
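A compact sketch of the two passes the tweet describes: first predict how ablatable each subcomponent is on the given datapoint, then run a second forward pass with each subcomponent scaled by its sampled ablation mask. The gate (a sigmoid of each subcomponent's input projection) and the single-layer setup are illustrative assumptions.

```python
# Pass 1: predict per-datapoint causal importances.  Pass 2: forward with ablation.
import torch

d_in, d_out, n_sub, batch = 16, 16, 32, 8
U = torch.randn(n_sub, d_out)
V = torch.randn(n_sub, d_in)
gate_scale = torch.randn(n_sub)                       # parameters of a tiny learned gate
gate_bias = torch.randn(n_sub)

def forward_with_ablation(x):
    # Pass 1: how ablatable is each subcomponent on this datapoint?
    inner = x @ V.T                                   # (batch, n_sub) projections v_c . x
    causal_importance = torch.sigmoid(gate_scale * inner + gate_bias)

    # Sample ablation masks: unimportant subcomponents may be ablated by any amount.
    mask = causal_importance + (1 - causal_importance) * torch.rand_like(causal_importance)

    # Pass 2: forward through the ablated decomposition.
    y = torch.einsum('bc,bc,cd->bd', mask, inner, U)  # sum_c mask_c * (v_c . x) * u_c
    return y

y = forward_with_ablation(torch.randn(batch, d_in))
```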
@leedsharkey
Lee Sharkey
5 months
The main differences are:
- We use 'subcomponents' (rank-one matrices in one layer) instead of components (whole vectors in parameter space)
- We learn a simple function that predicts 'causal importances' of each subcomponent.
1
1
17
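A minimal sketch of the two ingredients named above: each subcomponent is a rank-one matrix u_c v_c^T inside one layer's weight matrix, and a simple learned function maps a datapoint to a causal importance in [0, 1] for each subcomponent. The gate's exact form here is an assumption for illustration.

```python
# Rank-one subcomponents of one layer, plus a simple per-datapoint importance predictor.
import torch

d_in, d_out, n_sub = 16, 16, 32
U = torch.nn.Parameter(torch.randn(n_sub, d_out))   # left factors u_c
V = torch.nn.Parameter(torch.randn(n_sub, d_in))    # right factors v_c
gate = torch.nn.Linear(n_sub, n_sub)                # simple learned importance function (assumed form)

# The layer's weight is (trained to be) the sum of its rank-one subcomponents.
W_reconstructed = torch.einsum('cd,ce->de', U, V)   # (d_out, d_in) = sum_c u_c v_c^T

x = torch.randn(4, d_in)
causal_importance = torch.sigmoid(gate(x @ V.T))    # (batch, n_sub) importances in [0, 1]
```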
@leedsharkey
Lee Sharkey
5 months
A few months ago, we published Attribution-based parameter decomposition -- a method for decomposing a network's parameters for interpretability. But it was janky and didn't scale. Today, we published a new, better algorithm called 🔶Stochastic Parameter Decomposition!🔶
5
23
184
@leedsharkey
Lee Sharkey
5 months
Very good. Very fire.
@danbraunai
Dan Braun
5 months
I've joined @GoodfireAI (London team) because I think it's the best place to develop and scale fundamental interpretability techniques. Doing this well requires compute, ambition, and most of all, great people. Goodfire has all of these.
0
0
26
@GoodfireAI
Goodfire
5 months
New research update! We replicated @AnthropicAI's circuit tracing methods to test if they can recover a known, simple transformer mechanism.
2
53
502