Lee Sharkey
@leedsharkey
Followers: 2K · Following: 6K · Media: 46 · Statuses: 672
Scruting matrices @ Goodfire | Previously: cofounded Apollo Research
London, UK
Joined March 2015
I will again state my view that condemning bad things is great, but condemning others for failing to condemn bad things (much less boycotting them and similar glorious loyalty oath crusades) is building toxic community incentives and attempting to force conformity.
1 reply · 4 reposts · 34 likes
Yeup!
0 replies · 0 reposts · 3 likes
Why use LLM-as-a-judge when you can get the same performance for 15–500x cheaper? Our new research with @RakutenGroup on PII detection finds that SAE probes:
- transfer from synthetic to real data better than normal probes
- match GPT-5 Mini performance at 1/15 the cost
(1/6)
12 replies · 49 reposts · 327 likes
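For intuition, a minimal sketch of what such a probe can look like, assuming you already have SAE feature activations per example (all data, shapes, and hyperparameters below are placeholders, not the paper's pipeline):

```python
# A linear probe on precomputed SAE feature activations (placeholder data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((1000, 4096))       # stand-in for SAE feature activations
y = rng.integers(0, 2, size=1000)  # stand-in for PII / non-PII labels

# One cheap linear pass at inference time, vs. a full LLM call per example
# for LLM-as-a-judge; L1 keeps the probe sparse over SAE features.
probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
probe.fit(X, y)
print("train accuracy:", probe.score(X, y))
```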
Are you a high-agency, early- to mid-career researcher or engineer who wants to work on AI interpretability? We're looking for several Research Fellows and Research Engineering Fellows to start this fall.
7 replies · 17 reposts · 154 likes
We're excited to announce a collaboration with @MayoClinic! We're working to improve personalized patient outcomes by extracting richer, more reliable signals from genomic & digital pathology models. That could mean novel biomarkers, personalized diagnostics, & more.
3 replies · 10 reposts · 73 likes
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team!
deepmind.google
The International Mathematical Olympiad (“IMO”) is the world’s most prestigious competition for young mathematicians, and has been held annually since 1959. Each country taking part is represented by…
202 replies · 760 reposts · 6K likes
Who knew you could win gold in the International Math Olympiad without truly reasoning?
36 replies · 24 reposts · 535 likes
Just wrote a piece on why I believe interpretability is AI’s most important frontier - we're building the most powerful technology in history, but still can't reliably engineer or understand our models. With rapidly improving model capabilities, interpretability is more urgent…
1 reply · 17 reposts · 138 likes
We and collaborators have already begun scaling to much larger models, and see some very early signs of life! We think now is a great time for new people to jump on and improve on this method! Work by @BushnaqLucius @danbraunai and me! Links to paper & code below!
1 reply · 0 reposts · 18 likes
While this method overcomes most of the barriers to scaling, we think a few more tweaks will be necessary before we can trust results on large models. But at least now we can indeed scale up to those models and start exploring, even if we're not sure about the results!
1 reply · 0 reposts · 14 likes
We demonstrate this improved stability by replicating (and improving on) the decompositions of our previous paper. We also decompose models that the previous method failed to decompose correctly, such as a TMS model with a hidden identity matrix and a 3-layer residual MLP.
2 replies · 0 reposts · 13 likes
Overall, this is much more stable to train than the top-k approach in the old algorithm, probably because we no longer use gradients to estimate attributions (which are often inaccurate), and because top-k introduced troublesome discontinuities (plus other reasons).
1 reply · 0 reposts · 11 likes
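To make the stability point concrete, here's a toy contrast between a hard top-k mask and a stochastic one (my illustration, not code from either paper):

```python
import torch

scores = torch.tensor([0.51, 0.49, 0.10])

# Hard top-k: a tiny nudge to the scores flips which subcomponents survive,
# so the mask (and hence the loss) jumps discontinuously.
mask_topk = torch.zeros_like(scores)
mask_topk[scores.topk(k=2).indices] = 1.0

# Stochastic masking: the mask varies smoothly with the predicted causal
# importances g, so gradients through g are well-behaved in expectation.
g = torch.tensor([0.9, 0.5, 0.1])
mask_stochastic = g + (1 - g) * torch.rand_like(g)  # m ~ U[g, 1]
print(mask_topk, mask_stochastic)
```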
We train the output of the ablated network to match the output of the original network. And, as before, we train the subcomponents so that they sum to the parameters of the original network.
1 reply · 0 reposts · 11 likes
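In code, those two objectives could look roughly like this (names and shapes are assumptions, not the released implementation):

```python
import torch

def spd_losses(original_out, ablated_out, subcomponents, original_params):
    # Output matching: the stochastically ablated network should produce
    # the same outputs as the original network.
    output_loss = (ablated_out - original_out).pow(2).mean()
    # Faithfulness: summing all subcomponents should recover the original
    # parameters of the decomposed layer.
    faithfulness_loss = (subcomponents.sum(dim=0) - original_params).pow(2).mean()
    return output_loss, faithfulness_loss
```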
But we ablate it by some random amount: causally unimportant subcomponents can be fully, partially, or not at all ablated (because they shouldn't matter, so it shouldn't make a difference!), whereas 'unablatable' (causally important) subcomponents don't get ablated at all.
1 reply · 0 reposts · 14 likes
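One way to read that sampling rule: each subcomponent's ablation factor m is drawn uniformly from [g, 1], where g is its predicted causal importance. A toy demo (my interpretation, not the paper's code):

```python
import torch

g = torch.tensor([0.0, 0.5, 1.0])     # unimportant, middling, 'unablatable'
m = g + (1 - g) * torch.rand_like(g)  # ablation factor m ~ U[g, 1]
# g = 0.0 -> m lands anywhere in [0, 1]: fully, partially, or not ablated
# g = 1.0 -> m is pinned at 1: never ablated
print(m)
```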
The way it works: For a given datapoint, we predict the 'causal importance' of each subcomponent. This estimates how 'ablatable' each one is on that datapoint. Then we do another forward pass where we actually ablate each subcomponent!...
1 reply · 0 reposts · 13 likes
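A compressed sketch of those two passes for a single linear layer (the gate, shapes, and sampling rule here are my assumptions, not the released code):

```python
import torch

def ablated_forward(x, U, V, gate):
    # U: (n_sub, d_out), V: (n_sub, d_in); subcomponent c is the rank-one
    # matrix outer(U[c], V[c]). gate predicts causal importances from x.
    g = gate(x)                           # (batch, n_sub), values in [0, 1]
    m = g + (1 - g) * torch.rand_like(g)  # stochastic ablation, m ~ U[g, 1]
    # Rebuild the layer's weights from the masked subcomponents and apply them.
    W = torch.einsum("bc,co,ci->boi", m, U, V)
    return torch.einsum("boi,bi->bo", W, x)
```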
The main differences are:
- We use 'subcomponents' (rank-one matrices in one layer) instead of components (whole vectors in parameter space)
- We learn a simple function that predicts the 'causal importance' of each subcomponent
1 reply · 1 repost · 17 likes
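For concreteness, a sketch of that parameterization: rank-one subcomponents plus a small learned importance function (an assumed minimal architecture, not necessarily the paper's exact one):

```python
import torch
import torch.nn as nn

d_in, d_out, n_sub = 64, 64, 128

# Each subcomponent is a rank-one matrix u_c v_c^T; together they should
# sum to the weight matrix of the layer being decomposed.
U = nn.Parameter(torch.randn(n_sub, d_out))
V = nn.Parameter(torch.randn(n_sub, d_in))
W = torch.einsum("co,ci->oi", U, V)  # sum over c of rank-one matrices

# A simple learned function mapping a datapoint to per-subcomponent
# causal importances in [0, 1].
gate = nn.Sequential(nn.Linear(d_in, n_sub), nn.Sigmoid())
x = torch.randn(1, d_in)
g = gate(x)  # (1, n_sub)
```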
A few months ago, we published Attribution-based parameter decomposition -- a method for decomposing a network's parameters for interpretability. But it was janky and didn't scale. Today, we published a new, better algorithm called 🔶Stochastic Parameter Decomposition!🔶
5 replies · 23 reposts · 184 likes
Very good. Very fire.
I've joined @GoodfireAI (London team) because I think it's the best place to develop and scale fundamental interpretability techniques. Doing this well requires compute, ambition, and most of all, great people. Goodfire has all of these.
0 replies · 0 reposts · 26 likes
New research update! We replicated @AnthropicAI's circuit tracing methods to test if they can recover a known, simple transformer mechanism.
2 replies · 53 reposts · 502 likes