johnny
@johnnylin
519 Followers · 55 Following · 4 Media · 22 Statuses
@neuronpedia. prev @apple.
San Francisco, CA
Joined January 2009
Researchers can use the Neuronpedia interactive interface here: https://t.co/obViVrtTSC And we’ve provided an annotated walkthrough: https://t.co/LLy54TFGbZ This project was led by participants in our Anthropic Fellows program, in collaboration with Decode Research.
github.com
Contribute to safety-research/circuit-tracer development by creating an account on GitHub.
Announcement: we're open sourcing Neuronpedia! 🚀 This includes all our mech interp tools: the interpretability API, steering, UI, inference, autointerp, search, plus 4 TB of data - cited by 35+ research papers and used by 50+ write-ups. What you can do with OSS Neuronpedia: 🧵
Neuronpedia now hosts Chain-of-Thought! Steer and inspect Deepseek-R1-Distill-Llama-8B with SAEs trained by @Open_MOSS on @neuronpedia (linked below). One fun initial result: the model can easily be steered into "overthinking/anxious" mode with a single latent.
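Steering with a single latent boils down to adding a multiple of that latent's decoder direction to the residual stream during the forward pass. Here is a minimal numpy sketch of the idea — all names, shapes, and values are illustrative, not the actual Neuronpedia or Open MOSS code:

```python
import numpy as np

def steer(resid, decoder_direction, strength):
    """Add a scaled SAE decoder direction to a residual-stream activation.

    resid: (d_model,) residual-stream vector at some layer/position
    decoder_direction: (d_model,) the chosen latent's decoder row
    strength: scalar coefficient; sign and magnitude set the effect
    """
    unit = decoder_direction / np.linalg.norm(decoder_direction)
    return resid + strength * unit

rng = np.random.default_rng(0)
d_model = 8
resid = rng.normal(size=d_model)
direction = rng.normal(size=d_model)

steered = steer(resid, direction, strength=4.0)
# The projection onto the steering direction grows by ~`strength`.
unit = direction / np.linalg.norm(direction)
print(float((steered - resid) @ unit))
```

In a real model this addition would happen inside a forward hook at the chosen layer, on every token position, which is how a single "overthinking/anxious" latent can shift the whole chain of thought.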
Google DeepMind has a new way to look inside an AI’s “mind”
technologyreview.com
Autoencoders are letting us peer into the black box of artificial intelligence. They could help us create AI that is better understood, and more easily controlled.
Gemma Scope allows us to study how features evolve throughout the model and interact to create more complex ones. Want to learn more? Here’s an interactive demo made by @neuronpedia - no coding necessary ↓ https://t.co/PpbYk0ujWd
Want to learn more? @neuronpedia have made a gorgeous interactive demo walking you through what Sparse Autoencoders are, and what Gemma Scope can do. If this could happen pre-launch, I'm excited to see what the community will do with Gemma Scope now! https://t.co/UuSLGLT7ug
Sparse Autoencoders act like a microscope for AI internals. They're a powerful tool for interpretability, but training costs limit research. Announcing Gemma Scope: an open suite of SAEs on every layer & sublayer of Gemma 2 2B & 9B! We hope to enable even more ambitious work.
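The object an SAE suite ships is conceptually simple: an encoder maps a model activation into a sparse, overcomplete latent vector, and a decoder reconstructs the activation from it. A toy numpy version of that forward pass with a plain ReLU (Gemma Scope itself uses a JumpReLU variant; weights and shapes here are random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64  # SAE latent dim is a multiple of the model dim

W_enc = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_sae)
b_enc = np.zeros(d_sae)
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation into nonnegative latents, then reconstruct it."""
    latents = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)  # ReLU encoder
    recon = latents @ W_dec + b_dec                         # linear decoder
    return latents, recon

x = rng.normal(size=d_model)
latents, recon = sae_forward(x)
print(f"{(latents > 0).sum()} of {d_sae} latents active")
```

With random weights many latents fire; training adds a sparsity penalty so that only a handful of interpretable latents are active per token — that sparsity is what makes the "microscope" readable.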
exciting new research from @apolloaisafety and @jordantensor: E2E SAEs (w/ ~700k features) are now live on @neuronpedia - the first to use dual UMAPs for visual comparison and exploration between SAE training methods. check it out at https://t.co/w6CCHMxC18
Proud to share Apollo Research's first interpretability paper! In collaboration w @JordanTensor! ⤵️ https://t.co/w5iSMKIGx6 Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning Our SAEs explain significantly more performance than before! 1/
Terrific work by @saprmarks and team! 🥳 We really enjoyed working with them to get their Sparse Autoencoders onto @neuronpedia. You can explore, search, and test their 622,594 features here:
neuronpedia.org
Under Peer Review
Can we understand & edit unanticipated mechanisms in LMs? We introduce sparse feature circuits, & use them to explain LM behaviors, discover & fix LM bugs, & build an automated interpretability pipeline! Preprint w/ @can_rager, @ericjmichaud_, @boknilev, @davidbau, @amuuueller
6/ Oh and of course, @neuronpedia is publicly available for anyone to experiment and play with at https://t.co/r9cjAcBemX. Let us know what you think!
5/ Thanks to @JBloomAus for support, @NeelNanda5 for TransformerLens, @ch402 @nickcammarata for inspiration from OpenAI Microscope, and William Saunders for Neuron Viewer. It's time to accelerate (interpretability research). 🚀🔬 https://t.co/Ty08dKe2XL
lesswrong.com
This post assumes basic familiarity with Sparse Autoencoders. For those unfamiliar with this technique, we highly recommend the introductory section…
4/ Our goal is to build fantastic infrastructure, UI, and tools so you can focus on the research, experiments, and collaboration. If you're working on SAEs, fill out this short form to get hosted on Neuronpedia, including generating feature dashboards:
docs.google.com
Time estimate: < 5 minutes. Complete this application and we'll respond within 72 hours by email.
3/ Neuronpedia lets you wrangle hundreds of thousands of features with a few clicks. 🤠 Here, we search a custom sentence (via live inference), then sort results by the sum of the activations of two specific tokens, and finally, we filter the results to layer 10 only. Not bad!
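That sort-and-filter step is easy to picture in code: score each feature by its summed activation at the two tokens of interest, keep only one layer, and rank. A standalone numpy sketch with synthetic data (the real computation happens server-side in Neuronpedia; every name here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_tokens = 1000, 12
acts = rng.exponential(size=(n_features, n_tokens))  # per-token activations
layers = rng.integers(0, 12, size=n_features)        # each feature's layer

token_ids = [3, 7]                       # the two tokens to score on
scores = acts[:, token_ids].sum(axis=1)  # sum of the two activations

mask = layers == 10                      # filter: layer 10 only
ranked = np.argsort(-scores[mask])       # sort: best-scoring first
top = np.flatnonzero(mask)[ranked][:5]   # top-5 feature indices
print("top layer-10 features:", top)
```

The same pattern generalizes to any token subset or layer filter — the UI is essentially exposing this score/filter/sort loop interactively.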
2/ Neuronpedia makes interp research both visual and interactive. ✨ Here, we filter for "twitter" features in GPT2, layer 9's residuals. Several matches light up, and we zoom into a specific cluster. Finally, we save three features to a new list that can be shared publicly.
1/ Introducing Neuronpedia: an open platform for interpretability research with hosting, visualizations, and tooling for Sparse Autoencoders (SAEs). Let's try it out! ➡️ Neuronpedia lets us instantly test activations of SAE features with custom text. Here's a Star Wars feature:
Super impressed by @johnnylin's Interactive Interface for exploring my GPT2 Small SAE Features. https://t.co/fI9t3r3eZk. First 5000 for each layer are there with the rest coming shortly! We've updated the feature-activation highlighting to better show multiple fires per context!
best IoT feature: devices that automatically update for daylight savings time
Openly Operated wants to make privacy policies actually mean something https://t.co/rdwkbylLQF