johnny Profile
johnny

@johnnylin

Followers
509
Following
55
Media
4
Statuses
21

@neuronpedia. prev @apple.

San Francisco, CA
Joined January 2009
@johnnylin
johnny
3 months
RT @AnthropicAI: Researchers can use the Neuronpedia interactive interface here: And we’ve provided an annotated w…
github.com
Contribute to safety-research/circuit-tracer development by creating an account on GitHub.
0
64
0
@johnnylin
johnny
5 months
RT @neuronpedia: Announcement: we're open sourcing Neuronpedia! 🚀 This includes all our mech interp tools: the interpretability API, steer…
0
27
0
@johnnylin
johnny
6 months
RT @CurtTigges: Neuronpedia now hosts Chain-of-Thought! Steer and inspect Deepseek-R1-Distill-Llama-8B with SAEs trained by @Open_MOSS on @…
0
11
0
@johnnylin
johnny
1 year
RT @GoogleDeepMind: Gemma Scope allows us to study how features evolve throughout the model and interact to create more complex ones. Want…
0
10
0
@johnnylin
johnny
1 year
RT @NeelNanda5: Want to learn more? @neuronpedia have made a gorgeous interactive demo walking you through what Sparse Autoencoders are, an…
0
6
0
@johnnylin
johnny
1 year
RT @NeelNanda5: Sparse Autoencoders act like a microscope for AI internals. They're a powerful tool for interpretability, but training cost…
0
151
0
@johnnylin
johnny
1 year
exciting new research from @apolloaisafety and @jordantensor: E2E SAEs (w/ ~700k features) are now live on @neuronpedia - the first to use dual UMAPs for visual comparison and exploration between SAE training methods. check it out at
@leedsharkey
Lee Sharkey
1 year
Proud to share Apollo Research's first interpretability paper! In collaboration w @JordanTensor! ⤵️ Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning. Our SAEs explain significantly more performance than before! 1/
0
3
16
@johnnylin
johnny
1 year
Terrific work by @saprmarks and team! 🥳 We really enjoyed working with them to get their Sparse Autoencoders onto @neuronpedia. You can explore, search, and test their 622,594 features here:
neuronpedia.org
Under Peer Review
@saprmarks
Samuel Marks
1 year
Can we understand & edit unanticipated mechanisms in LMs? We introduce sparse feature circuits, & use them to explain LM behaviors, discover & fix LM bugs, & build an automated interpretability pipeline! Preprint w/ @can_rager, @ericjmichaud_, @boknilev, @davidbau, @amuuueller
0
1
11
@johnnylin
johnny
1 year
6/ Oh and of course, @neuronpedia is publicly available for anyone to experiment and play with at Let us know what you think!
0
1
7
@johnnylin
johnny
1 year
5/ Thanks to @JBloomAus for support, @NeelNanda5 for TransformerLens, @ch402 @nickcammarata for inspiration from OpenAI Microscope, and William Saunders for Neuron Viewer. It's time to accelerate (interpretability research). 🚀🔬
lesswrong.com
This post assumes basic familiarity with Sparse Autoencoders. For those unfamiliar with this technique, we highly recommend the introductory section…
1
2
11
@johnnylin
johnny
1 year
4/ Our goal is to build fantastic infrastructure, UI, and tools so you can focus on the research, experiments, and collaboration. If you're working on SAEs, fill out this short form to get hosted on Neuronpedia, including generating feature dashboards:
docs.google.com
Time Estimate: < 5 Minutes Complete this application. We respond to you within 72 hours by email.
1
0
10
@johnnylin
johnny
1 year
3/ Neuronpedia lets you wrangle hundreds of thousands of features with a few clicks. 🤠 Here, we search a custom sentence (via live inference), then sort results by the sum of the activations of two specific tokens, and finally, we filter the results to layer 10 only. Not bad!
1
1
11
@johnnylin
johnny
1 year
2/ Neuronpedia makes interp research both visual and interactive. ✨ Here, we filter for "twitter" features in GPT2, layer 9's residuals. Several matches light up, and we zoom into a specific cluster. Finally, we save three features to a new list that can be shared publicly.
1
4
22
@johnnylin
johnny
1 year
1/ Introducing Neuronpedia: an open platform for interpretability research with hosting, visualizations, and tooling for Sparse Autoencoders (SAEs). Let's try it out! ➡️ Neuronpedia lets us instantly test activations of SAE features with custom text. Here's a Star Wars feature:
4
29
194
@johnnylin
johnny
2 years
RT @JBloomAus: Super impressed by @johnnylin's Interactive Interface for exploring my GPT2 Small SAE Features. Fi…
0
1
0
@johnnylin
johnny
2 years
best IoT feature: devices that automatically update for daylight savings time.
0
0
0
@johnnylin
johnny
6 years
RT @verge: Openly Operated wants to make privacy policies actually mean something
0
9
0
@johnnylin
johnny
7 years
twitter encourages logical local optima.
2
0
0
@johnnylin
johnny
9 years
move slowly and break things.
0
0
1