
Nick Jiang
@nickhjiang
750 Followers · 1K Following · 33 Media · 224 Statuses
interpreting neural networks @berkeley_ai // cs + philosophy @ucberkeley // prev @briskteaching @watershed
Berkeley, CA
Joined July 2019
What makes LLMs like Grok-4 unique? We use sparse autoencoders (SAEs) to tackle questions like this and apply them to four data analysis tasks: data diffing, correlations, targeted clustering, and retrieval. By analyzing model outputs, SAEs surface novel insights into model behavior!
6 replies · 16 retweets · 156 likes
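A minimal sketch of the data-diffing task mentioned above: encode embeddings of two models' outputs into sparse SAE feature activations, then rank features by how differently they fire across the two corpora. This is an illustration under assumptions, not the paper's actual pipeline; the SAE weights, dimensions, and embeddings below are placeholder stand-ins for a trained SAE and real model outputs.

```python
# Hypothetical "data diffing" with a sparse autoencoder (SAE):
# encode outputs from two models into sparse features, then rank
# features by the gap in their mean activation between corpora.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 768, 16384          # embedding dim, SAE dictionary size

# Placeholder SAE parameters; in practice these come from a trained SAE.
W_enc = rng.normal(0, 0.02, (d_model, d_sae))
b_enc = np.zeros(d_sae)

def sae_features(embeddings: np.ndarray) -> np.ndarray:
    """Encode embeddings into sparse feature activations: ReLU(x W + b)."""
    return np.maximum(embeddings @ W_enc + b_enc, 0.0)

# Stand-ins for embeddings of outputs from two LLMs on the same prompts.
emb_model_a = rng.normal(size=(1000, d_model))
emb_model_b = rng.normal(size=(1000, d_model))

# Data diffing: features whose mean activation differs most between models.
diff = sae_features(emb_model_a).mean(0) - sae_features(emb_model_b).mean(0)
top_features = np.argsort(-np.abs(diff))[:10]
print("Features most distinctive of model A vs. model B:", top_features)
```

In a real run, the top-ranked features would then be interpreted by inspecting their highest-activating output examples.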
RT @nickhjiang: What makes LLMs like Grok-4 unique? We use sparse autoencoders (SAEs) to tackle questions like this and apply them to four…
0 replies · 16 retweets · 0 likes
RT @BaldassarreFe: Say hello to DINOv3 🦖🦖🦖 A major release that raises the bar for self-supervised vision foundation models. With stunning…
0 replies · 277 retweets · 0 likes
RT @NeelNanda5: I'm excited about our vision of data-centric interpretability! Even if you can't use a model's internals, there's a lot of…
0 replies · 9 retweets · 0 likes
Work done with @lilysun004*, Lewis Smith, and @NeelNanda5. Thank you to @GoodfireAI and MATS for compute support! Blog post:
lesswrong.com
Nick and Lily are co-first authors on this project; Lewis and Neel jointly supervised it. …
0 replies · 0 retweets · 10 likes
RT @nickhjiang: Updated paper! Our main new finding: by creating attention biases at test time—without extra tokens—we remove high-norm ou…
0 replies · 35 retweets · 0 likes
Additionally, we release a LLaVA-Llama 8b model (CLIP-L encoder) configured with a test-time register. We have also updated references and added new experiments to the appendix! LLaVA-Llama 8b: Paper: Repo:
huggingface.co
1 reply · 0 retweets · 5 likes
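A rough sketch of the test-time-register idea from the thread above: rather than appending an extra token, give each query an extra attention bias logit so outlier attention mass has a null slot to flow into. The function name, bias value, and shapes below are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical attention bias acting as a "test-time register": append one
# bias logit per query to the attention scores. The bias column soaks up
# attention probability but contributes no value, giving an attention sink
# without adding any token to the sequence.
import numpy as np

def attention_with_register_bias(Q, K, V, bias=0.0):
    """Softmax attention with one extra bias logit per query."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (n_q, n_k)
    scores = np.concatenate(
        [scores, np.full((scores.shape[0], 1), bias)], axis=1
    )                                                 # (n_q, n_k + 1)
    probs = np.exp(scores - scores.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    return probs[:, :-1] @ V                          # drop the bias column

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention_with_register_bias(Q, K, V, bias=2.0)
print(out.shape)  # (4, 8)
```

Raising the bias shifts more attention mass into the null slot, which is one plausible way to drain attention away from would-be high-norm outlier tokens at test time.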