Nick Jiang

@nickhjiang

Followers: 604 · Following: 889 · Media: 25 · Statuses: 201

interpreting neural networks @berkeley_ai // cs + philosophy @ucberkeley // prev @briskteaching @watershed

Berkeley, CA
Joined July 2019
@nickhjiang
Nick Jiang
24 days
Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵
15
134
994
@nickhjiang
Nick Jiang
4 days
RT @nickhjiang: Updated paper! Our main new finding: by creating attention biases at test time—without extra tokens—we remove high-norm ou…
0
33
0
@nickhjiang
Nick Jiang
4 days
Additionally, we release a LLaVA-Llama 8b model (CLIP-L encoder) configured with a test-time register. We have also updated references and added new experiments to the appendix! LLaVA-Llama 8b: Paper: Repo:
0
0
4
@nickhjiang
Nick Jiang
4 days
These findings are preliminary and tested only on a base OpenCLIP model, where outliers are small (norm < 500). If they extend to the language domain, however, they offer a promising way to manage outliers—a challenge for quantization—without engineering hacks. See Appendix A.11.
1
0
1
@nickhjiang
Nick Jiang
4 days
Zeroing out the activations of register neurons removes outliers but drops classification performance by ~20%; adding an attention bias recovers the drop. Our results suggest that outliers primarily act as attention biases in ViTs.
1
0
2
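The ablation half of this experiment can be sketched with a forward hook. A minimal version, assuming a timm-style ViT where the register neurons are hidden units of one block's MLP (the layer index and neuron indices are placeholders, not the paper's values); the attention-bias recovery is sketched after the next tweet.

```python
import torch

def ablate_register_neurons(model, layer_idx, neuron_idxs):
    """Zero-ablate a set of MLP hidden units ("register neurons") at
    inference time via a forward hook.

    Assumes a timm-style ViT exposing `model.blocks[i].mlp.fc1`; the
    layer index and neuron indices are placeholders, not the paper's.
    """
    fc1 = model.blocks[layer_idx].mlp.fc1  # MLP hidden activations

    def hook(module, inputs, output):
        output = output.clone()
        output[..., neuron_idxs] = 0.0  # silence the outlier-writing units
        return output                   # returned value replaces the output

    return fc1.register_forward_hook(hook)

# Usage: expect outliers to vanish but accuracy to drop until the
# attention bias from the next sketch is added back.
# handle = ablate_register_neurons(vit, layer_idx=5, neuron_idxs=[10, 42])
# ...evaluate...
# handle.remove()
```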
@nickhjiang
Nick Jiang
4 days
Sun et al. (“Massive Activations”) proposed adding an attention bias during training to mitigate high-norm outliers. But we can do this training-free! Specifically, for each attention head, we set v’ and k’ to the value and key vectors of a test-time register averaged over images.
2
0
2
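Conceptually, this gives each attention head one extra key/value slot that can absorb attention mass without adding a token to the output. A rough sketch, assuming `k_bias` and `v_bias` are the per-head key/value vectors of a test-time register already averaged over images (names and shapes are mine, not the paper's code):

```python
import torch

def attention_with_bias(q, k, v, k_bias, v_bias):
    """Scaled dot-product attention with one extra bias slot per head.

    q, k, v:        (batch, heads, seq, d_head)
    k_bias, v_bias: (heads, d_head), e.g. the key/value vectors of a
                    test-time register averaged over a set of images
                    (assumed precomputed offline).
    """
    b, h, n, d = q.shape
    kb = k_bias.view(1, h, 1, d).expand(b, -1, -1, -1)
    vb = v_bias.view(1, h, 1, d).expand(b, -1, -1, -1)
    k_ext = torch.cat([k, kb], dim=2)  # (b, h, n+1, d)
    v_ext = torch.cat([v, vb], dim=2)

    attn = (q @ k_ext.transpose(-2, -1)) / d**0.5
    attn = attn.softmax(dim=-1)        # softmax over n+1 slots
    return attn @ v_ext                # (b, h, n, d): the bias slot absorbs
                                       # mass but adds no output token
```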
@nickhjiang
Nick Jiang
4 days
Updated paper! Our main new finding: by creating attention biases at test time—without extra tokens—we remove high-norm outliers and attention sinks in ViTs, while preserving zero-shot ImageNet performance. Maybe ViTs don’t need registers after all?
3
33
180
@nickhjiang
Nick Jiang
9 days
RT @nikhil07prakash: How do language models track mental states of each character in a story, often referred to as Theory of Mind? Our rec…
0
95
0
@nickhjiang
Nick Jiang
22 days
RT @Tim_Dettmers: Very interesting work. These outliers are the same outliers as in LLM.int8() and the attention sinks papers and suggest t…
0
12
0
@nickhjiang
Nick Jiang
23 days
RT @soniajoseph_: This is really cool and useful vision work and will solve many of the problems I’ve been having.
0
1
0
@nickhjiang
Nick Jiang
23 days
RT @nickhjiang: Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by r…
0
134
0
@nickhjiang
Nick Jiang
24 days
RT @Yampeleg: One of the most interesting papers to read at the moment with extreme implications also for language transformers.
0
2
0
@nickhjiang
Nick Jiang
24 days
RT @_AmilDravid: Artifacts in your attention maps? Forgot to train with registers? Use 𝙩𝙚𝙨𝙩-𝙩𝙞𝙢𝙚 𝙧𝙚𝙜𝙞𝙨𝙩𝙚𝙧𝙨! We find a sparse set of activat…
0
60
0
@nickhjiang
Nick Jiang
24 days
This work was a wonderful collaboration with @_AmilDravid, Alyosha Efros, and @YGandelsman. Check out our paper!
1
5
46
@nickhjiang
Nick Jiang
24 days
You can try these models out! We have instructions on how to load OpenCLIP and DINOv2 versions with test-time registers.
1
2
40
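For reference, the base backbones load from the standard torch.hub entry points; how the register is attached should be taken from the repo's README, so the wrapper call below is only a hypothetical placeholder:

```python
import torch

# The base backbone loads from the standard hub entry point:
dinov2 = torch.hub.load("facebookresearch/dinov2", "dinov2_vitl14")

# Attaching the test-time register follows the repo's instructions;
# `add_test_time_register` is a hypothetical name, NOT the repo's API:
# dinov2 = add_test_time_register(dinov2, ...)

dinov2.eval()
with torch.no_grad():
    feats = dinov2(torch.randn(1, 3, 224, 224))  # 224 = 16 patches of 14px
```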
@nickhjiang
Nick Jiang
24 days
Whereas interpretability work tends to explore the semantic role of neurons (concepts like “dog”), we highlight neurons whose function is image-independent and wouldn’t be properly understood by looking purely at the input space.
1
1
18
@nickhjiang
Nick Jiang
24 days
Beyond test-time registers, we use register neurons to mask areas of an image. CLIP is biased toward text, making it vulnerable to “typographic attacks”. Using register neurons, we move outliers onto text within an image, reducing attack success from 50% → 7%.
1
1
19
@nickhjiang
Nick Jiang
24 days
Vision-language models are built on top of ViTs and inherit their outlier patches, which leak into the language backbone’s attention. Using test-time registers, we remove the ViT-originated outliers from LLaVA-Llama-3-8b and get interpretable attention maps useful for debugging.
1
2
28
@nickhjiang
Nick Jiang
24 days
Test-time registers maintain or improve base performance on dense prediction tasks such as segmentation and depth estimation. On unsupervised object discovery, we use LOST and reach near-parity with models retrained with registers (+20% correct localization on DINOv2).
1
2
23
@nickhjiang
Nick Jiang
24 days
To mimic trained registers, we add an extra token at test time and move the outliers to this token with register neurons. This “test-time register” absorbs the outliers and produces attention maps that match the quality of those from models trained with registers.
1
2
32
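In rough pseudocode, the intervention at the layer where register neurons fire looks like the sketch below: append a zero token, then shift the peak register-neuron activation onto it. This treats register neurons as hidden-state coordinates for brevity (in the paper they are MLP neurons), so the released code should be treated as authoritative.

```python
import torch

def add_register_token(hidden, neuron_idxs):
    """Append a zero-initialized register token and shift register-neuron
    activity onto it, zeroing it on the patch tokens.

    Conceptual only: register neurons are treated as hidden-state
    coordinates for brevity, and `hidden` is the (batch, n_tokens, dim)
    state at the layer where register neurons activate.
    """
    b, n, d = hidden.shape
    reg = torch.zeros(b, 1, d, dtype=hidden.dtype, device=hidden.device)
    hidden = torch.cat([hidden, reg], dim=1)  # (b, n+1, d)

    # Strongest register-neuron activation across the original tokens...
    peak = hidden[:, :n][..., neuron_idxs].amax(dim=1)  # (b, len(idxs))
    hidden[:, :n][..., neuron_idxs] = 0.0  # ...removed from the patches...
    hidden[:, n][..., neuron_idxs] = peak  # ...and absorbed by the register
    return hidden
```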
@nickhjiang
Nick Jiang
24 days
First, we study how outliers emerge and identify a small set of “register neurons” with high, sparse activations on outliers. These neurons causally set outlier locations: by intervening on their activations, we can make outliers appear in patterns like hearts or smiles.
1
3
27
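The pattern intervention follows the same idea: clear register-neuron activity everywhere, then write a large activation at a chosen set of patch positions (a heart- or smile-shaped mask). A conceptual sketch, with the activation value and index handling as assumptions:

```python
import torch

def place_outliers(hidden, neuron_idxs, patch_mask, value=50.0):
    """Make outliers appear at chosen patch positions by rewriting
    register-neuron activations: clear them everywhere, then write a
    large value under the mask.

    hidden:     (batch, n_patches, dim), simplified as in the sketch above
    patch_mask: (n_patches,) bool, True where outliers should appear
    value:      an arbitrary large activation (an assumption)
    """
    idx = torch.as_tensor(neuron_idxs)
    pos = patch_mask.nonzero(as_tuple=True)[0]
    hidden = hidden.clone()
    hidden[..., idx] = 0.0                         # clear register activity
    hidden[:, pos[:, None], idx[None, :]] = value  # e.g. a heart-shaped mask
    return hidden
```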