
Nick Jiang
@nickhjiang
Followers: 604 · Following: 889 · Media: 25 · Statuses: 201
interpreting neural networks @berkeley_ai // cs + philosophy @ucberkeley // prev @briskteaching @watershed
Berkeley, CA
Joined July 2019
RT @nickhjiang: Updated paper! Our main new finding: by creating attention biases at test time—without extra tokens—we remove high-norm ou…
0 · 33 · 0
Updated paper! Our main new finding: by creating attention biases at test time—without extra tokens—we remove high-norm outliers and attention sinks in ViTs, while preserving zero-shot ImageNet performance. Maybe ViTs don’t need registers after all?
Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by retraining with “register” tokens, we find the mechanism behind outliers and make registers at ✨test-time✨—giving clean features and better performance! 🧵
3 · 33 · 180
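The thread above describes two flavors of the same fix: append an extra register token at test time, or add an attention bias so that sink and high-norm mass is absorbed without any extra tokens. Below is a minimal, hypothetical PyTorch sketch of that absorption mechanism; the function name, the zero-initialized slot, and the shapes are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch (not the authors' released code): append one extra key/value
# slot to softmax attention at inference so sink/outlier attention mass has
# somewhere to go other than the patch tokens.
import torch
import torch.nn.functional as F

def attention_with_test_time_register(q, k, v):
    """q, k, v: (batch, heads, tokens, head_dim) tensors from a trained ViT layer."""
    b, h, n, d = k.shape
    reg = torch.zeros(b, h, 1, d, dtype=k.dtype, device=k.device)  # assumed zero slot
    k_aug = torch.cat([k, reg], dim=2)   # register key: logit 0 against every query
    v_aug = torch.cat([v, reg], dim=2)   # register value: contributes nothing to outputs
    attn = F.softmax(q @ k_aug.transpose(-2, -1) / d ** 0.5, dim=-1)
    out = attn @ v_aug                   # register slot soaks up mass, adds no value
    return out, attn[..., :n]            # patch-to-patch attention with the slot dropped

# Toy usage on random tensors (197 tokens, e.g. 196 patches + CLS for a ViT-B/16)
q = k = v = torch.randn(1, 8, 197, 64)
out, patch_attn = attention_with_test_time_register(q, k, v)
```

Because the register key scores a logit of 0 against every query, it acts like a fixed attention bias in the softmax denominator, which is one way to read the "without extra tokens" variant in the updated paper.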
RT @nikhil07prakash: How do language models track mental states of each character in a story, often referred to as Theory of Mind? Our rec…
0 · 95 · 0
RT @Tim_Dettmers: Very interesting work. These outliers are the same outliers as in LLM.int8() and the attention sinks papers and suggest t…
0 · 12 · 0
RT @soniajoseph_: This is really cool and useful vision work and will solve many of the problems I’ve been having.
0 · 1 · 0
RT @nickhjiang: Vision transformers have high-norm outliers that hurt performance and distort attention. While prior work removed them by r…
0 · 134 · 0
RT @Yampeleg: One of the most interesting papers to read at the moment with extreme implications also for language transformers.
0 · 2 · 0
RT @_AmilDravid: Artifacts in your attention maps? Forgot to train with registers? Use 𝙩𝙚𝙨𝙩-𝙩𝙞𝙢𝙚 𝙧𝙚𝙜𝙞𝙨𝙩𝙚𝙧𝙨! We find a sparse set of activat…
0 · 60 · 0
This work was a wonderful collaboration with @_AmilDravid, Alyosha Efros, and @YGandelsman. Check out our paper!
1 · 5 · 46