
Andrew Saxe
@SaxeLab
Followers: 5K · Following: 2K · Media: 82 · Statuses: 697
Prof at @GatsbyUCL and @SWC_Neuro, trying to figure out how we learn. Bluesky: @SaxeLab Mastodon: @[email protected]
London, UK · Joined November 2019
How does in-context learning emerge in attention models during gradient descent training? Sharing our new Spotlight paper @icmlconf: Training Dynamics of In-Context Learning in Linear Attention. Led by Yedi Zhang with @Aaditya6284 and Peter Latham
2 · 22 · 110
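For context, here is a minimal sketch of a single linear-attention head (dot-product attention with the softmax removed), the kind of model whose in-context-learning dynamics the paper analyzes. The dimensions, initialization, and toy input below are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def linear_attention(X, W_q, W_k, W_v):
    """One attention head with the softmax removed (linear attention).

    X: (seq_len, d) token embeddings; W_q, W_k, W_v: (d, d) weight matrices.
    Returns the attention output of shape (seq_len, d).
    """
    Q = X @ W_q        # queries
    K = X @ W_k        # keys
    V = X @ W_v        # values
    scores = Q @ K.T   # raw dot-product scores, no softmax
    return scores @ V

# Toy usage with random data (hypothetical shapes, not taken from the paper).
rng = np.random.default_rng(0)
d, seq_len = 8, 16
X = rng.normal(size=(seq_len, d))
W_q, W_k, W_v = (0.1 * rng.normal(size=(d, d)) for _ in range(3))
out = linear_attention(X, W_q, W_k, W_v)
print(out.shape)  # (16, 8)
```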
RT @Aaditya6284: Excited to share this work has been accepted as an Oral at #icml2025 -- looking forward to seeing everyone in Vancouver, a…
0 · 5 · 0
RT @Aaditya6284: Transformers employ different strategies through training to minimize loss, but how do these trade off and why? Excited to…
0 · 23 · 0
RT @Aaditya6284: Was super fun to be a part of this work! Felt very satisfying to bring the theory work on ICL with linear attention a bit…
0 · 5 · 0
RT @sebastiangoldt: Really happy to see this paper out, led by @nishpathead in collaboration with @stefsmlab and @SaxeLab: we apply the sta…
0 · 7 · 0
RT @stefsmlab: Our paper just came out in PRX! Congrats to @nishpathead and the rest of the team. TL;DR: We analyse neural network lear…
0 · 3 · 0
RT @GabyMohamady: How do cognitive maps fail? And how can this help us understand/treat psychosis? My lab at Experimental Psychology, Oxfor…
0 · 15 · 0
RT @sebastiangoldt: If I had known about this master's when I was coming out of my Bachelor, I would have applied in a heartbeat, so please h…
0 · 5 · 0
RT @devonjarvi5: Our paper, “Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks” will b…
0 · 16 · 0
RT @zdeborova: Happy to share the recording of my plenary talk at Cosyne 2025 two days ago. You will learn about the statistical physics ap…
0 · 21 · 0
RT @ClementineDomi6: Our paper, “A Theory of Initialization’s Impact on Specialization,” has been accepted to ICLR 2025!
0 · 20 · 0
RT @BlavatnikAwards: 2025 @Blavatnikawards UK 🇬🇧 Finalist Andrew Saxe from UCL was featured on the @BBC Science Focus Instant Genius Podcas…
0 · 9 · 0
New paper with @leonlufkin and @ermgrant! Why do we see localized receptive fields so often, even in models without sparsity regularization? We present a theory in the minimal setting from @ai_ngrosso and @sebastiangoldt.
We’re excited to share our paper analyzing how data drives the emergence of localized receptive fields in neural networks! w/ @SaxeLab @ermgrant. Come see our #NeurIPS2024 spotlight poster today at 4:30–7:30 in the East Hall! Paper:
0 · 14 · 87