Andrew Saxe

@SaxeLab

Followers: 5K · Following: 2K · Media: 89 · Statuses: 715

Prof at @GatsbyUCL and @SWC_Neuro, trying to figure out how we learn. Bluesky: @SaxeLab Mastodon: @[email protected]

London, UK
Joined November 2019
@SaxeLab
Andrew Saxe
5 months
How does in-context learning emerge in attention models during gradient descent training? Sharing our new Spotlight paper @icmlconf: Training Dynamics of In-Context Learning in Linear Attention https://t.co/rr8T8ww5Kt Led by Yedi Zhang with @Aaditya6284 and Peter Latham
3 replies · 22 reposts · 122 likes
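A minimal sketch of the kind of setup this line of work studies, assuming the standard in-context linear-regression task and a single linear-attention layer (attention without the softmax). The architecture, task, and hyperparameters below are illustrative assumptions, not the paper's code:

```python
# Illustrative sketch only: one linear-attention layer trained by gradient descent on
# in-context linear regression (assumed task). Names and hyperparameters are made up.
import torch

def sample_tasks(batch, n_ctx, d):
    """Each sequence holds n_ctx (x, y) pairs sharing a random weight vector w, plus a query x."""
    w = torch.randn(batch, d, 1)
    x = torch.randn(batch, n_ctx + 1, d)
    y = x @ w                              # targets; the last y is what we try to predict
    tokens = torch.cat([x, y], dim=-1)     # token = (x, y) concatenated
    tokens[:, -1, -1] = 0.0                # hide the query's label
    return tokens, y[:, -1, 0]

class LinearAttention(torch.nn.Module):
    """Attention with the softmax removed, so the layer is linear in the values."""
    def __init__(self, d_tok):
        super().__init__()
        self.q = torch.nn.Linear(d_tok, d_tok, bias=False)
        self.k = torch.nn.Linear(d_tok, d_tok, bias=False)
        self.v = torch.nn.Linear(d_tok, d_tok, bias=False)

    def forward(self, tokens):
        q, k, v = self.q(tokens), self.k(tokens), self.v(tokens)
        return (q @ k.transpose(-2, -1) / tokens.shape[1]) @ v  # 1/T scaling in place of softmax

d, n_ctx = 5, 20
model = LinearAttention(d + 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    tokens, target = sample_tasks(64, n_ctx, d)
    pred = model(tokens)[:, -1, -1]        # read the prediction off the query token
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

Tracking how the loss and the learned query/key/value matrices evolve over training in a toy setup like this is the kind of dynamics question the paper analyzes.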
@GatsbyUCL
Gatsby Computational Neuroscience Unit
3 months
📢 Job alert: We are looking for a Postdoctoral Fellow to work with @ArthurGretton on creating statistically efficient causal and interaction models, with the aim of elucidating cellular interactions. ⏰ Deadline: 27-Aug-2025 ℹ️
ucl.ac.uk
UCL is consistently ranked as one of the top ten universities in the world (QS World University Rankings 2010-2022) and is No.2 in the UK for research power (Research Excellence Framework 2021).
2 replies · 5 reposts · 30 likes
@ClementineDomi6
Clémentine Dominé, Phd 🍊
3 months
🎓Thrilled to share I’ve officially defended my PhD!🥳 At @GatsbyUCL, my research explored how prior knowledge shapes neural representations. I’m deeply grateful to my mentors, @SaxeLab and Caswell Barry, my incredible collaborators, and everyone who supported me! Stay tuned!
13 replies · 11 reposts · 356 likes
@ninamiolane
Nina Miolane 🦋 @ninamiolane.bsky.social
4 months
If you’re working on symmetry and geometry in neural representations, submit your work to NeurReps and join the community in San Diego! 🤩 Deadline: August 22nd.
@neur_reps
Symmetry and Geometry in Neural Representations
4 months
Are you studying how structure shapes computation in the brain and in AI systems? 🧠 Come share your work in San Diego at NeurReps 2025! There is one month left until the submission deadline on August 22: https://t.co/A3jYmImrLf
1 reply · 10 reposts · 68 likes
@elboustanilab
Sami El-Boustani
5 months
If you can see it, you can feel it! Thrilled to share our new @NatureComms paper on how mice generalize spatial rules between vision & touch, led by brilliant co-first authors @giulio_matt & @GtnMaelle. More details in this thread 🧵 (1/7) https://t.co/CvmDQqokiF
1 reply · 9 reposts · 56 likes
@GatsbyUCL
Gatsby Computational Neuroscience Unit
4 months
🥳 Congratulations to Rodrigo Carrasco-Davison on passing his PhD viva with minor corrections! 🎉 📜 Principles of Optimal Learning Control in Biological and Artificial Agents.
4 replies · 1 repost · 60 likes
@SaxeLab
Andrew Saxe
4 months
Come chat about this at the poster @icmlconf, 11:00-13:30 on Wednesday in the West Exhibition Hall #W-902!
@SaxeLab
Andrew Saxe
5 months
How does in-context learning emerge in attention models during gradient descent training? Sharing our new Spotlight paper @icmlconf: Training Dynamics of In-Context Learning in Linear Attention https://t.co/rr8T8ww5Kt Led by Yedi Zhang with @Aaditya6284 and Peter Latham
0 replies · 4 reposts · 15 likes
@GatsbyUCL
Gatsby Computational Neuroscience Unit
4 months
👋 Attending #ICML2025 next week? Don't forget to check out work involving our researchers!
1 reply · 5 reposts · 29 likes
@Aaditya6284
Aaditya Singh
4 months
Excited to present this work in Vancouver at #ICML2025 today 😀 Come by to hear about why in-context learning emerges and disappears: Talk: 10:30-10:45am, West Ballroom C. Poster: 11am-1:30pm, East Exhibition Hall A-B, #E-3409.
@Aaditya6284
Aaditya Singh
8 months
Transformers employ different strategies through training to minimize loss, but how do these trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on 🔎⏬
1 reply · 5 reposts · 21 likes
@SaxeLab
Andrew Saxe
4 months
With this work we hope to shed light on some of the universal aspects of representation learning in neural networks, and one way networks can generalize infinitely from finite experience. (11/11)
0 replies · 0 reposts · 8 likes
@SaxeLab
Andrew Saxe
4 months
Not all pairs of representations which can merge are actually expected to merge. Thus, the final learned automaton may look different for different training runs, even when in practice they implement an identical algorithm. (10/11)
2 replies · 0 reposts · 6 likes
@SaxeLab
Andrew Saxe
4 months
The theory predicts mergers can only occur given enough training data and small enough initial weights, resulting in a phase transition between an overfitting regime and an algorithm-learning regime. (9/11)
1 reply · 0 reposts · 5 likes
@SaxeLab
Andrew Saxe
4 months
Since these pairs share outputs, mergers do not affect the automaton's computation. With enough mergers, the automaton becomes finite, fixing its behavior for long sequences. If the training data uniquely specifies the task, this results in full generalization. (8/11)
1 reply · 0 reposts · 3 likes
@SaxeLab
Andrew Saxe
4 months
Using intuitions based on continuity, we derive local interactions between pairs of representations. We find that pairs of sequences which always agree on target outputs after receiving any possible additional symbols will merge representations under certain conditions. (7/11)
1 reply · 0 reposts · 3 likes
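For the parity task introduced later in this thread (3/11), this condition has a concrete form: two prefixes agree on the target after every possible continuation exactly when they contain the same number of ones mod 2. A small illustrative check (an assumption made for this example, not the paper's general criterion):

```python
# Illustrative only: brute-force check of "always agree on target outputs after any
# possible additional symbols" for the parity task, up to continuations of bounded length.
import itertools

def parity(seq):
    return sum(seq) % 2

def always_agree(p1, p2, max_extra=8):
    """Do prefixes p1 and p2 give the same target after every continuation up to max_extra symbols?"""
    for L in range(max_extra + 1):
        for suffix in itertools.product((0, 1), repeat=L):
            if parity(p1 + suffix) != parity(p2 + suffix):
                return False
    return True

assert always_agree((0, 1), (1, 0))       # same parity -> representations may merge
assert not always_agree((0, 1), (1, 1))   # different parity -> must stay distinct
```

Merging every pair that passes a check like this collapses the infinite prefix tree onto the familiar two-state even/odd automaton, which is the finite automaton described in (8/11) above.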
@SaxeLab
Andrew Saxe
4 months
We find two phases: an initial phase where the RNN builds an infinite tree and fits it to the training data, reducing only the training loss; and a second merging phase, where representations merge until the automaton becomes finite, with a sudden drop in validation loss. (6/11)
1 reply · 0 reposts · 4 likes
@SaxeLab
Andrew Saxe
4 months
To understand what is happening in the RNN, we extract automata from its hidden representations during training, which visualize the computational algorithm as it is being developed. (5/11)
2 replies · 1 repost · 10 likes
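One way such an extraction could look in code, assuming a trained network with a recurrent module `model.rnn` (like the parity-RNN sketch further down this thread) and using hidden-state clustering. This is an illustrative procedure, not necessarily the paper's exact method:

```python
# Illustrative sketch: extract a finite automaton from a trained RNN by clustering the
# hidden states reached on all short binary prefixes, then reading off transitions
# between clusters. Assumes a model with a PyTorch recurrent module at model.rnn.
import itertools
import torch
from sklearn.cluster import KMeans   # any clustering method would do here

@torch.no_grad()
def hidden_state(model, prefix):
    """Final hidden state after the RNN reads a 0/1 prefix (zero state for the empty prefix)."""
    if len(prefix) == 0:
        return torch.zeros(model.rnn.hidden_size)
    x = torch.tensor(prefix, dtype=torch.float32).view(1, -1, 1)
    _, h = model.rnn(x)
    return h.squeeze()

def extract_automaton(model, max_prefix_len=6, n_clusters=8):
    # Collect hidden states for every binary prefix up to max_prefix_len.
    prefixes = [p for L in range(max_prefix_len + 1)
                for p in itertools.product((0, 1), repeat=L)]
    states = torch.stack([hidden_state(model, p) for p in prefixes])
    # Each cluster of hidden states is treated as one (approximate) automaton state.
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(states.numpy())
    cluster_of = dict(zip(prefixes, labels))
    # Transition table: reading symbol s from the cluster of prefix p lands in the
    # cluster of prefix p + (s,).
    transitions = {}
    for p in prefixes:
        for s in (0, 1):
            if p + (s,) in cluster_of:
                transitions[(cluster_of[p], s)] = cluster_of[p + (s,)]
    return transitions
```

Run at successive points during training, a procedure like this is what lets one watch the prefix tree collapse into a finite transition table, as described in (6/11) above.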
@SaxeLab
Andrew Saxe
4 months
When training only on sequences up to length 10, we find complete generalization for any possible sequence length. This cannot be explained by smooth interpolation of the training data, and suggests some kind of algorithm is being learned. (4/11)
1 reply · 0 reposts · 5 likes
@SaxeLab
Andrew Saxe
4 months
To explore this, we consider the simplest possible setting in which a neural network implicitly learns an algorithm. We train a recurrent neural network on sequences of ones and zeros to predict if the number of ones is even or odd. (3/11)
1 reply · 0 reposts · 3 likes
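A minimal sketch of this setting (assumed architecture and hyperparameters, not the authors' code): a vanilla RNN trained on the parity of short binary sequences, followed by a quick length-generalization check in the spirit of (4/11) above.

```python
# Illustrative sketch only: train an RNN to predict whether the number of ones in a
# 0/1 sequence is even or odd, using sequences of length at most 10.
import torch

def parity_batch(batch_size, max_len=10):
    """Random 0/1 sequences (one random length per batch); target = parity of the ones."""
    length = torch.randint(1, max_len + 1, (1,)).item()
    x = torch.randint(0, 2, (batch_size, length, 1)).float()
    y = (x.sum(dim=(1, 2)) % 2).long()     # 0 = even count of ones, 1 = odd
    return x, y

class ParityRNN(torch.nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.rnn = torch.nn.RNN(input_size=1, hidden_size=hidden,
                                batch_first=True, nonlinearity='tanh')
        self.readout = torch.nn.Linear(hidden, 2)

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.readout(h[:, -1])       # classify from the final hidden state

model = ParityRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(5000):
    x, y = parity_batch(128)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

# Length-generalization check far beyond the training lengths (cf. 4/11 above).
x_long, y_long = parity_batch(512, max_len=200)
print((model(x_long).argmax(-1) == y_long).float().mean())
```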
@SaxeLab
Andrew Saxe
4 months
How can the continuous gradient descent dynamics of deep neural networks result in the development of a discrete algorithm capable of symbolic computation? (2/11)
1 reply · 0 reposts · 3 likes
@SaxeLab
Andrew Saxe
4 months
Excited to share new work @icmlconf by Loek van Rossem exploring the development of computational algorithms in recurrent neural networks. Hear it live tomorrow, Oral 1D, Tues 14 Jul, West Exhibition Hall C: https://t.co/zsnSlJ0rrc Paper: https://t.co/aZs7VZuFNg (1/11)
openreview.net
Even when massively overparameterized, deep neural networks show a remarkable ability to generalize. Research on this phenomenon has focused on generalization within distribution, via smooth...
2 replies · 20 reposts · 69 likes