Andrew Saxe

@SaxeLab

Followers: 5K · Following: 2K · Media: 82 · Statuses: 697

Prof at @GatsbyUCL and @SWC_Neuro, trying to figure out how we learn. Bluesky: @SaxeLab Mastodon: @[email protected]

London, UK
Joined November 2019
@SaxeLab
Andrew Saxe
1 month
How does in-context learning emerge in attention models during gradient descent training? Sharing our new Spotlight paper @icmlconf: Training Dynamics of In-Context Learning in Linear Attention. Led by Yedi Zhang with @Aaditya6284 and Peter Latham.
2
22
110
@SaxeLab
Andrew Saxe
20 days
RT @a_proca: How do task dynamics impact learning in networks with internal dynamics? Excited to share our ICML Oral paper on learning dyn…
0
18
0
@SaxeLab
Andrew Saxe
1 month
RT @Aaditya6284: Excited to share this work has been accepted as an Oral at #icml2025 -- looking forward to seeing everyone in Vancouver, a….
0
5
0
@SaxeLab
Andrew Saxe
1 month
RT @Aaditya6284: Transformers employ different strategies through training to minimize loss, but how do these trade off and why? Excited to…
0
23
0
@SaxeLab
Andrew Saxe
1 month
RT @Aaditya6284: Was super fun to be a part of this work! Felt very satisfying to bring the theory work on ICL with linear attention a bit….
0
5
0
@SaxeLab
Andrew Saxe
1 month
Overall, we provide a theoretical description of how ICL abilities evolve during gradient descent training of linear attention, revealing abrupt acquisition or progressive improvements depending on how the key and query are parametrized.
0
0
3
@SaxeLab
Andrew Saxe
1 month
PCR is an interesting inductive bias--it is considered effective when data has a certain kind of low-dimensional structure in which a few 'signal' directions with higher input variance are buried in a background of lower input variance 'noise' directions.
1
0
4
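To make this concrete, here is a small illustrative sketch (mine, not code from the paper): inputs are drawn with a few high-variance 'signal' directions buried among many low-variance 'noise' directions, and the targets depend only on the signal directions. With few samples, principal component regression (PCR) on the top components tends to generalize better than ordinary least squares over all directions. The dimensions, variances, and sample sizes below are made-up illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 50, 3              # input dimension, number of 'signal' directions (illustrative)
n_train, n_test = 40, 1000

# Spiked input covariance: a few high-variance signal directions in a low-variance background.
variances = np.concatenate([np.full(k, 5.0), np.full(d - k, 0.1)])
w_true = np.zeros(d)
w_true[:k] = rng.normal(size=k)   # the target depends only on the signal directions

def sample(n):
    X = rng.normal(size=(n, d)) * np.sqrt(variances)
    y = X @ w_true + 0.1 * rng.normal(size=n)
    return X, y

X, y = sample(n_train)
X_test, y_test = sample(n_test)

# Ordinary least squares over all d directions (min-norm solution since n_train < d).
w_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Principal component regression: project onto the top-m principal directions, then regress.
def pcr(X, y, m):
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    P = Vt[:m].T                                    # (d, m) top-m principal directions
    coef = np.linalg.lstsq(X @ P, y, rcond=None)[0]
    return P @ coef                                 # back to a d-dimensional weight vector

w_pcr = pcr(X, y, m=k)

for name, w in [("OLS", w_ols), (f"PCR (m={k})", w_pcr)]:
    print(f"{name:10s} test MSE: {np.mean((X_test @ w - y_test) ** 2):.3f}")
```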
@SaxeLab
Andrew Saxe
1 month
During training, linear attention with separate key and query implements principal component regression (PCR) in context with the number of PCs increasing over time. If training stops during the (m+1)-th plateau, it approximately implements PCR in context with the first m PCs.
1
0
4
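As a rough operational sketch of "PCR in context" (my reconstruction from the tweet text, not the paper's code): given a single context of (x, y) pairs, project the context inputs onto their top-m principal components, solve the regression in that subspace, and read off the query prediction. Stopping during the (m+1)-th plateau would then correspond to using the first m components. The task sizes and variance profile below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def pcr_in_context(X_ctx, y_ctx, x_query, m):
    """Predict the query label via principal component regression on a single context.

    X_ctx: (N, d) context inputs, y_ctx: (N,) context labels, x_query: (d,).
    Only the first m principal components of the context inputs are used.
    """
    _, _, Vt = np.linalg.svd(X_ctx, full_matrices=False)
    P = Vt[:m].T                                        # (d, m) top-m principal directions
    coef = np.linalg.lstsq(X_ctx @ P, y_ctx, rcond=None)[0]
    return float(x_query @ P @ coef)

# One in-context regression task: a fresh weight vector and inputs with decaying variances.
d, N = 8, 32
w = rng.normal(size=d)
input_scales = np.linspace(2.0, 0.2, d)
X_ctx = rng.normal(size=(N, d)) * input_scales
y_ctx = X_ctx @ w
x_q = rng.normal(size=d) * input_scales

for m in (1, 2, 4, 8):
    print(f"m={m}: prediction {pcr_in_context(X_ctx, y_ctx, x_q, m):+.3f}, target {x_q @ w:+.3f}")
```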
@SaxeLab
Andrew Saxe
1 month
For linear attention with separate key and query, we show that the training dynamics has exponentially many fixed points and the loss exhibits saddle-to-saddle dynamics, which we reduce to scalar ordinary differential equations.
1
0
3
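The reduction to scalar ODEs is the paper's result; the equations below are not from it. They are a generic toy of saddle-to-saddle dynamics, assuming logistic gradient-flow growth of independent modes from a small initialization (in the spirit of earlier deep linear network analyses), which is enough to reproduce the qualitative loss profile of long plateaus punctuated by abrupt drops.

```python
import numpy as np

# Toy scalar gradient-flow ODEs (illustrative only, not the paper's equations):
# each mode u_i follows du_i/dt = u_i * (s_i - u_i), starting near zero, so modes with
# larger 'signal' s_i escape the saddle at u = 0 earlier. The loss sum_i (s_i - u_i)^2
# then shows the plateau-and-drop staircase characteristic of saddle-to-saddle dynamics.
s = np.array([4.0, 2.0, 1.0])        # per-mode signal strengths (made up)
u = np.full_like(s, 1e-6)            # small initialization near the saddle
dt, steps = 1e-3, 40_000

losses = []
for _ in range(steps):
    u += dt * u * (s - u)            # forward-Euler step of the scalar ODEs
    losses.append(np.sum((s - u) ** 2))

# Sample the loss at a few times: each mode produces its own plateau followed by a drop.
for t in range(0, steps, steps // 10):
    print(f"t = {t * dt:5.1f}   loss = {losses[t]:8.4f}")
```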
@SaxeLab
Andrew Saxe
1 month
For linear attention with merged key and query, we show that its training dynamics has two fixed points and the loss trajectory exhibits a single, abrupt drop. We derive an exact analytical time-course solution for a certain class of datasets and initializations.
1
0
6
@SaxeLab
Andrew Saxe
1 month
We study the gradient descent dynamics of multi-head linear self-attention trained for in-context linear regression. We examine 2 common parametrizations of linear attention: one with the key and query weights merged as a single matrix, and one with separate key and query weights.
1
0
4
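For readers who want to poke at the two parametrizations, here is a self-contained sketch written from this thread rather than from the paper's code: a single-head linear self-attention layer reads a context of (x, y) tokens plus a query token with its label slot zeroed, and is trained by gradient descent on in-context linear regression with either a merged key-query matrix or separate key and query matrices. The token format follows the standard in-context linear regression construction; the read-out vector, initialization scale, learning rate, and batch sizes are my own illustrative choices and may need tuning.

```python
import jax
import jax.numpy as jnp

# Illustrative sizes and hyperparameters (my choices, not the paper's).
d, N, batch = 5, 20, 256          # input dimension, context length, tasks per batch
lr, steps, scale = 0.05, 6000, 1e-2

def sample_batch(key):
    """In-context linear regression: every task in the batch has its own weight vector."""
    kw, kx, kq = jax.random.split(key, 3)
    w   = jax.random.normal(kw, (batch, d))
    X   = jax.random.normal(kx, (batch, N, d))
    y   = jnp.einsum('bnd,bd->bn', X, w)
    x_q = jax.random.normal(kq, (batch, d))
    y_q = jnp.einsum('bd,bd->b', x_q, w)
    return X, y, x_q, y_q

def predict(params, X, y, x_q):
    """Single-head linear self-attention; read out the label slot of the query token."""
    E   = jnp.concatenate([X, y[..., None]], axis=-1)                  # (batch, N, d+1) context tokens
    e_q = jnp.concatenate([x_q, jnp.zeros_like(x_q[:, :1])], axis=-1)  # query token, label zeroed
    if 'W_KQ' in params:   # merged key-query matrix
        attn = jnp.einsum('bni,ij,bj->bn', E, params['W_KQ'], e_q) / N
    else:                  # separate key and query matrices
        attn = jnp.einsum('bni,ki,kj,bj->bn', E, params['W_K'], params['W_Q'], e_q) / N
    values = jnp.einsum('bn,bni->bi', attn, E)   # attention-weighted sum of context tokens
    return values @ params['v']                  # value/projection read-out vector

def loss(params, data):
    X, y, x_q, y_q = data
    return jnp.mean((predict(params, X, y, x_q) - y_q) ** 2)

def train(params, seed=0):
    key = jax.random.PRNGKey(seed)
    grad_fn = jax.jit(jax.value_and_grad(loss))
    for step in range(steps):
        key, sub = jax.random.split(key)
        l, g = grad_fn(params, sample_batch(sub))
        params = jax.tree_util.tree_map(lambda p, gp: p - lr * gp, params, g)
        if step % 1000 == 0:
            print(f"  step {step:5d}  loss {float(l):.4f}")
    return params

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(42), 3)
merged   = {'W_KQ': scale * jax.random.normal(k1, (d + 1, d + 1)),
            'v':    scale * jax.random.normal(k3, (d + 1,))}
separate = {'W_K':  scale * jax.random.normal(k1, (d + 1, d + 1)),
            'W_Q':  scale * jax.random.normal(k2, (d + 1, d + 1)),
            'v':    scale * jax.random.normal(k3, (d + 1,))}

print("merged key and query:");   train(merged)
print("separate key and query:"); train(separate)
```

If the runs behave as the thread describes, the merged parametrization should show a single abrupt loss drop while the separate parametrization tends to linger on plateaus before each improvement.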
@SaxeLab
Andrew Saxe
2 months
RT @sebastiangoldt: Really happy to see this paper out, led by @nishpathead in collaboration with @stefsmlab and @SaxeLab: we apply the sta….
0
7
0
@SaxeLab
Andrew Saxe
2 months
RT @stefsmlab: Our paper just came out in PRX! Congrats to @nishpathead and the rest of the team. TL;DR: We analyse neural network lear…
0
3
0
@SaxeLab
Andrew Saxe
2 months
RT @GabyMohamady: How do cognitive maps fail? And how can this help us understand/treat psychosis? My lab at Experimental Psychology, Oxfor….
0
15
0
@SaxeLab
Andrew Saxe
2 months
RT @sebastiangoldt: If I had known about this master's when I was coming out of my Bachelor's, I would have applied in a heartbeat, so please h…
0
5
0
@SaxeLab
Andrew Saxe
3 months
RT @devonjarvi5: Our paper, “Make Haste Slowly: A Theory of Emergent Structured Mixed Selectivity in Feature Learning ReLU Networks” will b….
0
16
0
@SaxeLab
Andrew Saxe
3 months
RT @zdeborova: Happy to share the recording of my plenary talk at Cosyne 2025 two days ago. You will learn about the statistical physics ap….
0
21
0
@SaxeLab
Andrew Saxe
4 months
RT @ClementineDomi6: Our paper, “A Theory of Initialization’s Impact on Specialization,” has been accepted to ICLR 2025!..
0
20
0
@SaxeLab
Andrew Saxe
4 months
RT @BlavatnikAwards: 2025 @Blavatnikawards UK 🇬🇧 Finalist Andrew Saxe from UCL was featured on the @BBC Science Focus Instant Genius Podcas….
0
9
0
@SaxeLab
Andrew Saxe
7 months
New paper with @leonlufkin and @ermgrant! Why do we see localized receptive fields so often, even in models without sparsity regularization? We present a theory in the minimal setting from @ai_ngrosso and @sebastiangoldt.
@leonlufkin
Leon
7 months
We’re excited to share our paper analyzing how data drives the emergence of localized receptive fields in neural networks! w/ @SaxeLab @ermgrant. Come see our #NeurIPS2024 spotlight poster today at 4:30–7:30 in the East Hall! Paper:
0
14
87