Louis Kirsch

@LouisKirschAI

Followers
2K
Following
3K
Media
32
Statuses
313

Driving the automation of AI Research. Research Scientist @GoogleDeepMind. PhD @SchmidhuberAI. @UCL, @HPI_DE alumnus. All opinions are my own.

London, England
Joined November 2011
@LouisKirschAI
Louis Kirsch
3 years
Emergent in-context learning with Transformers is exciting! But what is necessary to make neural nets implement general-purpose in-context learning? 2^14 tasks, a large model + memory, and initial memorization to aid generalization. Full paper 🧵👇(1/9)
7
83
386
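
A hedged sketch of the meta-training recipe summarized in the tweet above: sample one of many tasks, unroll a sequence model with memory over the task's labeled examples, and train next-label prediction with ordinary cross-entropy. All names, sizes, and hyperparameters below are illustrative placeholders, not the paper's exact setup (which uses Transformers and 2^14 image-classification task variants).

import torch
import torch.nn as nn

# Toy meta-training loop: many tasks, a sequence model with memory (an LSTM here),
# and plain cross-entropy on next-label prediction. All sizes are illustrative.
NUM_TASKS, DIM, CLASSES, CONTEXT = 2**14, 16, 5, 20

def sample_task(task_id):
    # A task is defined by its own fixed random linear map from inputs to classes;
    # inputs are freshly sampled every time the task is revisited.
    g = torch.Generator().manual_seed(task_id)
    w = torch.randn(DIM, CLASSES, generator=g)
    x = torch.randn(CONTEXT, DIM)
    return x, (x @ w).argmax(-1)

lstm = nn.LSTM(DIM + CLASSES, 128, batch_first=True)
head = nn.Linear(128, CLASSES)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(1000):
    x, y = sample_task(int(torch.randint(NUM_TASKS, (1,))))
    # Feed (x_t, label of the previous example); the model must predict y_t
    # before its label is revealed, which is what forces in-context learning.
    prev = torch.cat([torch.zeros(1, CLASSES), nn.functional.one_hot(y[:-1], CLASSES).float()])
    seq = torch.cat([x, prev], dim=-1).unsqueeze(0)
    hidden, _ = lstm(seq)
    loss = nn.functional.cross_entropy(head(hidden[0]), y)
    opt.zero_grad(); loss.backward(); opt.step()
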
@LouisKirschAI
Louis Kirsch
27 days
RT @LauraRuis: Revisiting @LouisKirschAI et al.’s general-purpose ICL by meta-learning paper and forgot how great it is. It's rare to be ta….
0
7
0
@LouisKirschAI
Louis Kirsch
11 months
AI Scientists will drive the next scientific revolution 🚀. Great work towards automating AI research @_chris_lu_ @RobertTLange @cong_ml @j_foerst @hardmaru @jeffclune.
@SakanaAILabs
Sakana AI
11 months
Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI…
3
4
29
@LouisKirschAI
Louis Kirsch
11 months
RT @FaccioAI: Had a great time presenting at #ICML2024 alongside @idivinci & @LouisKirschAI. But the true highlight was @SchmidhuberAI hims….
0
2
0
@LouisKirschAI
Louis Kirsch
1 year
RT @JacksonMattT: Meta-learning can discover RL algorithms with novel modes of learning, but how can we make them adapt to any training hor….
0
25
0
@LouisKirschAI
Louis Kirsch
2 years
RT @hardmaru: Amazing that @SchmidhuberAI gave this talk back in 2012, months before AlexNet paper was published. In 2012, many things he….
0
110
0
@LouisKirschAI
Louis Kirsch
2 years
RT @agopal42: Excited to present “Contrastive Training of Complex-valued Autoencoders for Object Discovery“ at #NeurIPS2023. TL;DR -- We in….
0
26
0
@LouisKirschAI
Louis Kirsch
2 years
RT @SRSchmidgall: There is still a lot we can learn from the brain in artificial intelligence. In our new review article, we delve into th….
0
47
0
@LouisKirschAI
Louis Kirsch
2 years
RT @DFinsterwalder: But looking deeper into GPT and its capacity for in-context learning (ICL) is fascinating. Recent works on ICL (like th….
0
1
0
@LouisKirschAI
Louis Kirsch
2 years
RT @jakeABeck: Excited to share our new survey paper of meta-RL! 📊🤖🎊 Many thanks to my co-authors for the hard wo….
0
48
0
@LouisKirschAI
Louis Kirsch
2 years
RT @ThomasMiconi: To start the new year (🥳) I'd like to highlight 2 recent papers that ask essentially the same question, but from very di….
0
18
0
@LouisKirschAI
Louis Kirsch
3 years
Video and poster at
0
0
1
@LouisKirschAI
Louis Kirsch
3 years
This paper is the outcome of an incredibly fun internship with @Luke_Metz @jmes_harrison @jaschasd at @GoogleAI over the summer.
1
0
13
@LouisKirschAI
Louis Kirsch
3 years
Finally, meta-optimization can be tricky. We observed that learning gets stuck on loss plateaus at the beginning of training. Just slightly biasing the training distribution to allow for memorization followed by generalization mitigates this issue. (9/9)
1
0
12
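
A minimal sketch of the biasing idea above, assuming tasks are indexed by integer ids: with small probability the sampler revisits a tiny fixed pool of tasks that the model can memorize early on, otherwise it draws from the full distribution. The pool size and mixing weight are made-up values, not those used in the paper.

import random

def make_biased_task_sampler(all_task_ids, pool_size=64, bias=0.1, seed=0):
    """Mostly sample from the full task distribution, but occasionally revisit a
    small fixed pool, giving the model something it can memorize early in
    meta-training before the general solution takes over."""
    rng = random.Random(seed)
    pool = rng.sample(all_task_ids, pool_size)
    def sample():
        if rng.random() < bias:
            return rng.choice(pool)        # easy-to-memorize subset
        return rng.choice(all_task_ids)    # full task distribution
    return sample

sampler = make_biased_task_sampler(list(range(2**14)))
task_id = sampler()   # call inside the meta-training loop
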
@LouisKirschAI
Louis Kirsch
3 years
Does this only work for Transformers? No! We tried a range of architectures. Unlike in standard scaling laws, the number of parameters does not predict learning-to-learn ability well. What matters instead is how much memory (i.e., the size of the activation bottleneck) the model has. (8/9)
1
0
13
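
One back-of-the-envelope way to compare architectures along the axis the tweet above highlights: tabulate parameter count alongside the size of the state carried across the context (hidden and cell state for an LSTM, roughly the cached keys and values for a Transformer). This is an illustrative proxy for the paper's notion of memory, not its exact measurement.

import torch.nn as nn

def param_count(module):
    return sum(p.numel() for p in module.parameters())

def lstm_state_size(hidden_size, num_layers=1):
    # Hidden state + cell state per layer; independent of sequence length.
    return 2 * hidden_size * num_layers

def transformer_state_size(d_model, num_layers, seq_len):
    # Rough proxy: keys and values cached for every position in every layer.
    return 2 * d_model * num_layers * seq_len

lstm = nn.LSTM(input_size=32, hidden_size=256, num_layers=2)
print("LSTM params:", param_count(lstm), " state:", lstm_state_size(256, 2))

layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
tf = nn.TransformerEncoder(layer, num_layers=2)
print("Transformer params:", param_count(tf),
      " state at seq_len=512:", transformer_state_size(256, 2, 512))
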
@LouisKirschAI
Louis Kirsch
3 years
When meta-training with multiple seeds on around 2^14 tasks (right at the transition boundary), the network at the end of training either implements system identification or general learning-to-learn, but rarely a solution in between. (7/9)
1
0
12
@LouisKirschAI
Louis Kirsch
3 years
As we increase the number of tasks, we observe several phase transitions: (1) training instances are memorized with no learning taking place, (2) tasks from the training distribution are identified, (3) a general learning algorithm is implemented. The final transition is surprisingly discrete. (6/9)
1
3
20
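
A sketch of the kind of sweep behind the phase-transition result: vary only the number of distinct meta-training tasks and record how the resulting model performs on memorized training instances, on held-out instances from training tasks, and on entirely unseen tasks. meta_train and evaluate are passed in as placeholders for the experiment-specific code, so this is only a harness outline.

def run_sweep(task_counts, meta_train, evaluate):
    """Vary only the number of distinct meta-training tasks and record how the
    trained model behaves on memorized instances, held-out instances from the
    training tasks, and entirely unseen tasks."""
    records = []
    for n_tasks in task_counts:
        model = meta_train(n_tasks)
        records.append({
            "n_tasks": n_tasks,
            "memorized_instances": evaluate(model, split="train_instances"),
            "seen_tasks_new_instances": evaluate(model, split="train_tasks_held_out"),
            "unseen_tasks": evaluate(model, split="unseen_tasks"),
        })
    return records

# Powers of two bracketing the reported transition region around 2**14 tasks.
task_counts = [2**k for k in range(6, 17)]
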
@LouisKirschAI
Louis Kirsch
3 years
When meta-testing the learned models (pure inference, no weight updates), we observe typical learning behavior: the bigger the in-context training dataset, the better the predictions! This generalizes to datasets not seen during meta-training (e.g., train on MNIST, test on Fashion-MNIST, CIFAR-10, etc.). (5/9)
1
1
13
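
A small, self-contained way to measure the behavior described above: sweep the number of labeled in-context examples and record accuracy on held-out queries from a dataset never used for meta-training. The nearest-neighbor predictor is only a runnable stand-in; any trained in-context learner with the same (context, query) -> label interface would slot in.

import numpy as np

def nearest_neighbor_predict(ctx_x, ctx_y, query_x):
    # Stand-in for a trained in-context learner: predict the label of the
    # closest context example. Any (context, query) -> label model fits here.
    dists = np.linalg.norm(ctx_x[None, :, :] - query_x[:, None, :], axis=-1)
    return ctx_y[dists.argmin(axis=1)]

def accuracy_vs_context_size(xs, ys, sizes, n_queries=200, seed=0):
    """Meta-test curve: accuracy on unseen queries as the in-context
    'training set' grows, mirroring an ordinary learning curve."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        idx = rng.permutation(len(xs))
        ctx, qry = idx[:n], idx[n:n + n_queries]
        pred = nearest_neighbor_predict(xs[ctx], ys[ctx], xs[qry])
        results[n] = float((pred == ys[qry]).mean())
    return results

# Synthetic stand-in for a dataset that was never part of meta-training.
x = np.random.randn(2000, 32)
y = (x[:, 0] > 0).astype(int)
print(accuracy_vs_context_size(x, y, sizes=[2, 8, 32, 128, 512]))
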
@LouisKirschAI
Louis Kirsch
3 years
By conditioning on a task-specific training dataset, models with many parameters trained on lots of tasks exhibit generalization to tasks that are not in the training distribution. Is this zero-shot generalization or learning-to-learn? (4/9)
1
0
11
@LouisKirschAI
Louis Kirsch
3 years
To make black-box models like Transformers generalize, we need to train on many tasks. We take existing datasets and generate variants by random transformations: linear projections of the inputs and permutations of the labels. This allows us to ask fundamental questions about learning-to-learn. (3/9)
1
1
13
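
A minimal NumPy sketch of the task-generation step described above: each new task is an existing dataset pushed through a random linear projection of its inputs and a random permutation of its labels. The function name and sampling details are illustrative, not the paper's exact pipeline.

import numpy as np

def make_task_variant(inputs, labels, num_classes, seed):
    """Create a new task from an existing dataset by applying a random linear
    projection to the inputs and a random permutation to the labels."""
    rng = np.random.default_rng(seed)
    dim = inputs.shape[1]
    projection = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(dim, dim))
    permutation = rng.permutation(num_classes)
    return inputs @ projection, permutation[labels]

# Example: lazily generate 2**14 task variants from a single base dataset
# (stand-in arrays below; in practice the base data would be e.g. flattened MNIST).
base_x = np.random.rand(1024, 64).astype(np.float32)
base_y = np.random.randint(0, 10, size=1024)
tasks = (make_task_variant(base_x, base_y, num_classes=10, seed=s) for s in range(2**14))
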
@LouisKirschAI
Louis Kirsch
3 years
General-purpose in-context learners (GPICL) allow us to solve new ML problems without relying on backpropagation and SGD! A model takes in training data, and produces test-set predictions, without any explicit inference model, training loss, or optimization algorithm. (2/9)
1
0
12
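
A rough sketch of this "training data in, predictions out" interface with an off-the-shelf PyTorch Transformer encoder: labeled examples and the unlabeled query are packed into one sequence, and the logits at the query position are read out, with no gradient step at test time. The token layout, masking choice, and all sizes are assumptions for illustration, not the exact GPICL architecture.

import torch
import torch.nn as nn

class InContextClassifier(nn.Module):
    """Maps a labeled context plus an unlabeled query to class logits; adapting
    to a new task is a single forward pass, with no gradient-based learning."""

    def __init__(self, input_dim, num_classes, d_model=256, num_layers=4):
        super().__init__()
        # Each token: input features concatenated with a (possibly zero) one-hot label.
        self.embed = nn.Linear(input_dim + num_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.readout = nn.Linear(d_model, num_classes)
        self.num_classes = num_classes

    def forward(self, context_x, context_y, query_x):
        # context_x: (B, N, D), context_y: (B, N) int labels, query_x: (B, D)
        y_onehot = nn.functional.one_hot(context_y, self.num_classes).float()
        context_tokens = torch.cat([context_x, y_onehot], dim=-1)
        query_token = torch.cat([query_x, torch.zeros_like(y_onehot[:, 0])], dim=-1)
        seq = torch.cat([context_tokens, query_token.unsqueeze(1)], dim=1)
        hidden = self.encoder(self.embed(seq))
        return self.readout(hidden[:, -1])   # logits at the query position

# "Learning" a new task is just inference over its training examples.
model = InContextClassifier(input_dim=64, num_classes=10)
ctx_x = torch.randn(1, 32, 64)
ctx_y = torch.randint(0, 10, (1, 32))
qry_x = torch.randn(1, 64)
with torch.no_grad():
    logits = model(ctx_x, ctx_y, qry_x)
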