Louis Kirsch

@LouisKirschAI

Followers
2K
Following
3K
Media
32
Statuses
313

Driving the automation of AI Research. Research Scientist @GoogleDeepMind. PhD @SchmidhuberAI. @UCL, @HPI_DE alumnus. All opinions are my own.

London, England
Joined November 2011
@LouisKirschAI
Louis Kirsch
3 years
Emergent in-context learning with Transformers is exciting! But what is necessary to make neural nets implement general-purpose in-context learning? 2^14 tasks, a large model + memory, and initial memorization to aid generalization. Full paper 🧵👇(1/9)
7
83
386
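
A hedged sketch of the meta-training recipe summarized in the tweet above: sample one of many tasks, unroll a sequence model with memory over the task's labeled examples, and train next-label prediction with ordinary cross-entropy. All names, sizes, and hyperparameters below are illustrative placeholders, not the paper's exact setup (which uses Transformers and 2^14 image-classification task variants).

import torch
import torch.nn as nn

# Toy meta-training loop: many tasks, a sequence model with memory (an LSTM here),
# and plain cross-entropy on next-label prediction. All sizes are illustrative.
NUM_TASKS, DIM, CLASSES, CONTEXT = 2**14, 16, 5, 20

def sample_task(task_id):
    # A task is defined by its own fixed random linear map from inputs to classes;
    # inputs are freshly sampled every time the task is revisited.
    g = torch.Generator().manual_seed(task_id)
    w = torch.randn(DIM, CLASSES, generator=g)
    x = torch.randn(CONTEXT, DIM)
    return x, (x @ w).argmax(-1)

lstm = nn.LSTM(DIM + CLASSES, 128, batch_first=True)
head = nn.Linear(128, CLASSES)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-3)

for step in range(1000):
    x, y = sample_task(int(torch.randint(NUM_TASKS, (1,))))
    # Feed (x_t, label of the previous example); the model must predict y_t
    # before its label is revealed, which is what forces in-context learning.
    prev = torch.cat([torch.zeros(1, CLASSES), nn.functional.one_hot(y[:-1], CLASSES).float()])
    seq = torch.cat([x, prev], dim=-1).unsqueeze(0)
    hidden, _ = lstm(seq)
    loss = nn.functional.cross_entropy(head(hidden[0]), y)
    opt.zero_grad(); loss.backward(); opt.step()
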
@LouisKirschAI
Louis Kirsch
27 days
RT @LauraRuis: Revisiting @LouisKirschAI et al.’s general-purpose ICL by meta-learning paper and forgot how great it is. It's rare to be ta….
0
7
0
@LouisKirschAI
Louis Kirsch
11 months
AI Scientists will drive the next scientific revolution 🚀. Great work towards automating AI research @_chris_lu_ @RobertTLange @cong_ml @j_foerst @hardmaru @jeffclune.
@SakanaAILabs
Sakana AI
11 months
Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI…
3
4
29
@LouisKirschAI
Louis Kirsch
11 months
RT @FaccioAI: Had a great time presenting at #ICML2024 alongside @idivinci & @LouisKirschAI. But the true highlight was @SchmidhuberAI hims….
0
2
0
@LouisKirschAI
Louis Kirsch
1 year
RT @JacksonMattT: Meta-learning can discover RL algorithms with novel modes of learning, but how can we make them adapt to any training hor….
0
25
0
@LouisKirschAI
Louis Kirsch
2 years
RT @hardmaru: Amazing that @SchmidhuberAI gave this talk back in 2012, months before AlexNet paper was published. In 2012, many things he….
0
110
0
@LouisKirschAI
Louis Kirsch
2 years
RT @agopal42: Excited to present “Contrastive Training of Complex-valued Autoencoders for Object Discovery“ at #NeurIPS2023. TL;DR -- We in….
0
26
0
@LouisKirschAI
Louis Kirsch
2 years
RT @SRSchmidgall: There is still a lot we can learn from the brain in artificial intelligence. In our new review article, we delve into th….
0
47
0
@LouisKirschAI
Louis Kirsch
2 years
RT @DFinsterwalder: But looking deeper into GPT and its capacity for in-context learning (ICL) is fascinating. Recent works on ICL (like th….
0
1
0
@LouisKirschAI
Louis Kirsch
2 years
RT @jakeABeck: Excited to share our new survey paper of meta-RL! 📊🤖🎊 Many thanks to my co-authors for the hard wo….
0
48
0
@LouisKirschAI
Louis Kirsch
2 years
RT @ThomasMiconi: To start the new year (🥳) I'd like to highlight 2 recent papers that ask essentially the same question, but from very di….
0
18
0
@LouisKirschAI
Louis Kirsch
3 years
Video and poster at
0
0
1
@LouisKirschAI
Louis Kirsch
3 years
This paper is the outcome of an incredibly fun internship with @Luke_Metz @jmes_harrison @jaschasd at @GoogleAI over the summer.
1
0
13
@LouisKirschAI
Louis Kirsch
3 years
Finally, meta-optimization can be tricky. We observed that learning gets stuck on loss plateaus at the beginning of training. Just slightly biasing the training distribution to allow for memorization followed by generalization mitigates this issue. (9/9)
1
0
12
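
A minimal sketch of the biasing idea above, assuming tasks are indexed by integer ids: with small probability the sampler revisits a tiny fixed pool of tasks that the model can memorize early on, otherwise it draws from the full distribution. The pool size and mixing weight are made-up values, not those used in the paper.

import random

def make_biased_task_sampler(all_task_ids, pool_size=64, bias=0.1, seed=0):
    """Mostly sample from the full task distribution, but occasionally revisit a
    small fixed pool, giving the model something it can memorize early in
    meta-training before the general solution takes over."""
    rng = random.Random(seed)
    pool = rng.sample(all_task_ids, pool_size)
    def sample():
        if rng.random() < bias:
            return rng.choice(pool)        # easy-to-memorize subset
        return rng.choice(all_task_ids)    # full task distribution
    return sample

sampler = make_biased_task_sampler(list(range(2**14)))
task_id = sampler()   # call inside the meta-training loop
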
@LouisKirschAI
Louis Kirsch
3 years
Does this only work for Transformers? No! We tried a range of architectures. Unlike in standard scaling laws, the number of parameters does not predict learning-to-learn ability well. What matters instead is how much memory (i.e., the size of the activation bottleneck) the model has. (8/9)
1
0
13
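
One back-of-the-envelope way to compare architectures along the axis the tweet above highlights: tabulate parameter count alongside the size of the state carried across the context (hidden and cell state for an LSTM, roughly the cached keys and values for a Transformer). This is an illustrative proxy for the paper's notion of memory, not its exact measurement.

import torch.nn as nn

def param_count(module):
    return sum(p.numel() for p in module.parameters())

def lstm_state_size(hidden_size, num_layers=1):
    # Hidden state + cell state per layer; independent of sequence length.
    return 2 * hidden_size * num_layers

def transformer_state_size(d_model, num_layers, seq_len):
    # Rough proxy: keys and values cached for every position in every layer.
    return 2 * d_model * num_layers * seq_len

lstm = nn.LSTM(input_size=32, hidden_size=256, num_layers=2)
print("LSTM params:", param_count(lstm), " state:", lstm_state_size(256, 2))

layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
tf = nn.TransformerEncoder(layer, num_layers=2)
print("Transformer params:", param_count(tf),
      " state at seq_len=512:", transformer_state_size(256, 2, 512))
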
@LouisKirschAI
Louis Kirsch
3 years
When meta-training with multiple seeds on around 2^14 tasks (right at the transition boundary), the network at the end of training either implements system identification or general learning-to-learn, but rarely a solution in between. (7/9)
1
0
12
@LouisKirschAI
Louis Kirsch
3 years
As we increase the number of tasks, we observe several phase transitions: (1) training instances are memorized with no learning taking place, (2) tasks from the training distribution are identified, (3) a general learning algorithm is implemented. The final transition is surprisingly discrete. (6/9)
1
3
20
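
A sketch of the kind of sweep behind the phase-transition result: vary only the number of distinct meta-training tasks and record how the resulting model performs on memorized training instances, on held-out instances from training tasks, and on entirely unseen tasks. meta_train and evaluate are passed in as placeholders for the experiment-specific code, so this is only a harness outline.

def run_sweep(task_counts, meta_train, evaluate):
    """Vary only the number of distinct meta-training tasks and record how the
    trained model behaves on memorized instances, held-out instances from the
    training tasks, and entirely unseen tasks."""
    records = []
    for n_tasks in task_counts:
        model = meta_train(n_tasks)
        records.append({
            "n_tasks": n_tasks,
            "memorized_instances": evaluate(model, split="train_instances"),
            "seen_tasks_new_instances": evaluate(model, split="train_tasks_held_out"),
            "unseen_tasks": evaluate(model, split="unseen_tasks"),
        })
    return records

# Powers of two bracketing the reported transition region around 2**14 tasks.
task_counts = [2**k for k in range(6, 17)]
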
@LouisKirschAI
Louis Kirsch
3 years
When meta-testing the learned models (pure inference, no weight updates), we observe typical learning behavior: the bigger the in-context training dataset, the better the predictions! This generalizes to datasets not seen during meta-training (e.g., train on MNIST, test on Fashion-MNIST, CIFAR-10, etc.). (5/9)
1
1
13
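
A small, self-contained way to measure the behavior described above: sweep the number of labeled in-context examples and record accuracy on held-out queries from a dataset never used for meta-training. The nearest-neighbor predictor is only a runnable stand-in; any trained in-context learner with the same (context, query) -> label interface would slot in.

import numpy as np

def nearest_neighbor_predict(ctx_x, ctx_y, query_x):
    # Stand-in for a trained in-context learner: predict the label of the
    # closest context example. Any (context, query) -> label model fits here.
    dists = np.linalg.norm(ctx_x[None, :, :] - query_x[:, None, :], axis=-1)
    return ctx_y[dists.argmin(axis=1)]

def accuracy_vs_context_size(xs, ys, sizes, n_queries=200, seed=0):
    """Meta-test curve: accuracy on unseen queries as the in-context
    'training set' grows, mirroring an ordinary learning curve."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        idx = rng.permutation(len(xs))
        ctx, qry = idx[:n], idx[n:n + n_queries]
        pred = nearest_neighbor_predict(xs[ctx], ys[ctx], xs[qry])
        results[n] = float((pred == ys[qry]).mean())
    return results

# Synthetic stand-in for a dataset that was never part of meta-training.
x = np.random.randn(2000, 32)
y = (x[:, 0] > 0).astype(int)
print(accuracy_vs_context_size(x, y, sizes=[2, 8, 32, 128, 512]))
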
@LouisKirschAI
Louis Kirsch
3 years
By conditioning on a task-specific training dataset, models with many parameters trained on lots of tasks exhibit generalization to tasks that are not in the training distribution. Is this zero-shot generalization or learning-to-learn? (4/9)
1
0
11
@LouisKirschAI
Louis Kirsch
3 years
To make black-box models like Transformers generalize, we need to train on many tasks. We take existing datasets and generate variants by random transformations: linear projections of the inputs and permutations of the labels. This allows us to ask fundamental questions about learning-to-learn. (3/9)
1
1
13
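
A minimal NumPy sketch of the task-generation step described above: each new task is an existing dataset pushed through a random linear projection of its inputs and a random permutation of its labels. The function name and sampling details are illustrative, not the paper's exact pipeline.

import numpy as np

def make_task_variant(inputs, labels, num_classes, seed):
    """Create a new task from an existing dataset by applying a random linear
    projection to the inputs and a random permutation to the labels."""
    rng = np.random.default_rng(seed)
    dim = inputs.shape[1]
    projection = rng.normal(0.0, 1.0 / np.sqrt(dim), size=(dim, dim))
    permutation = rng.permutation(num_classes)
    return inputs @ projection, permutation[labels]

# Example: lazily generate 2**14 task variants from a single base dataset
# (stand-in arrays below; in practice the base data would be e.g. flattened MNIST).
base_x = np.random.rand(1024, 64).astype(np.float32)
base_y = np.random.randint(0, 10, size=1024)
tasks = (make_task_variant(base_x, base_y, num_classes=10, seed=s) for s in range(2**14))
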
@LouisKirschAI
Louis Kirsch
3 years
General-purpose in-context learners (GPICL) allow us to solve new ML problems without relying on backpropagation and SGD! A model takes in training data, and produces test-set predictions, without any explicit inference model, training loss, or optimization algorithm. (2/9)
1
0
12
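
A rough sketch of this "training data in, predictions out" interface with an off-the-shelf PyTorch Transformer encoder: labeled examples and the unlabeled query are packed into one sequence, and the logits at the query position are read out, with no gradient step at test time. The token layout, masking choice, and all sizes are assumptions for illustration, not the exact GPICL architecture.

import torch
import torch.nn as nn

class InContextClassifier(nn.Module):
    """Maps a labeled context plus an unlabeled query to class logits; adapting
    to a new task is a single forward pass, with no gradient-based learning."""

    def __init__(self, input_dim, num_classes, d_model=256, num_layers=4):
        super().__init__()
        # Each token: input features concatenated with a (possibly zero) one-hot label.
        self.embed = nn.Linear(input_dim + num_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.readout = nn.Linear(d_model, num_classes)
        self.num_classes = num_classes

    def forward(self, context_x, context_y, query_x):
        # context_x: (B, N, D), context_y: (B, N) int labels, query_x: (B, D)
        y_onehot = nn.functional.one_hot(context_y, self.num_classes).float()
        context_tokens = torch.cat([context_x, y_onehot], dim=-1)
        query_token = torch.cat([query_x, torch.zeros_like(y_onehot[:, 0])], dim=-1)
        seq = torch.cat([context_tokens, query_token.unsqueeze(1)], dim=1)
        hidden = self.encoder(self.embed(seq))
        return self.readout(hidden[:, -1])   # logits at the query position

# "Learning" a new task is just inference over its training examples.
model = InContextClassifier(input_dim=64, num_classes=10)
ctx_x = torch.randn(1, 32, 64)
ctx_y = torch.randint(0, 10, (1, 32))
qry_x = torch.randn(1, 64)
with torch.no_grad():
    logits = model(ctx_x, ctx_y, qry_x)
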