Timon Willi Profile
Timon Willi

@TimonWilli

Followers: 336 · Following: 1K · Media: 27 · Statuses: 160

RS @AIatMeta, DPhil w/ @j_foerst, @UniofOxford; Formerly: Research Intern @GoogleDeepMind / PhD @VectorInst / RS at @nnaisense / MSc w/ @SchmidhuberAI

London, United Kingdom
Joined May 2022
@TimonWilli
Timon Willi
2 years
Scaling laws for (self) supervised learning predict: Increase parameter count -> performance goes brrrr. (loosely speaking) Can we get scaling laws for Deep Reinforcement Learning? In this work, we pave the way towards scaling laws for Deep Reinforcement Learning. We show that
@pcastr
Pablo Samuel Castro
2 years
📢Mixtures of Experts unlock parameter scaling for deep RL! Adding MoEs, and in particular Soft MoEs, to value-based deep RL agents results in more parameter-scalable models. Performance keeps increasing as we increase number of experts (green line below)! 1/9
3
10
39
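The Soft MoE idea in the quoted thread can be sketched with a minimal forward pass: slots receive convex mixtures of input tokens (softmax over tokens), each expert processes its slots, and outputs are mixed back per token (softmax over slots). This is a hypothetical NumPy sketch of the general Soft MoE mechanism, not the paper's implementation; the function name, single-layer setup, and shapes are assumptions.

```python
import numpy as np

def soft_moe(tokens, phi, expert_weights):
    """Minimal Soft MoE forward pass (sketch).

    tokens: (n, d) input tokens.
    phi: (d, e*s) dispatch/combine parameters (e experts, s slots each).
    expert_weights: list of e matrices of shape (d, d), one per expert.
    """
    n, d = tokens.shape
    e = len(expert_weights)
    s = phi.shape[1] // e
    logits = tokens @ phi                              # (n, e*s)

    # Dispatch: softmax over tokens per slot -> each slot input is a
    # convex combination of the tokens (no hard routing, fully dense).
    dispatch = np.exp(logits - logits.max(axis=0))
    dispatch /= dispatch.sum(axis=0, keepdims=True)
    slots = dispatch.T @ tokens                        # (e*s, d)

    # Each expert processes its own s slots.
    outs = np.vstack([slots[i * s:(i + 1) * s] @ expert_weights[i]
                      for i in range(e)])              # (e*s, d)

    # Combine: softmax over slots per token.
    combine = np.exp(logits - logits.max(axis=1, keepdims=True))
    combine /= combine.sum(axis=1, keepdims=True)
    return combine @ outs                              # (n, d)
```

In the value-based deep RL setting the thread describes, a layer like this would replace a dense layer of the Q-network, with expert count as the scaling knob.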
@FLAIR_Ox
Foerster Lab for AI Research
1 month
🚨🚨Introducing the FLAIR internship program!🚨🚨 We are looking for two talented students to join us for an internship working in FLAIR for 6 months (5th January to 4th July 2026)! For details and eligibility criteria, please check:
foersterlab.com
We are looking for two talented students to join us for an internship working in FLAIR for 6 months. Students will get the chance to work on current FLAIR projects at the University of Oxford,...
2
21
119
@joaoabrantis
abranti
2 months
In an evolving population of models, using model merging as the crossover operation drastically reduces diversity and leads to premature convergence. To address this, we make models compete for limited resources (training datapoints), which benefits models that have unique skills.
@SakanaAILabs
Sakana AI
2 months
What if we could evolve AI models like organisms in nature, letting them compete, mate, and combine their strengths to produce ever-fitter offspring? Excited to share our new work: “Competition and Attraction Improve Model Fusion” presented at GECCO’25🦎 where it was a runner-up
1
2
15
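The "compete for limited datapoints" mechanism above resembles classic implicit fitness sharing: each datapoint carries a fixed amount of resource that is split among the models that solve it, so a model that solves datapoints few others can solve earns more fitness. A hypothetical minimal sketch of that scoring rule (not the paper's actual algorithm):

```python
def shared_fitness(solves):
    """Resource-competition fitness sketch.

    solves[m][d] is True if model m solves datapoint d.
    Each datapoint contributes one unit of resource, divided equally
    among its solvers, so unique skills are rewarded.
    """
    n_models = len(solves)
    n_data = len(solves[0])
    fitness = [0.0] * n_models
    for d in range(n_data):
        solvers = [m for m in range(n_models) if solves[m][d]]
        if not solvers:
            continue  # unsolved datapoints award nothing
        share = 1.0 / len(solvers)
        for m in solvers:
            fitness[m] += share
    return fitness
```

Under this rule, two models with identical skills split every reward, so diversity is maintained even when crossover (merging) tends to homogenize the population.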
@_chris_lu_
Chris Lu
2 months
@akbirkhan @TimonWilli @TimonWilli deserves far more crap for staying in Europe
2
1
5
@j_foerst
Jakob Foerster
3 months
I recently had a lunch time conversation with a very senior AI researcher about how multi-agent problems differ from single-agent ones (their starting point was that they do not). One point that made them think: As computers scale, the rest of the world (i.e. non-agentic parts) is not
24
17
233
@uljadb99
Uljad @ NeurIPS
3 months
Unlock real diversity in your LLM! 🚀 LLM outputs can be boring and repetitive. Today, we release Intent Factored Generation (IFG) to: - Sample conceptually diverse outputs💡 - Improve performance on math and code reasoning tasks🤔 - Get more engaging conversational agents 🤖
1
11
36
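The factored sampling idea behind IFG can be sketched as two-stage decoding: first sample a short high-level intent at a high temperature (for conceptual diversity), then sample the full answer conditioned on that intent at a lower temperature (for quality). The interface `llm_sample(prompt, temperature)`, the prompt wording, and the temperature values below are all assumptions, not the paper's code:

```python
def intent_factored_generate(llm_sample, prompt, t_intent=1.2, t_final=0.8):
    """Two-stage sampling sketch: diverse intent, then focused answer.

    llm_sample: callable (prompt: str, temperature: float) -> str,
    standing in for any sampling-capable LLM API.
    """
    # Stage 1: high-temperature sampling over short intents,
    # where diversity is cheap and conceptual.
    intent = llm_sample(f"{prompt}\nFirst, state a one-line plan:", t_intent)
    # Stage 2: lower-temperature sampling of the final answer,
    # conditioned on the chosen intent.
    answer = llm_sample(
        f"{prompt}\nPlan: {intent}\nNow answer following the plan:", t_final
    )
    return intent, answer
```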
@johanobandoc
Johan Obando-Ceron 👍🏽
3 months
🚨 Excited to share our #ICML2025 paper: "The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep RL" We train RL agents to know when to quit, cutting wasted effort and improving efficiency with our method LEAST. 📄Paper: https://t.co/9ED3FubIPc 🧵Check the thread below👇🏾
@pcastr
Pablo Samuel Castro
3 months
Thrilled to share our #ICML2025 paper “The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep RL”, led by Jiashun Liu and with other great collaborators! We teach RL agents when to quit wasting effort, boosting efficiency with our proposed method LEAST. Here's the story 🧵👇🏾
3
19
126
@SchmidhuberAI
Jürgen Schmidhuber
3 months
Since 1990, we have worked on artificial curiosity & measuring "interestingness." Our new ICML paper uses "Prediction of Hidden Units" loss to quantify in-context computational complexity in sequence models. It can tell boring from interesting tasks and predict correct reasoning.
@idivinci
Vincent Herrmann
3 months
Excited to share our new ICML paper, with co-authors @robert_csordas and @SchmidhuberAI! How can we tell if an LLM is actually "thinking" versus just spitting out memorized or trivial text? Can we detect when a model is doing anything interesting? (Thread below👇)
11
64
365
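The underlying intuition, predictable hidden states mean boring computation, can be illustrated with a very loose analogue. The paper's PHi loss is learned inside the network; here we merely fit an offline least-squares linear predictor from one hidden state to the next and report its error, so this is an illustrative assumption, not the proposed method:

```python
import numpy as np

def hidden_state_predictability(hidden):
    """Loose analogue of a prediction-of-hidden-units signal.

    hidden: (T, d) array of hidden states over a sequence.
    Fits a linear map h_t -> h_{t+1} by least squares and returns the
    mean squared residual: low error ~ predictable ("boring") dynamics,
    high error ~ complex ("interesting") in-context computation.
    """
    h_in, h_out = hidden[:-1], hidden[1:]
    w, *_ = np.linalg.lstsq(h_in, h_out, rcond=None)
    residual = h_out - h_in @ w
    return float(np.mean(residual ** 2))
```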
@TimonWilli
Timon Willi
4 months
finally, an Opponent Shaping application I don’t have to make up for the intro section. dreams do come true.
@OlaKalisz8
Ola Kalisz
4 months
Antiviral therapy design is myopic 🦠🙈 optimised only for the current strain. That's why you need a different Flu vaccine every year! Our #ICML2025 paper ADIOS proposes "shaper therapies" that steer viral evolution in our favour & remain effective. Work done @FLAIR_Ox 🧵👇
0
1
10
@TimonWilli
Timon Willi
7 months
Tried to solve science, but solved humor instead. That’s why greatness cannot be planned.
@SakanaAILabs
Sakana AI
7 months
The AI Scientist is far from perfect. Occasionally it makes embarrassing citation errors. Here, it incorrectly attributed “an LSTM-based neural network” to Goodfellow (2016) rather than to the correct authors, Hochreiter & Schmidhuber (1997). We documented these errors in own
1
0
28
@TimonWilli
Timon Willi
7 months
congrats to the team!
@SakanaAILabs
Sakana AI
7 months
The AI Scientist Generates its First Peer-Reviewed Scientific Publication We’re proud to announce that a paper produced by The AI Scientist-v2 passed the peer-review process at a workshop in ICLR, a top AI conference. Read more about this experiment → https://t.co/LpLYLnZMCQ
0
0
1
@akbirkhan
akbir.
7 months
In the spirit of making more real world evals, here is the Factorio Learning Environment (FLE). Spurred by wanting to eval if models are good paperclip maximisers, we check how well agents build factories for other things 🏗️🏭🛠️
31
100
1K
@RobertTLange
Robert Lange
1 year
🎉 Stoked to share The AI-Scientist 🧑‍🔬 - our end-to-end approach for conducting research with LLMs including ideation, coding, experiment execution, paper write-up & reviewing. Blog 📰: https://t.co/kBwAgvXDjZ Paper 📜: https://t.co/XvkwWfQhyi Code 💻: https://t.co/hXlXjxFAD9
@SakanaAILabs
Sakana AI
1 year
Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! https://t.co/8wVqIXVpZJ From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI
14
68
366
@BrantonDeMoss
Branton DeMoss
10 months
I’m pleased to announce our work which studies complexity phase transitions in neural networks! We track the Kolmogorov complexity of networks as they “grok”, and find a characteristic rise and fall of complexity, corresponding to memorization followed by generalization. 🧵
31
156
1K
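Kolmogorov complexity is uncomputable, so work like this tracks a compression-based proxy. A crude sketch of one such proxy (the paper's estimator is more careful; the quantization scheme and use of gzip here are assumptions): quantize the network's weights and measure their compressed size over the course of training, looking for the rise-and-fall the thread describes.

```python
import gzip
import numpy as np

def complexity_proxy(weights, n_bits=8):
    """Compression-based complexity proxy for a weight vector.

    Quantizes the weights to n_bits levels, then returns the length in
    bytes of the gzip-compressed quantized array. Smaller means the
    weights are more compressible, i.e. lower (proxy) complexity.
    """
    w = np.asarray(weights, dtype=np.float64).ravel()
    lo, hi = w.min(), w.max()
    levels = (2 ** n_bits) - 1
    # Map weights to integer bins in [0, levels]; epsilon avoids 0/0
    # when all weights are equal.
    q = np.round((w - lo) / (hi - lo + 1e-12) * levels).astype(np.uint8)
    return len(gzip.compress(q.tobytes()))
```

Logged once per checkpoint, a memorizing network should show this proxy rising, then falling as it groks and generalizes.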
@FLAIR_Ox
Foerster Lab for AI Research
10 months
🧑‍🔬 FLAIR is presenting three more great papers today at #NeurIPS2024! Come talk to us and find out what we've been doing!
1
5
20
@FLAIR_Ox
Foerster Lab for AI Research
10 months
🔬 FLAIR has a bunch of great papers being presented today at NeurIPS! Come along to learn more about the work! 👀 Keep your eyes peeled for more work being presented over the week!
1
5
13
@TimonWilli
Timon Willi
11 months
There's light at the end of the tunnel of LLM evals: The light at the end of the tunnel:
@PaglieriDavide
Davide Paglieri
11 months
Tired of saturated benchmarks? Want scope for a significant leap in capabilities? 🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games! BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come. 1/🧵
0
0
11
@mitrma
Michael Matthews
11 months
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵
14
214
1K
@JonnyCoook
Jonny Cook
1 year
What if improving LLM evaluation and generation was as simple as using a checklist? Introducing TICK ✅ (Targeted Instruct-evaluation with ChecKlists) and STICK 🏒 (Self-TICK) Work done @cohere with supervision from @_rockt, @j_foerst, @d_aumiller & @W4ngatang. 1/n
4
12
55
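The checklist idea in TICK reduces evaluation to a set of targeted yes/no questions about a response. A minimal sketch of the scoring step, assuming a `judge(question, response) -> bool` interface standing in for an LLM judge (the interface and names are assumptions, not Cohere's code):

```python
def tick_score(judge, response, checklist):
    """Checklist-based evaluation sketch.

    judge: callable (question: str, response: str) -> bool, answering one
    yes/no checklist item about the response (e.g. an LLM judge).
    Returns the fraction of checklist items the response passes.
    """
    results = [judge(question, response) for question in checklist]
    return sum(results) / len(results)
```

In the self-improvement variant (STICK), the same score can be computed by the generating model on its own draft, with failed items fed back as revision targets.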