
Timon Willi
@TimonWilli
336 Followers · 1K Following · 27 Media · 160 Statuses
RS @AIatMeta, DPhil w/ @j_foerst, @UniofOxford; Formerly: Research Intern @GoogleDeepMind / PhD @VectorInst / RS at @nnaisense / MSc w/ @SchmidhuberAI
London, United Kingdom
Joined May 2022
Scaling laws for (self-)supervised learning predict: increase the parameter count -> performance goes brrrr (loosely speaking). Can we get scaling laws for deep reinforcement learning? In this work, we pave the way towards scaling laws for deep RL. We show that…
📢 Mixtures of Experts unlock parameter scaling for deep RL! Adding MoEs, and in particular Soft MoEs, to value-based deep RL agents results in more parameter-scalable models. Performance keeps increasing as we increase the number of experts (green line below)! 1/9
💬 3 · 🔁 10 · ❤️ 39
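For intuition, here is a minimal NumPy sketch of Soft-MoE-style routing: every token is softly dispatched to every expert slot, and the slot outputs are softly combined back, so the whole layer stays differentiable with no discrete routing decisions. Names and shapes are illustrative, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def soft_moe(X, Phi, experts):
    """Soft-MoE layer sketch: tokens are softly routed to expert slots.

    X:       [n, d] token features
    Phi:     [d, num_slots] learnable slot parameters
    experts: list of callables; the slots are split evenly among them
    """
    logits = X @ Phi                    # [n, num_slots]
    dispatch = softmax(logits, axis=0)  # each slot: soft mixture over tokens
    combine = softmax(logits, axis=1)   # each token: soft mixture over slots
    slots = dispatch.T @ X              # [num_slots, d] inputs to the experts
    outs = np.concatenate(
        [f(s) for f, s in zip(experts, np.split(slots, len(experts)))]
    )
    return combine @ outs               # [n, d], fully differentiable routing

# toy usage: 4 experts with 2 slots each
rng = np.random.default_rng(0)
n, d, num_experts, slots_per = 16, 32, 4, 2
X = rng.normal(size=(n, d))
Phi = rng.normal(size=(d, num_experts * slots_per))
Ws = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(num_experts)]
experts = [lambda z, W=W: np.tanh(z @ W) for W in Ws]
print(soft_moe(X, Phi, experts).shape)  # (16, 32)
```

Because every token contributes to every slot, there is no brittle discrete expert assignment, which is one plausible reason this flavour of MoE plays well with value-based RL training.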
🚨🚨 Introducing the FLAIR internship program! 🚨🚨 We are looking for two talented students to join us for a six-month internship at FLAIR (5th January to 4th July 2026)! For details and eligibility criteria, please check:
foersterlab.com
Students will get the chance to work on current FLAIR projects at the University of Oxford,...
💬 2 · 🔁 21 · ❤️ 119
In an evolving population of models, using model merging as the crossover operation drastically reduces diversity and leads to premature convergence. To address this, we make models compete for limited resources (training datapoints), which benefits models with unique skills.
What if we could evolve AI models like organisms in nature, letting them compete, mate, and combine their strengths to produce ever-fitter offspring? Excited to share our new work, "Competition and Attraction Improve Model Fusion", presented at GECCO'25 🦎, where it was a runner-up…
💬 1 · 🔁 2 · ❤️ 15
💬 2 · 🔁 1 · ❤️ 5
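The "compete for limited resources" idea can be pictured with a simple fitness-sharing scheme (a hypothetical sketch, not the paper's implementation): each datapoint carries a fixed amount of credit that is split among the models that solve it, so a skill only you have is worth more than one everybody shares.

```python
import numpy as np

def shared_fitness(solves):
    """Fitness sharing over datapoints.

    solves: [num_models, num_points] boolean matrix; solves[i, j] is True
            if model i solves datapoint j. Each datapoint carries 1 unit
            of "resource", split evenly among the models that solve it,
            so a point only you solve pays more than a crowded one.
    """
    solvers = solves.sum(axis=0)                        # models per point
    share = np.where(solvers > 0, 1.0 / np.maximum(solvers, 1), 0.0)
    return (solves * share).sum(axis=1)                 # fitness per model

# toy example: models 0 and 1 are near-duplicates, model 2 is unique
solves = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 1],
], dtype=bool)
print(shared_fitness(solves))  # [1.5 1.5 1. ] -> unique points pay double here
```

Duplicated skills dilute each other's reward, which keeps selection pressure pointed at diversity rather than letting merging collapse the population.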
I recently had a lunchtime conversation with a very senior AI researcher about how multi-agent problems differ from single-agent ones (their starting point was that they do not). One point that made them think: as computers scale, the rest of the world (i.e. the non-agentic parts) is not…
💬 24 · 🔁 17 · ❤️ 233
Unlock real diversity in your LLM! 🚀 LLM outputs can be boring and repetitive. Today, we release Intent Factored Generation (IFG) to:
- Sample conceptually diverse outputs 💡
- Improve performance on math and code reasoning tasks 🤔
- Get more engaging conversational agents 🤖
💬 1 · 🔁 11 · ❤️ 36
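The factorisation behind IFG can be sketched as a two-stage sampling loop: draw a short intent at high temperature (where the conceptual diversity comes from), then decode the answer conditioned on it at low temperature (keeping each sample coherent). The `generate` helper and the prompts below are hypothetical stand-ins, not the paper's code.

```python
# Two-stage sampling loop illustrating the intent -> response factorisation.
# `generate` is a hypothetical stand-in for any LLM API.

def generate(prompt: str, temperature: float) -> str:
    raise NotImplementedError("plug in your favourite LLM client here")

def ifg_sample(question: str, t_intent: float = 1.2, t_final: float = 0.4) -> str:
    # Stage 1: a short, abstract intent at HIGH temperature -- this is
    # where the conceptual diversity between samples comes from.
    intent = generate(
        f"Give a brief plan/keywords for answering:\n{question}",
        temperature=t_intent,
    )
    # Stage 2: the full answer conditioned on the intent at LOW
    # temperature, keeping each individual sample coherent.
    return generate(
        f"Question: {question}\nPlan: {intent}\nAnswer, following the plan:",
        temperature=t_final,
    )
```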
🚨 Excited to share our #ICML2025 paper: "The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep RL" We train RL agents to know when to quit, cutting wasted effort and improving efficiency with our method LEAST. 📄Paper: https://t.co/9ED3FubIPc 🧵Check the thread below👇🏾
Thrilled to share our #ICML2025 paper “The Courage to Stop: Overcoming Sunk Cost Fallacy in Deep RL”, led by Jiashun Liu and with other great collaborators! We teach RL agents when to quit wasting effort, boosting efficiency with our proposed method LEAST. Here's the story 🧵👇🏾
💬 3 · 🔁 19 · ❤️ 126
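For flavour, here is a generic learned-to-quit rule (an illustrative sketch only, not the LEAST method from the paper): abandon the current episode when the critic's value estimate has stayed low for a while, and spend the saved interaction budget elsewhere.

```python
import numpy as np

def should_quit(value_estimates, threshold=0.1, patience=10):
    """Give-up rule: quit if the critic's value estimate has stayed
    below `threshold` for the last `patience` steps. NOT the paper's
    LEAST method -- just the general shape of the idea."""
    recent = value_estimates[-patience:]
    return len(recent) == patience and max(recent) < threshold

values = list(np.linspace(0.5, 0.0, 50))  # an episode going nowhere
print(should_quit(values))  # True -> reset instead of grinding on
```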
Since 1990, we have worked on artificial curiosity & measuring "interestingness." Our new ICML paper uses a "Prediction of Hidden Units" loss to quantify in-context computational complexity in sequence models. It can tell boring from interesting tasks and predict correct reasoning.
Excited to share our new ICML paper, with co-authors @robert_csordas and @SchmidhuberAI! How can we tell if an LLM is actually "thinking" versus just spitting out memorized or trivial text? Can we detect when a model is doing anything interesting? (Thread below👇)
💬 11 · 🔁 64 · ❤️ 365
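A rough way to picture the idea (an illustrative probe, not the paper's exact PHi loss): train a small auxiliary network to predict each hidden state from the previous one; wherever prediction fails, the model is doing non-trivial in-context computation rather than replaying memorized or trivial text.

```python
import torch
import torch.nn as nn

# A tiny "predict the hidden states" probe: high prediction loss means
# the hidden states carry information that cannot be extrapolated from
# the previous step, i.e. non-trivial in-context computation.

def phi_style_loss(hidden: torch.Tensor, probe: nn.Module) -> torch.Tensor:
    """hidden: [seq_len, d] hidden states from a frozen sequence model."""
    pred = probe(hidden[:-1])                       # predict h_{t+1} from h_t
    return ((pred - hidden[1:]) ** 2).mean(dim=-1)  # per-step surprise

seq_len, d = 128, 64
hidden = torch.randn(seq_len, d)   # stand-in for real hidden states
probe = nn.Sequential(nn.Linear(d, d), nn.Tanh(), nn.Linear(d, d))
print(phi_style_loss(hidden, probe).shape)  # torch.Size([127])
```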
finally, an Opponent Shaping application I don’t have to make up for the intro section. dreams do come true.
Antiviral therapy design is myopic 🦠🙈: optimised only for the current strain. That's why you need a different flu vaccine every year! Our #ICML2025 paper ADIOS proposes "shaper therapies" that steer viral evolution in our favour & remain effective. Work done @FLAIR_Ox 🧵👇
💬 0 · 🔁 1 · ❤️ 10
Tried to solve science, but solved humor instead. That’s why greatness cannot be planned.
The AI Scientist is far from perfect. Occasionally it makes embarrassing citation errors. Here, it incorrectly attributed "an LSTM-based neural network" to Goodfellow (2016) rather than to the correct authors, Hochreiter & Schmidhuber (1997). We documented these errors in our own…
💬 1 · 🔁 0 · ❤️ 28
congrats to the team!
The AI Scientist Generates its First Peer-Reviewed Scientific Publication. We're proud to announce that a paper produced by The AI Scientist-v2 passed the peer-review process at a workshop at ICLR, a top AI conference. Read more about this experiment → https://t.co/LpLYLnZMCQ
💬 0 · 🔁 0 · ❤️ 1
In the spirit of making more real-world evals, here is the Factorio Learning Environment (FLE). Spurred by wanting to evaluate whether models are good paperclip maximisers, we check how well agents build factories for other things 🏗️🏭🛠️
💬 31 · 🔁 100 · ❤️ 1K
🎉 Stoked to share The AI Scientist 🧑🔬, our end-to-end approach for conducting research with LLMs, including ideation, coding, experiment execution, paper write-up & reviewing. Blog 📰: https://t.co/kBwAgvXDjZ Paper 📜: https://t.co/XvkwWfQhyi Code 💻: https://t.co/hXlXjxFAD9
Introducing The AI Scientist: The world's first AI system for automating scientific research and open-ended discovery! https://t.co/8wVqIXVpZJ From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer review, The AI…
💬 14 · 🔁 68 · ❤️ 366
I’m pleased to announce our work studying complexity phase transitions in neural networks! We track the Kolmogorov complexity of networks as they “grok”, and find a characteristic rise and fall of complexity, corresponding to memorization followed by generalization. 🧵
💬 31 · 🔁 156 · ❤️ 1K
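Kolmogorov complexity itself is uncomputable, so in practice one tracks an upper bound via lossless compression. A crude stand-in for the paper's estimator (illustration only): gzip the coarsely quantized weights at each logged step and plot the byte count over training.

```python
import gzip
import numpy as np

def complexity_proxy(weights: np.ndarray, bits: int = 8) -> int:
    """Crude upper bound on description length: bytes of gzipped,
    coarsely quantized weights. The paper uses a more careful
    compression-based bound; this only illustrates the idea of
    'complexity = shortest description that reproduces the model'."""
    w = weights.astype(np.float64).ravel()
    lo, hi = float(w.min()), float(w.max())
    q = np.round((w - lo) / max(hi - lo, 1e-12) * (2**bits - 1)).astype(np.uint8)
    return len(gzip.compress(q.tobytes()))

# Logged once per training step, this traces the characteristic rise
# (memorization) and fall (generalization) of complexity during grokking.
rng = np.random.default_rng(0)
print(complexity_proxy(rng.normal(size=10_000)))
```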
🧑🔬 FLAIR is presenting three more great papers today at #NeurIPS2024! Come talk to us and find out what we've been doing!
💬 1 · 🔁 5 · ❤️ 20
🔬 FLAIR has a bunch of great papers being presented today at NeurIPS! Come along to learn more about the work! 👀 Keep your eyes peeled for more work being presented over the week!
💬 1 · 🔁 5 · ❤️ 13
There's light at the end of the tunnel of LLM evals. The light at the end of the tunnel:
Tired of saturated benchmarks? Want scope for a significant leap in capabilities? 🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games! BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come. 1/🧵
💬 0 · 🔁 0 · ❤️ 11
We are very excited to announce Kinetix: an open-ended universe of physics-based tasks for RL! We use Kinetix to train a general agent on millions of randomly generated physics problems and show that this agent generalises to unseen handmade environments. 1/🧵
💬 14 · 🔁 214 · ❤️ 1K
What if improving LLM evaluation and generation was as simple as using a checklist? Introducing TICK ✅ (Targeted Instruct-evaluation with ChecKlists) and STICK 🏒 (Self-TICK) Work done @cohere with supervision from @_rockt, @j_foerst, @d_aumiller & @W4ngatang. 1/n
💬 4 · 🔁 12 · ❤️ 55
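The checklist idea is simple enough to sketch end to end (the `llm` helper and prompts below are hypothetical; the real TICK prompts and aggregation are in the paper): generate yes/no checks from the instruction, grade the response against each check, and score by the pass rate.

```python
# Checklist-style evaluation sketch. `llm` is a hypothetical helper,
# and the prompts/aggregation here are illustrative, not TICK's own.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client")

def tick_style_score(instruction: str, response: str) -> float:
    # 1) turn the instruction into concrete yes/no checks
    checklist = llm(
        "Write one yes/no question per line checking whether a response "
        f"follows this instruction:\n{instruction}"
    ).splitlines()
    # 2) grade the response against each check independently
    passes = [
        llm(f"Response:\n{response}\nCheck: {q}\nAnswer YES or NO.")
        .strip().upper() == "YES"
        for q in checklist if q.strip()
    ]
    # 3) score = fraction of checks passed
    return sum(passes) / max(len(passes), 1)
```

Grading one targeted check at a time is the point: it decomposes a vague "is this response good?" judgment into small questions an LLM judge answers far more reliably.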