
Mikayel Samvelyan
@_samvelyan
2K Followers · 4K Following · 89 Media · 790 Statuses
Research Scientist @GoogleDeepMind. Previously @Meta (FAIR), PhD @UCL, MSc @UniofOxford. @ELLISforEurope member.
London, England
Joined January 2018
Excited to give an invited talk on Agent Learning in Open-Endedness at the @IMOLNeurIPS2024 workshop this Sunday. I'll be joining an amazing lineup of speakers. Hope to see you there! 📅 Sunday, Dec 15 🕚 11:35 - 12:15 📍 West Meeting Room 217-219
0 · 13 · 48
Super cool to see Genie 3 recognized as one of @TIME's Best Inventions of 2025!! Congrats to the incredible team for making it possible :)
We’re proud to announce that Genie 3 has been named one of @TIME’s Best Inventions of 2025. Genie 3 is our groundbreaking world model capable of generating interactive, playable environments from text or image prompts. Find out more → https://t.co/bv1gZaWYtd
3 · 9 · 155
We’re hiring PhD students🎓✨ Work with @AurelienLucchi & me on the foundations of reasoning in LLMs — from algorithms & the ARC challenge to RL and fine-tuning! 👇
@ilijabogunovic and I are looking for two PhD candidates in the field of reasoning in machine learning. Apply here:
0 · 4 · 18
"Always reasoning" (ReAct) isn't optimal for LLM agents! 🧠 Our new paper identifies a "Goldilocks" effect: planning too frequently or too rarely degrades performance. We show how to train agents to dynamically allocate test-time compute to planning only when it's needed. 👇
Almost all agentic pipelines prompt LLMs to explicitly plan before every action (ReAct), but it turns out this isn't optimal for multi-step RL 🤔 Why? In our new work we highlight a crucial issue with ReAct and show that we should make and follow plans instead🧵
2 · 20 · 92
We introduce PuzzleJAX, a benchmark for reasoning and learning. 🧩💡🦎 PuzzleJAX compiles hundreds of existing grid-based PuzzleScript games to hardware-accelerated JAX environments, and allows researchers to define new tasks via PuzzleScript's concise rewrite rule-based DSL.
5 · 37 · 166
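To make the rewrite-rule DSL concrete, here is a dependency-free sketch of applying one PuzzleScript-style rule, `[ Player | Empty ] -> [ Empty | Player ]` (move right), to a grid. This is purely illustrative: the function name and cell encoding are made up, and the real PuzzleJAX compiles such rules to vectorised, hardware-accelerated JAX operations rather than Python loops.

```python
EMPTY, WALL, PLAYER = 0, 1, 2  # illustrative cell encoding

def apply_rule_right(grid):
    """Apply `[ Player | Empty ] -> [ Empty | Player ]` once, left to
    right, on a list-of-lists grid. Walls (or any non-empty cell)
    block movement; a player moves at most one cell per pass."""
    out = [row[:] for row in grid]  # pure-functional: copy, don't mutate
    for row in out:
        j = 0
        while j < len(row) - 1:
            if row[j] == PLAYER and row[j + 1] == EMPTY:
                row[j], row[j + 1] = EMPTY, PLAYER
                j += 2  # skip the cell the player just moved into
            else:
                j += 1
    return out
```

For example, `[[PLAYER, EMPTY, WALL]]` becomes `[[EMPTY, PLAYER, WALL]]`, while `[[PLAYER, WALL, EMPTY]]` is unchanged because the wall blocks the rule.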
Harder, Better, Faster, Stronger, Real-time! We are excited to reveal Genie 3, our most capable real-time foundational world model. A fantastic cross-team effort led by @jparkerholder and @shlomifruchter. Below are some of the interactive worlds and capabilities that were highlights for me.
54 · 190 · 1K
Genie 3 feels like a watershed moment for world models 🌐: we can now generate multi-minute, real-time interactive simulations of any imaginable world. This could be the key missing piece for embodied AGI… and it can also create beautiful beaches with my dog, playable in real time
268 · 544 · 5K
An exceptional opportunity with brilliant @robertarail and an amazing team at @GoogleDeepMind! 🚀 If pushing the frontiers of open-ended discovery excites you, this is the place to be. 🔥
I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an
1 · 0 · 36
LLMs acing math olympiads? Cute. But BALROG is where agents fight dragons (and actual Balrogs)🐉😈 And today, Grok-4 (@grok) takes the gold 🥇 Welcome to the podium, champion!
280 · 678 · 3K
We’re excited to announce our next speaker: Roberta Raileanu (@robertarail) from @GoogleDeepMind! Roberta will discuss NetHack: A Grand Challenge for RL and LLM Agents Alike. ⚔️ Join us on August 5th to learn how to develop agents capable of tackling open-ended environments!
3 · 9 · 106
Much-needed multi-agent benchmark for LLMs 👥 Theory of Mind is key as LLMs act in agentic, interactive settings — yet it remains underexplored and hard to measure. 💽 Decrypto offers a ToM-based evaluation of reasoning for agents operating in complex social settings. Great work!
Theory of Mind (ToM) is crucial for next gen LLM Agents, yet current benchmarks suffer from multiple shortcomings. Enter 💽 Decrypto, an interactive benchmark for multi-agent reasoning and ToM in LLMs! Work done with @TimonWilli & @j_foerst at @AIatMeta & @FLAIR_Ox 🧵👇
0 · 3 · 22
LLMs can be programmed by backprop 🔎 In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
4 · 57 · 315
Happy "@NetHack_LE is still completely unsolved" day for those of you who are celebrating it. We released The NetHack Learning Environment (https://t.co/X0B9M5UDNg) on this day five years ago. Current frontier models achieve only ~1.7% progression (see https://t.co/Sg6RYKspbE).
3 · 28 · 137
Check out Alex’s amazing internship project using Quality-Diversity algorithms to create synthetic reasoning problems! 👇 💡Key takeaway: better data quality improves in-distribution results, while more diversity enhances out-of-distribution generalization.
Excited to announce the final paper of my PhD!📢 A crucial piece of SFT/RL training is the availability of high-quality problem-solution data (Q, A). But what to do for difficult tasks where such data is scarce/hard to generate with SOTA models? Read on to find out
0 · 7 · 29
Excited to introduce LLM-First Search (LFS) - a new paradigm where the language model takes the lead in reasoning and search! LFS is a self-directed search method that empowers LLMs to guide the exploration process themselves, without relying on predefined heuristics or fixed
2 · 25 · 143
🚀Introducing “StochasTok: Improving Fine-Grained Subword Understanding in LLMs”!🚀 LLMs are incredible but still struggle disproportionately with subword tasks, e.g., character counting, wordplay, multi-digit numbers, fixing typos… Enter StochasTok, led by @anyaasims! [1/]
1 · 26 · 78
What an enormous privilege to give the opening lecture at the OxML summer school this morning. Never have I had such a thought-provoking set of audience questions! Here's to the automation of innovation towards human flourishing alongside the next generation of researchers.
📣 We’re excited to kick off the course today with a fantastic line-up of speakers: Edward Hughes (Google DeepMind) – AI Squared: Towards AI Capable of AI Research Karo Moilanen (Moonsong Labs) – Agent Guardrails and Proof-of-Agenthood Topologies Peter Gostev (Moonpig) –
1 · 5 · 19
Schmidhuber's Gödel Machine, an AI that rewrites its own code when the change is provably useful, embodied the dream of recursive self-improvement 🔄 Thrilled to share our practical realization, inspired by Darwinian evolution! Done with the amazing @jennyzhangzt, @shengranhu, @RobertTLange @jeffclune 😍
Introducing The Darwin Gödel Machine: AI that improves itself by rewriting its own code https://t.co/tBzlhoUMZO The Darwin Gödel Machine (DGM) is a self-improving agent that can modify its own code. Inspired by evolution, we maintain an expanding lineage of agent variants,
5 · 23 · 138
One promising direction is combining ideas from AlphaEvolve and the Darwin Gödel Machine. Imagine a self-referential system improving itself even at the lowest algorithmic levels at *scale* AlphaEvolve: https://t.co/vwBkEVNZu7 Darwin Gödel Machine:
17 · 86 · 570
Proud to announce that Dr @akbirkhan defended his PhD thesis titled "Safe Automated Research" last week 🥳. Massive thanks to @mpshanahan and Pontus Stenetorp for examining! As is customary, Akbir received a personal mortarboard from @UCL_DARK. Details 👇
11 · 9 · 151
2025 is the year of open-endedness. Delighted to be giving a talk at RAAIS in a couple of weeks’ time!
"open-endedness is all we'll need"...this is the study of a system’s ability to continuously generate artifacts that are both novel and learnable to an observer as a route to agi. excited to have @edwardfhughes from @GoogleDeepMind's open-endedness team join us at @raais 2025!
0 · 8 · 40