
Mikayel Samvelyan
@_samvelyan
2K Followers · 4K Following · 89 Media · 790 Statuses
Research Scientist @GoogleDeepMind. Previously @Meta (FAIR), PhD @UCL, MSc @UniofOxford. @ELLISforEurope member.
London, England
Joined January 2018
Excited to give an invited talk on Agent Learning in Open-Endedness at the @IMOLNeurIPS2024 workshop this Sunday. I'll be joining an amazing lineup of speakers. Hope to see you there! 📅 Sunday, Dec 15 🕚 11:35 - 12:15 📍 West Meeting Room 217-219
0 · 13 · 48
Super cool to see Genie 3 recognized as one of @TIME's Best Inventions of 2025!! Congrats to the incredible team for making it possible :)
We’re proud to announce that Genie 3 has been named one of @TIME’s Best Inventions of 2025. Genie 3 is our groundbreaking world model capable of generating interactive, playable environments from text or image prompts. Find out more → https://t.co/bv1gZaWYtd
3 · 9 · 155
We’re hiring PhD students🎓✨ Work with @AurelienLucchi & me on the foundations of reasoning in LLMs — from algorithms & the ARC challenge to RL and fine-tuning! 👇
@ilijabogunovic and I are looking for two PhD candidates in the field of reasoning in machine learning. Apply here:
0 · 4 · 18
"Always reasoning" (ReAct) isn't optimal for LLM agents! 🧠 Our new paper identifies a "Goldilocks" effect: planning too frequently or too rarely degrades performance. We show how to train agents to dynamically allocate test-time compute to planning only when it's needed. 👇
Almost all agentic pipelines prompt LLMs to explicitly plan before every action (ReAct), but it turns out this isn't optimal for multi-step RL 🤔 Why? In our new work we highlight a crucial issue with ReAct and show that we should make and follow plans instead🧵
2 · 20 · 92
We introduce PuzzleJAX, a benchmark for reasoning and learning. 🧩💡🦎 PuzzleJAX compiles hundreds of existing grid-based PuzzleScript games to hardware-accelerated JAX environments, and allows researchers to define new tasks via PuzzleScript's concise rewrite rule-based DSL.
5 · 37 · 166
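To make the rewrite-rule DSL concrete, here is a dependency-free sketch of applying one PuzzleScript-style rule, `[ Player | Empty ] -> [ Empty | Player ]` (move right), to a grid. This is purely illustrative: the function name and cell encoding are made up, and the real PuzzleJAX compiles such rules to vectorised, hardware-accelerated JAX operations rather than Python loops.

```python
EMPTY, WALL, PLAYER = 0, 1, 2  # illustrative cell encoding

def apply_rule_right(grid):
    """Apply `[ Player | Empty ] -> [ Empty | Player ]` once, left to
    right, on a list-of-lists grid. Walls (or any non-empty cell)
    block movement; a player moves at most one cell per pass."""
    out = [row[:] for row in grid]  # pure-functional: copy, don't mutate
    for row in out:
        j = 0
        while j < len(row) - 1:
            if row[j] == PLAYER and row[j + 1] == EMPTY:
                row[j], row[j + 1] = EMPTY, PLAYER
                j += 2  # skip the cell the player just moved into
            else:
                j += 1
    return out
```

For example, `[[PLAYER, EMPTY, WALL]]` becomes `[[EMPTY, PLAYER, WALL]]`, while `[[PLAYER, WALL, EMPTY]]` is unchanged because the wall blocks the rule.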
Harder, Better, Faster, Stronger, Real-time! We are excited to reveal Genie 3, our most capable real-time foundational world model. A fantastic cross-team effort led by @jparkerholder and @shlomifruchter. Below are some of the interactive worlds and capabilities that were highlights for me.
54 · 190 · 1K
Genie 3 feels like a watershed moment for world models 🌐: we can now generate multi-minute, real-time interactive simulations of any imaginable world. This could be the key missing piece for embodied AGI… and it can also create beautiful beaches with my dog, playable in real time
268 · 544 · 5K
An exceptional opportunity with brilliant @robertarail and an amazing team at @GoogleDeepMind! 🚀 If pushing the frontiers of open-ended discovery excites you, this is the place to be. 🔥
I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an
1 · 0 · 36
LLMs acing math olympiads? Cute. But BALROG is where agents fight dragons (and actual Balrogs)🐉😈 And today, Grok-4 (@grok) takes the gold 🥇 Welcome to the podium, champion!
280 · 678 · 3K
We’re excited to announce our next speaker: Roberta Raileanu (@robertarail) from @GoogleDeepMind! Roberta will discuss NetHack: A Grand Challenge for RL and LLM Agents Alike. ⚔️ Join us on August 5th to learn how to develop agents capable of tackling open-ended environments!
3 · 9 · 106
Much-needed multi-agent benchmark for LLMs 👥 Theory of Mind is key as LLMs act in agentic, interactive settings — yet it remains underexplored and hard to measure. 💽 Decrypto offers a ToM-based evaluation of reasoning for agents operating in complex social settings. Great work!
Theory of Mind (ToM) is crucial for next gen LLM Agents, yet current benchmarks suffer from multiple shortcomings. Enter 💽 Decrypto, an interactive benchmark for multi-agent reasoning and ToM in LLMs! Work done with @TimonWilli & @j_foerst at @AIatMeta & @FLAIR_Ox 🧵👇
0 · 3 · 22
LLMs can be programmed by backprop 🔎 In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
4 · 57 · 315
Happy "@NetHack_LE is still completely unsolved" day for those of you who are celebrating it. We released The NetHack Learning Environment (https://t.co/X0B9M5UDNg) on this day five years ago. Current frontier models achieve only ~1.7% progression (see https://t.co/Sg6RYKspbE).
3 · 28 · 137
Check out Alex’s amazing internship project using Quality-Diversity algorithms to create synthetic reasoning problems! 👇 💡Key takeaway: better data quality improves in-distribution results, while more diversity enhances out-of-distribution generalization.
Excited to announce the final paper of my PhD!📢 A crucial piece of SFT/RL training is the availability of high-quality problem-solution data (Q, A). But what to do for difficult tasks where such data is scarce/hard to generate with SOTA models? Read on to find out
0 · 7 · 29
Excited to introduce LLM-First Search (LFS) - a new paradigm where the language model takes the lead in reasoning and search! LFS is a self-directed search method that empowers LLMs to guide the exploration process themselves, without relying on predefined heuristics or fixed
2 · 25 · 143
🚀Introducing “StochasTok: Improving Fine-Grained Subword Understanding in LLMs”!🚀 LLMs are incredible but still struggle disproportionately with subword tasks, e.g., character counting, wordplay, multi-digit numbers, fixing typos… Enter StochasTok, led by @anyaasims! [1/]
1 · 26 · 78
What an enormous privilege to give the opening lecture at the OxML summer school this morning. Never have I had such a thought-provoking set of audience questions! Here's to the automation of innovation towards human flourishing alongside the next generation of researchers.
📣 We’re excited to kick off the course today with a fantastic line-up of speakers: Edward Hughes (Google DeepMind) – AI Squared: Towards AI Capable of AI Research Karo Moilanen (Moonsong Labs) – Agent Guardrails and Proof-of-Agenthood Topologies Peter Gostev (Moonpig) –
1 · 5 · 19
Schmidhuber's Gödel Machine, an AI that rewrites its own code when the change is provably useful, embodied the dream of recursive self-improvement 🔄 Thrilled to share our practical realization, inspired by Darwinian evolution! Done with the amazing @jennyzhangzt, @shengranhu, @RobertTLange @jeffclune 😍
Introducing The Darwin Gödel Machine: AI that improves itself by rewriting its own code https://t.co/tBzlhoUMZO The Darwin Gödel Machine (DGM) is a self-improving agent that can modify its own code. Inspired by evolution, we maintain an expanding lineage of agent variants,
5 · 23 · 138
One promising direction is combining ideas from AlphaEvolve and the Darwin Gödel Machine. Imagine a self-referential system improving itself even at the lowest algorithmic levels at *scale* AlphaEvolve: https://t.co/vwBkEVNZu7 Darwin Gödel Machine:
17 · 86 · 570
Proud to announce that Dr @akbirkhan defended his PhD thesis titled "Safe Automated Research" last week 🥳. Massive thanks to @mpshanahan and Pontus Stenetorp for examining! As is customary, Akbir received a personal mortarboard from @UCL_DARK. Details 👇
11 · 9 · 151
2025 is the year of open-endedness. Delighted to be giving a talk at RAAIS in a couple of weeks’ time!
"open-endedness is all we'll need"...this is the study of a system’s ability to continuously generate artifacts that are both novel and learnable to an observer as a route to agi. excited to have @edwardfhughes from @GoogleDeepMind's open-endedness team join us at @raais 2025!
0 · 8 · 40