Mikayel Samvelyan Profile
Mikayel Samvelyan

@_samvelyan

Followers: 2K · Following: 4K · Media: 89 · Statuses: 790

Research Scientist @GoogleDeepMind. Previously @Meta (FAIR), PhD @UCL, MSc @UniofOxford. @ELLISforEurope member.

London, England
Joined January 2018
@_samvelyan
Mikayel Samvelyan
10 months
Excited to give an invited talk on Agent Learning in Open-Endedness at the @IMOLNeurIPS2024 workshop this Sunday. I'll be joining an amazing lineup of speakers. Hope to see you there! 📅 Sunday, Dec 15 🕚 11:35 - 12:15 📍 West Meeting Room 217-219
@IMOLNeurIPS2024
IMOL Workshop | NeurIPS 2024
10 months
We're at #NeurIPS🇨🇦! Check out our updated Sunday schedule: https://t.co/rE62BRapOo
0
13
48
@jparkerholder
Jack Parker-Holder
9 days
Super cool to see Genie 3 recognized as one of @TIME's Best Inventions of 2025!! Congrats to the incredible team for making it possible :)
@GoogleDeepMind
Google DeepMind
9 days
We’re proud to announce that Genie 3 has been named one of @TIME’s Best Inventions of 2025. Genie 3 is our groundbreaking world model capable of generating interactive, playable environments from text or image prompts. Find out more → https://t.co/bv1gZaWYtd
3
9
155
@ilijabogunovic
Ilija Bogunovic
1 month
We’re hiring PhD students🎓✨ Work with @AurelienLucchi & me on the foundations of reasoning in LLMs — from algorithms & ARC challenge to RL and fine-tuning! 👇
@AurelienLucchi
Aurelien Lucchi
1 month
@ilijabogunovic and I are looking for two PhD candidates in the field of reasoning in machine learning. Apply here:
0
4
18
@PaglieriDavide
Davide Paglieri
1 month
"Always reasoning" (ReAct) isn't optimal for LLM agents! 🧠 Our new paper identifies a "Goldilocks" effect: planning too frequently or not enough degrades performance. We show how to train agents to learn to dynamically allocate test-time compute when needed for best results. 👇
@CupiaBart
Bartłomiej Cupiał
1 month
Almost all agentic pipelines prompt LLMs to explicitly plan before every action (ReAct), but it turns out this isn't optimal for Multi-Step RL 🤔 Why? In our new work we highlight a crucial issue with ReAct and show that we should make and follow plans instead🧵
2
20
92
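To make the idea in the thread above concrete, here is a minimal sketch of an agent loop where planning is gated per step rather than done before every action as in ReAct. The helpers `llm_plan`, `llm_act`, and `should_plan` are hypothetical stand-ins, not the paper's actual training setup or interfaces.

```python
# Toy agent loop contrasting ReAct-style "always plan" with a gated planning
# policy. `llm_plan`, `llm_act`, and `should_plan` are hypothetical stand-ins.

def run_episode(env, llm_plan, llm_act, should_plan, max_steps=100):
    obs = env.reset()
    plan = None
    reward = 0.0
    for _ in range(max_steps):
        if plan is None or should_plan(obs, plan):
            # Spend extra test-time compute only when the gate fires,
            # e.g. when the current plan no longer matches the observation.
            plan = llm_plan(obs)
        action = llm_act(obs, plan)  # otherwise, just follow the current plan
        obs, reward, done, info = env.step(action)
        if done:
            break
    return reward
```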
@Smearle_RH
smearle
2 months
We introduce PuzzleJAX, a benchmark for reasoning and learning. 🧩💡🦎 PuzzleJAX compiles hundreds of existing grid-based PuzzleScript games to hardware-accelerated JAX environments, and allows researchers to define new tasks via PuzzleScript's concise rewrite rule-based DSL.
5
37
166
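As a rough illustration of what "hardware-accelerated JAX environments" means in practice, here is a toy, hand-written grid step that can be jit-compiled and batched over thousands of parallel games. This is only a sketch: PuzzleJAX itself compiles PuzzleScript rewrite rules, which this example does not attempt.

```python
# Toy JAX grid "environment" step: a pure function that can be jit-compiled
# and vmapped over many parallel games. Not the PuzzleJAX API.
import jax
import jax.numpy as jnp

MOVES = jnp.array([[-1, 0], [1, 0], [0, -1], [0, 1]])  # up, down, left, right
GRID_SIZE = 8

def step(player_pos, action):
    # Move the player and clamp to the grid; no Python-side game logic.
    return jnp.clip(player_pos + MOVES[action], 0, GRID_SIZE - 1)

# One fused call steps 4096 environments on the accelerator.
batched_step = jax.jit(jax.vmap(step))
positions = jnp.zeros((4096, 2), dtype=jnp.int32)
actions = jnp.zeros((4096,), dtype=jnp.int32)
positions = batched_step(positions, actions)
```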
@_rockt
Tim Rocktäschel
2 months
Harder, Better, Faster, Stronger, Real-time! We are excited to reveal Genie 3, our most capable real-time foundational world model. Fantastic cross-team effort led by @jparkerholder and @shlomifruchter. Below are some interactive worlds and capabilities that were highlights for me
54
190
1K
@jparkerholder
Jack Parker-Holder
2 months
Genie 3 feels like a watershed moment for world models 🌐: we can now generate multi-minute, real-time interactive simulations of any imaginable world. This could be the key missing piece for embodied AGI… and it can also create beautiful beaches with my dog, playable real time
268
544
5K
@_samvelyan
Mikayel Samvelyan
3 months
An exceptional opportunity with brilliant @robertarail and an amazing team at @GoogleDeepMind! 🚀 If pushing the frontiers of open-ended discovery excites you, this is the place to be. 🔥
@robertarail
Roberta Raileanu
3 months
I’m building a new team at @GoogleDeepMind to work on Open-Ended Discovery! We’re looking for strong Research Scientists and Research Engineers to help us push the frontier of autonomously discovering novel artifacts such as new knowledge, capabilities, or algorithms, in an
1
0
36
@PaglieriDavide
Davide Paglieri
3 months
LLMs acing math olympiads? Cute. But BALROG is where agents fight dragons (and actual Balrogs)🐉😈 And today, Grok-4 (@grok) takes the gold 🥇 Welcome to the podium, champion!
280
678
3K
@rlvg2025
Reinforcement Learning & Video Games Workshop @RLC
3 months
We’re excited to announce our next speaker: Roberta Raileanu (@robertarail) from @GoogleDeepMind! Roberta will discuss NetHack: A Grand Challenge for RL and LLM Agents Alike. ⚔️ Join us on August 5th to learn how to develop agents capable of tackling open-ended environments!
3
9
106
@_samvelyan
Mikayel Samvelyan
4 months
Much-needed multi-agent benchmark for LLMs 👥 Theory of Mind is key as LLMs act in agentic, interactive settings — yet remains underexplored and hard to measure. 💽 Decrypto offers a ToM-based evaluation of reasoning for agents operating in complex social settings. Great work!
@_andreilupu
Andrei Lupu
4 months
Theory of Mind (ToM) is crucial for next gen LLM Agents, yet current benchmarks suffer from multiple shortcomings. Enter 💽 Decrypto, an interactive benchmark for multi-agent reasoning and ToM in LLMs! Work done with @TimonWilli & @j_foerst at @AIatMeta & @FLAIR_Ox 🧵👇
0
3
22
@LauraRuis
Laura Ruis
4 months
LLMs can be programmed by backprop 🔎 In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
4
57
315
@_rockt
Tim Rocktäschel
4 months
Happy "@NetHack_LE is still completely unsolved" day for those of you who are celebrating it. We released The NetHack Learning Environment (https://t.co/X0B9M5UDNg) on this day five years ago. Current frontier models achieve only ~1.7% progression (see https://t.co/Sg6RYKspbE).
3
28
137
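For context on the environment referenced above, NLE exposes the full game through a Gym interface. A minimal random-agent rollout, assuming the `NetHackScore-v0` registration from the NLE README (the exact return signature differs across Gym/Gymnasium versions), looks roughly like this:

```python
# Random-agent rollout in the NetHack Learning Environment.
# Assumes `pip install nle` and the classic Gym API; Gymnasium-style APIs
# return (obs, reward, terminated, truncated, info) instead.
import gym
import nle  # noqa: F401  (importing registers the NetHack environments)

env = gym.make("NetHackScore-v0")
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
print("episode return:", total_reward)
```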
@_samvelyan
Mikayel Samvelyan
4 months
Check out Alex’s amazing internship project using Quality-Diversity algorithms to create synthetic reasoning problems! 👇 💡Key takeaway: better data quality improves in-distribution results, while more diversity enhances out-of-distribution generalization.
@Dahoas1
Alex Havrilla
4 months
Excited to announce the final paper of my PhD!📢 A crucial piece of SFT/RL training is the availability of high-quality problem-solution data (Q, A). But what to do for difficult tasks where such data is scarce/hard to generate with SOTA models? Read on to find out
0
7
29
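As a generic illustration of the Quality-Diversity framing mentioned above (a MAP-Elites-style loop, not the paper's actual pipeline), one can keep an archive of problems keyed by a diversity descriptor and retain the highest-quality problem per cell. The helpers `generate_variant`, `quality`, and `descriptor` are hypothetical.

```python
import random

# Generic MAP-Elites-style loop for growing a synthetic problem set:
# the archive is keyed by a diversity descriptor and keeps the best
# problem per cell. Hypothetical helpers; not the paper's method.

def qd_generate(seed_problems, generate_variant, quality, descriptor, iters=1000):
    archive = {descriptor(p): p for p in seed_problems}
    for _ in range(iters):
        parent = random.choice(list(archive.values()))
        child = generate_variant(parent)   # e.g. an LLM-mutated problem
        cell = descriptor(child)           # where it lands in "diversity space"
        if cell not in archive or quality(child) > quality(archive[cell]):
            archive[cell] = child          # keep only the best problem per cell
    return list(archive.values())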
@naitherr
Nathan Herr
4 months
Excited to introduce LLM-First Search (LFS) - a new paradigm where the language model takes the lead in reasoning and search! LFS is a self-directed search method that empowers LLMs to guide the exploration process themselves, without relying on predefined heuristics or fixed
2
25
143
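A minimal sketch of the "self-directed search" idea described above: instead of a fixed heuristic (e.g. UCT) deciding which node to expand, the LLM itself picks where to search next. The helpers `llm_choose`, `expand`, and `is_solution` are hypothetical, not the LFS implementation.

```python
# Toy self-directed search loop: the LLM, not a hand-crafted heuristic,
# selects the next frontier node to expand. Hypothetical helpers only.

def llm_first_search(root, llm_choose, expand, is_solution, budget=50):
    frontier = [root]
    for _ in range(budget):
        if not frontier:
            break
        node = llm_choose(frontier)   # LLM decides where to search next
        if is_solution(node):
            return node
        frontier.remove(node)
        frontier.extend(expand(node))  # LLM proposes candidate next steps
    return None
```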
@cong_ml
Cong Lu
4 months
🚀Introducing “StochasTok: Improving Fine-Grained Subword Understanding in LLMs”!🚀 LLMs are incredible but still struggle disproportionately with subword tasks, e.g., for character counts, wordplay, multi-digit numbers, fixing typos… Enter StochasTok, led by @anyaasims! [1/]
1
26
78
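As a toy illustration of one way to expose subword structure during training, tokens can be stochastically re-split into smaller pieces so the model occasionally sees a word's characters. This is a generic sketch of the idea only; the actual StochasTok method may differ.

```python
import random

# Toy stochastic token splitting: with some probability, replace a token
# with two smaller pieces so subword structure (e.g. character counts)
# becomes visible during training. Generic idea; not the paper's code.

def stochastically_split(tokens, split_prob=0.1):
    out = []
    for tok in tokens:
        if len(tok) > 1 and random.random() < split_prob:
            cut = random.randint(1, len(tok) - 1)
            out.extend([tok[:cut], tok[cut:]])  # expose the token's pieces
        else:
            out.append(tok)
    return out

print(stochastically_split(["strawberry", " has", " three", " r", "s"]))
```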
@edwardfhughes
Edward Hughes
4 months
What an enormous privilege to give the opening lecture at the OxML summer school this morning. Never have I had such a thought-provoking set of audience questions! Here's to the automation of innovation towards human flourishing alongside the next generation of researchers.
@GlobalGoalsAI
AI for Global Goals
4 months
📣 We’re excited to kick off the course today with a fantastic line-up of speakers: Edward Hughes (Google DeepMind) – AI Squared: Towards AI Capable of AI Research; Karo Moilanen (Moonsong Labs) – Agent Guardrails and Proof-of-Agenthood Topologies; Peter Gostev (Moonpig) –
1
5
19
@cong_ml
Cong Lu
5 months
Schmidhuber's Gödel Machine, an AI that "rewrites its own code" when provably useful, captured the dream of recursive self-improvement 🔄 Thrilled to share our practical realization, inspired by Darwinian evolution! Done with the amazing @jennyzhangzt, @shengranhu, @RobertTLange @jeffclune 😍
@SakanaAILabs
Sakana AI
5 months
Introducing The Darwin Gödel Machine: AI that improves itself by rewriting its own code https://t.co/tBzlhoUMZO The Darwin Gödel Machine (DGM) is a self-improving agent that can modify its own code. Inspired by evolution, we maintain an expanding lineage of agent variants,
5
23
138
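To make the "expanding lineage of agent variants" described above more concrete, here is a rough sketch of such an evolutionary outer loop. The helpers `mutate_code` and `evaluate` are hypothetical; this is not the DGM implementation.

```python
import random

# Toy outer loop for evolutionary self-improvement: keep an archive (lineage)
# of agent variants, sample a parent, let it rewrite its own code, and keep
# any child that still runs. `mutate_code` and `evaluate` are hypothetical.

def evolve(seed_agent_code, mutate_code, evaluate, generations=100):
    archive = [{"code": seed_agent_code, "score": evaluate(seed_agent_code)}]
    for _ in range(generations):
        parent = random.choice(archive)            # sample from the lineage
        child_code = mutate_code(parent["code"])   # agent edits its own code
        score = evaluate(child_code)               # e.g. coding-benchmark success rate
        if score is not None:                      # keep working variants, not just improvements
            archive.append({"code": child_code, "score": score})
    return max(archive, key=lambda a: a["score"])
```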
@jennyzhangzt
Jenny Zhang
5 months
One promising direction is combining ideas from AlphaEvolve and the Darwin Gödel Machine. Imagine a self-referential system improving itself even at the lowest algorithmic levels at *scale* AlphaEvolve: https://t.co/vwBkEVNZu7 Darwin Gödel Machine:
arxiv.org
Today's AI systems have human-designed, fixed architectures and cannot autonomously and continuously improve themselves. The advance of AI could itself be automated. If done safely, that would...
17
86
570
@_rockt
Tim Rocktäschel
5 months
Proud to announce that Dr @akbirkhan defended his PhD thesis titled "Safe Automated Research" last week 🥳. Massive thanks to @mpshanahan and Pontus Stenetorp for examining! As is customary, Akbir received a personal mortarboard from @UCL_DARK. Details 👇
11
9
151
@edwardfhughes
Edward Hughes
5 months
2025 is the year of open-endedness. Delighted to be giving a talk at RAAIS in a couple of weeks’ time!
@nathanbenaich
Nathan Benaich
5 months
"open-endedness is all we'll need"...this is the study of a system’s ability to continuously generate artifacts that are both novel and learnable to an observer as a route to agi. excited to have @edwardfhughes from @GoogleDeepMind's open-endedness team join us at @raais 2025!
0
8
40