Danny Sawyer
@dannypsawyer
Followers
126
Following
57
Media
11
Statuses
59
AI researcher @GoogleDeepMind. PhD @Caltech. Interested in autonomous exploration and self-improvement, both in humans and embodied AI agents. Views my own.
Bay Area, CA
Joined June 2010
Thanks to all the authors! @dannypsawyer @rosemary_ke @martinengelcke @__Reidy__ @AlexLerchner @DaniloJRezende @countzerozzz @mc_mozer @janexwang 13/13
0
0
6
In summary, our work provides a deeper understanding of the exploration and adaptation capabilities of frontier models. We show that these skills, while not yet robust, can be elicited. Read the full paper for all the details! https://t.co/8Q9j1VMTYv
#NeurIPS2025 12/13
1
0
5
This reveals that a major frontier for foundation agents isn't just acting, but reflecting. The ability to improve through adaptive strategies over time is challenging, but not fundamentally out of reach. Benchmarks like Alchemy are crucial for measuring this progress. 11/13
1
0
4
We took it a step further: strategy adaptation. We silently changed the environment's rules mid-episode. We found that some models, like Gemini 2.5 and Claude 3.7, when aided by summarization, could detect the change and successfully adapt their strategy, recovering performance. 10/13
2
0
4
With the summarization prompt, a latent meta-learning ability emerged. Models now showed significant score improvement across trials. The act of summarizing forced them to consolidate their knowledge, enabling them to form and execute better strategies in later trials. 9/13
1
0
4
This led to our key insight. We hypothesized the models weren't actively distilling principles from their long action history. So, we prompted them to write a summary of their findings after each trial. The effect was dramatic. 8/13
1
0
4
But in the complex Alchemy environment, performance faltered. Without guidance, even the most powerful models showed no significant improvement across trials. They gathered data but failed to integrate it into a better strategy. Meta-learning did not occur naturally. 7/13
1
0
4
In the simple Feature World tasks, most models performed near-optimally. They are highly efficient at gathering information when the goal is straightforward. This shows the challenge isn't basic, single-turn reasoning. They can select informative actions in the moment. 6/13
1
0
4
2️⃣ Alchemy: A multi-trial environment that requires agents to deduce latent causal rules and improve their strategy over time. The rules are random, but stay the same across trials. This isolates different facets of exploration than Feature World does. 5/13
1
0
5
We evaluated models in two environments: 1️⃣ Feature World (both text-based and 3D in Construction Lab): A stateless setting to test raw information-gathering efficiency. 4/13
1
0
4
These patterns of failures offer interesting insights into how foundation models function, and also point toward ways to unlock these core embodied exploration abilities. 3/13
1
0
5
We benchmarked variants of GPT, Claude, and Gemini on exploration in several embodied environments. Surprisingly, although most models did well on stateless, single-turn tasks, many had critical limitations in adaptation and meta-learning in stateful, multi-turn tasks. 2/13
2
0
5
Happy to announce that our work has been accepted to workshops on Multi-turn Interactions and Embodied World Models at #NeurIPS2025! Frontier foundation models are incredible, but how well can they explore in interactive environments? Paper👇 https://t.co/8Q9j1VMTYv 🧵1/13
1
5
23
Excited to say the project I've been a part of for the past year at Google DeepMind is now public.
Excited to announce SIMA, a general AI agent for games & 3D virtual settings. It marks the first time an agent has demonstrated it can follow natural-language instructions to carry out a wide range of tasks across a large array of game worlds, similar to how a human would play.
0
1
4
Can ultrasound detect gene expression in single cells? Yes, with a new ultrasensitive imaging method called BURST and acoustic reporter genes based on #gasvesicles. Congrats to @dannypsawyer & team, who describe this approach in today's @naturemethods. https://t.co/98HAWDQo4M
18
66
363
24 yrs ago, Roger Tsien et al introduced the first fluorescent biosensors based on #GFP. Today, we introduce the first acoustic biosensors based on #gasvesicles. Now it's possible to image the action of specific molecules (enzymes) in the body w/ultrasound
34
330
1K
Concepts are abstractions that can be learned as programs on a 'visual cognitive computer', and now they can be induced 1000x faster, thanks to object-factorized search and subgoaling. Check out our #CogSci2020 paper with @dannypsawyer
https://t.co/0XL8Y04Xa9
Mini thread about the cognitive science and neuroscience inspirations behind our new paper in which we learn concepts as 'cognitive programs' on a 'visual cognitive computer'. https://t.co/Pdss0RD5te
0
8
25
Shots fired! "Even Physicists Don’t Understand Quantum Mechanics. Worse, they don’t seem to want to understand it." -- me, in the New York Times @nytopinion #SomethingDeeply
https://t.co/rsEUO1sSOh
101
290
916
After two years of ideas, coding, debugging, experiments, analysis, figure design, writing, re-writing, peer review, and re-re-writing, the 1st paper of my PhD was published today in Physical Review X. https://t.co/XdTVg51IW3
2
0
7
"Last month, we launched a new spacecraft as part of a re-energized space program that will send American astronauts to Mars" —Obama #SOTU
17
379
336