Danny Sawyer

@dannypsawyer

Followers
126
Following
57
Media
11
Statuses
59

AI researcher @GoogleDeepMind. PhD @Caltech. Interested in autonomous exploration and self-improvement, both in humans and embodied AI agents. Views my own.

Bay Area, CA
Joined June 2010
@dannypsawyer
Danny Sawyer
27 days
In summary, our work provides a deeper understanding of the exploration and adaptation capabilities of frontier models. We show that these skills, while not yet robust, can be elicited. Read the full paper for all the details! https://t.co/8Q9j1VMTYv #NeurIPS2025 12/13
1
0
5
@dannypsawyer
Danny Sawyer
27 days
This reveals that a major frontier for foundation agents isn't just acting, but reflecting. The ability to improve through adaptive strategies over time is challenging, but not fundamentally out of reach. Benchmarks like Alchemy are crucial for measuring this progress. 11/13
1
0
4
@dannypsawyer
Danny Sawyer
27 days
We took it a step further: strategy adaptation. We silently changed the environment's rules mid-episode. We found that some models, like Gemini 2.5 and Claude 3.7, when aided by summarization, could detect the change and successfully adapt their strategy, recovering performance. 10/13
2
0
4
@dannypsawyer
Danny Sawyer
27 days
With the summarization prompt, a latent meta-learning ability emerged. Models now showed significant score improvement across trials. The act of summarizing forced them to consolidate their knowledge, enabling them to form and execute better strategies in later trials. 9/13
1
0
4
@dannypsawyer
Danny Sawyer
27 days
This led to our key insight. We hypothesized the models weren't actively distilling principles from their long action history. So, we prompted them to write a summary of their findings after each trial. The effect was dramatic. 8/13
1
0
4
@dannypsawyer
Danny Sawyer
27 days
But in the complex Alchemy environment, performance faltered. Without guidance, even the most powerful models showed no significant improvement across trials. They gathered data but failed to integrate it into a better strategy. Meta-learning did not occur naturally. 7/13
1
0
4
@dannypsawyer
Danny Sawyer
27 days
In the simple Feature World tasks, most models performed near-optimally. They are highly efficient at gathering information when the goal is straightforward. This shows the challenge isn't basic, single-turn reasoning. They can select informative actions in the moment. 6/13
1
0
4
@dannypsawyer
Danny Sawyer
27 days
2️⃣ Alchemy: A multi-trial environment that requires agents to deduce latent causal rules and improve their strategy over time. The rules are random, but stay the same across trials. This isolates different facets of exploration from Feature World. 5/13
1
0
5
@dannypsawyer
Danny Sawyer
27 days
We evaluated models in two environments: 1️⃣ Feature World (both text-based and 3D in Construction Lab): A stateless setting to test raw information-gathering efficiency. 4/13
1
0
4
@dannypsawyer
Danny Sawyer
27 days
These patterns of failures offer interesting insights into how foundation models function, and also point toward ways to unlock these core embodied exploration abilities. 3/13
1
0
5
@dannypsawyer
Danny Sawyer
27 days
We benchmarked variants of GPT, Claude, and Gemini on exploration in several embodied environments. Surprisingly, although most models did well on stateless, single-turn tasks, many had critical limitations in adaptation and meta-learning in stateful, multi-turn tasks. 2/13
2
0
5
@dannypsawyer
Danny Sawyer
27 days
Happy to announce that our work has been accepted to workshops on Multi-turn Interactions and Embodied World Models at #NeurIPS2025! Frontier foundation models are incredible, but how well can they explore in interactive environments? Paper👇 https://t.co/8Q9j1VMTYv 🧵1/13
1
5
23
@dannypsawyer
Danny Sawyer
2 years
Excited to say the project I've been a part of for the past year at Google DeepMind is now public.
@demishassabis
Demis Hassabis
2 years
Excited to announce SIMA, a general AI agent for games & 3D virtual settings. It marks the first time an agent has demonstrated it can follow natural-language instructions to carry out a wide range of tasks across a large array of game worlds, similar to how a human would play.
0
1
4
@mikhailshapiro
Mikhail Shapiro (same on bsky) 🇺🇦
4 years
Can ultrasound detect gene expression in single cells? Yes, with a new ultrasensitive imaging method called BURST and acoustic reporter genes based on #gasvesicles. Congrats to @dannypsawyer & team, who describe this approach in today's @naturemethods. https://t.co/98HAWDQo4M
18
66
363
@mikhailshapiro
Mikhail Shapiro (same on bsky) 🇺🇦
5 years
24 yrs ago, Roger Tsien et al introduced the first fluorescent biosensors based on #GFP. Today, we introduce the first acoustic biosensors based on #gasvesicles. Now it's possible to image the action of specific molecules (enzymes) in the body w/ultrasound
34
330
1K
@dileeplearning
Dileep George
5 years
Concepts are abstractions that can be learned as programs on a 'visual cognitive computer', and now they can be induced 1000x faster, thanks to object-factorized search and subgoaling. Check out our #CogSci2020 paper with @dannypsawyer https://t.co/0XL8Y04Xa9
@dileeplearning
Dileep George
7 years
Mini thread about the cognitive science and neuroscience inspirations behind our new paper in which we learn concepts as 'cognitive programs' on a 'visual cognitive computer'. https://t.co/Pdss0RD5te
0
8
25
@seanmcarroll
Sean Carroll
6 years
Shots fired! "Even Physicists Don’t Understand Quantum Mechanics. Worse, they don’t seem to want to understand it." -- me, in the New York Times @nytopinion #SomethingDeeply https://t.co/rsEUO1sSOh
nytimes.com
Worse, they don’t seem to want to understand it.
101
290
916
@dannypsawyer
Danny Sawyer
7 years
After two years of ideas, coding, debugging, experiments, analysis, figure design, writing, re-writing, peer review, and re-re-writing, the 1st paper of my PhD was published today in Physical Review X. https://t.co/XdTVg51IW3
2
0
7
@ObamaWhiteHouse
White House Archived
11 years
"Last month, we launched a new spacecraft as part of a re-energized space program that will send American astronauts to Mars" —Obama #SOTU
17
379
336