Danny Sawyer @dannypsawyer X Profile

Danny Sawyer

@dannypsawyer

Followers

126

Following

57

Media

11

Statuses

59

AI researcher @GoogleDeepMind. PhD @Caltech. Interested in autonomous exploration and self-improvement, both in humans and embodied AI agents. Views my own.

Bay Area, CA

Joined June 2010

Danny Sawyer

@dannypsawyer

27 days

Thanks to all the authors! @dannypsawyer @rosemary_ke @martinengelcke @__Reidy__ @AlexLerchner @DaniloJRezende @countzerozzz @mc_mozer @janexwang 13/13

0

6

Danny Sawyer

@dannypsawyer

27 days

In summary, our work provides a deeper understanding of the exploration and adaptation capabilities of frontier models. We show that these skills, while not yet robust, can be elicited. Read the full paper for all the details! https://t.co/8Q9j1VMTYv #NeurIPS2025 12/13

1

0

5

Danny Sawyer

@dannypsawyer

27 days

This reveals that a major frontier for foundation agents isn't just acting, but reflecting. The ability to improve through adaptive strategies over time is challenging, but not fundamentally out of reach. Benchmarks like Alchemy are crucial for measuring this progress. 11/13

1

0

4

Danny Sawyer

@dannypsawyer

27 days

We took it a step further: strategy adaptation. We silently changed the environment's rules mid-episode. We found some models, like Gemini 2.5 and Claude 3.7, when aided by summarization, could detect the change and successfully adapt their strategy, recovering performance. 10/13

2

0

4

Danny Sawyer

@dannypsawyer

27 days

With the summarization prompt, a latent meta-learning ability emerged. Models now showed significant score improvement across trials. The act of summarizing forced them to consolidate their knowledge, enabling them to form and execute better strategies in later trials. 9/13

1

0

4

Danny Sawyer

@dannypsawyer

27 days

This led to our key insight. We hypothesized the models weren't actively distilling principles from their long action history. So, we prompted them to write a summary of their findings after each trial. The effect was dramatic. 8/13

1

0

4

Danny Sawyer

@dannypsawyer

27 days

But in the complex Alchemy environment, performance faltered. Without guidance, even the most powerful models showed no significant improvement across trials. They gathered data but failed to integrate it into a better strategy. Meta-learning did not occur naturally. 7/13

1

0

4

Danny Sawyer

@dannypsawyer

27 days

In the simple Feature World tasks, most models performed near-optimally. They are highly efficient at gathering information when the goal is straightforward. This shows the challenge isn't basic, single-turn reasoning. They can select informative actions in the moment. 6/13

1

0

4

Danny Sawyer

@dannypsawyer

27 days

2️⃣ Alchemy: A multi-trial environment that requires agents to deduce latent causal rules and improve their strategy over time. The rules are random, but stay the same across trials. This isolates different facets of exploration from Feature World. 5/13

1

0

5

Danny Sawyer

@dannypsawyer

27 days

We evaluated models in two environments: 1️⃣ Feature World (both text-based and 3D in Construction Lab): A stateless setting to test raw information-gathering efficiency. 4/13

1

0

4

Danny Sawyer

@dannypsawyer

27 days

These patterns of failures offer interesting insights into how foundation models function, and also point toward ways to unlock these core embodied exploration abilities. 3/13

1

0

5

Danny Sawyer

@dannypsawyer

27 days

We benchmarked variants of GPT, Claude, and Gemini on exploration in several embodied environments. Surprisingly, although most models did well on stateless, single-turn tasks, many had critical limitations in adaptation and meta-learning in stateful, multi-turn tasks. 2/13

2

0

5

Danny Sawyer

@dannypsawyer

27 days

Happy to announce that our work has been accepted to workshops on Multi-turn Interactions and Embodied World Models at #NeurIPS2025! Frontier foundation models are incredible, but how well can they explore in interactive environments? Paper👇 https://t.co/8Q9j1VMTYv 🧵1/13

1

5

23

Danny Sawyer

@dannypsawyer

2 years

Excited to say the project I've been a part of for the past year at Google DeepMind is now public.

Demis Hassabis

@demishassabis

2 years

Excited to announce SIMA, a general AI agent for games & 3D virtual settings. It marks the first time an agent has demonstrated it can follow natural-language instructions to carry out a wide range of tasks across a large array of game worlds, similar to how a human would play.

0

1

4

Mikhail Shapiro (same on bsky) 🇺🇦

@mikhailshapiro

4 years

Can ultrasound detect gene expression in single cells? Yes, with a new ultrasensitive imaging method called BURST and acoustic reporter genes based on #gasvesicles. Congrats to @dannypsawyer & team, who describe this approach in today's @naturemethods. https://t.co/98HAWDQo4M

18

66

363

Mikhail Shapiro (same on bsky) 🇺🇦

@mikhailshapiro

5 years

24 yrs ago, Roger Tsien et al introduced the first fluorescent biosensors based on #GFP. Today, we introduce the first acoustic biosensors based on #gasvesicles. Now it's possible to image the action of specific molecules (enzymes) in the body w/ultrasound

34

330

1K

Dileep George

@dileeplearning

5 years

Concepts are abstractions that can be learned as programs on a 'visual cognitive computer', and now they can be induced 1000x faster, thanks to object-factorized search and subgoaling. Check out our #CogSci2020 paper with @dannypsawyer https://t.co/0XL8Y04Xa9

Dileep George

@dileeplearning

7 years

Mini thread about the cognitive science and neuroscience inspirations behind our new paper in which we learn concepts as 'cognitive programs' on a 'visual cognitive computer'. https://t.co/Pdss0RD5te

0

8

25

Sean Carroll

@seanmcarroll

6 years

Shots fired! "Even Physicists Don’t Understand Quantum Mechanics. Worse, they don’t seem to want to understand it." -- me, in the New York Times @nytopinion #SomethingDeeply https://t.co/rsEUO1sSOh

101

290

916

Danny Sawyer

@dannypsawyer

7 years

After two years of ideas, coding, debugging, experiments, analysis, figure design, writing, re-writing, peer review, and re-re-writing, the 1st paper of my PhD was published today in Physical Review X. https://t.co/XdTVg51IW3

2

0

7

White House Archived

@ObamaWhiteHouse

11 years

"Last month, we launched a new spacecraft as part of a re-energized space program that will send American astronauts to Mars" —Obama #SOTU

17

379

336