Jon Richens Profile
Jon Richens

@jonathanrichens

1K Followers · 237 Following · 12 Media · 127 Statuses

Research scientist in AI safety @GoogleDeepMind

Joined August 2020
@jonathanrichens
Jon Richens
1 month
Are world models necessary to achieve human-level agents, or is there a model-free shortcut? Our new #ICML2025 paper tackles this question from first principles, and finds a surprising answer: agents _are_ world models… 🧵
@jonathanrichens
Jon Richens
26 days
RT @richardcsuwandi: 2 years ago, @ilyasut made a bold prediction that large neural networks are learning world models through text. Recen…
@jonathanrichens
Jon Richens
29 days
RT @alexis_bellot_: Can we trust a black-box system, when all we know is its past behaviour? 🤖🤔 In a new #ICML2025 paper we derive fundamen…
@jonathanrichens
Jon Richens
1 month
… and many more! Check out our paper or come chat to me at #ICML2025. Joint work @GoogleDeepMind with @dabelcs, @alexis_bellot_, @tom4everitt.
@jonathanrichens
Jon Richens
1 month
Causality. In previous work we showed that a causal world model is needed for robustness. It turns out task generalization requires less causal knowledge of the environment than robustness does. There is a causal hierarchy, but for agency and agent capabilities rather than for inference!
@jonathanrichens
Jon Richens
1 month
Emergent capabilities. To minimize training loss across many goals, agents must learn a world model, and that model can be used to solve tasks the agent was never explicitly trained on. Simple goal-directedness gives rise to many capabilities (social cognition, reasoning about uncertainty, intent…).
@jonathanrichens
Jon Richens
1 month
Safety. Several approaches to AI safety require accurate world models, but agent capabilities could outpace our ability to build them. Our work gives a theoretical guarantee: we can extract world models from agents, and the model fidelity increases with the agent's capabilities.
@jonathanrichens
Jon Richens
1 month
Extracting world knowledge from agents. We derive algorithms that recover a world model given the agent’s policy and goal (policy + goal -> world model). These algorithms complete the triptych of planning (world model + goal -> policy) and IRL (world model + policy -> goal).
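Below is a minimal runnable sketch of that policy + goal -> world model direction. It is my own toy reconstruction, not the paper's algorithm: the oracle agent, the goal format ("take this action; succeed if the next state is the target"), and the baseline-lottery bisection are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP, hidden from the extraction routine: P[s, a, s'] = P(s' | s, a).
n_states, n_actions = 4, 3
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

def agent(state, action, target, p_baseline):
    """Stand-in for a trained goal-conditioned agent (hypothetical).
    Goal: 'take `action` now; succeed if the next state is `target`',
    offered against a baseline lottery that succeeds with probability
    p_baseline. A regret-minimizing agent pursues the goal iff it beats
    the baseline."""
    return P[state, action, target] >= p_baseline

def recover(state, action, target, tol=1e-6):
    """Bisect on the baseline probability to locate the agent's
    indifference point; that threshold equals P(target | state, action)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if agent(state, action, target, mid):
            lo = mid  # agent still prefers the goal: true prob >= mid
        else:
            hi = mid  # agent defers to the baseline: true prob < mid
    return 0.5 * (lo + hi)

# Rebuild the full transition tensor from policy queries alone.
P_hat = np.array([[[recover(s, a, t) for t in range(n_states)]
                   for a in range(n_actions)]
                  for s in range(n_states)])
print("max recovery error:", np.abs(P_hat - P).max())  # ~tol
```

Note that every query asks only what the agent would do, never what the environment would do; the dynamics fall out of the agent's choices.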
@jonathanrichens
Jon Richens
1 month
Fundamental limitations on agency. In environments where the dynamics are provably hard to learn, or where long-horizon prediction is infeasible, the capabilities of agents are fundamentally bounded.
@jonathanrichens
Jon Richens
1 month
No model-free path. If you want to train an agent capable of a wide range of goal-directed tasks, you can’t avoid the challenge of learning a world model. And to improve performance or generality, agents need to learn increasingly accurate and detailed world models.
@jonathanrichens
Jon Richens
1 month
These results have several interesting consequences, from emergent capabilities to AI safety… 👇
@jonathanrichens
Jon Richens
1 month
And to achieve lower regret, or more complex goals, agents must learn increasingly accurate world models. Goal-conditioned policies are informationally equivalent to world models! But this holds only for goals over multi-step horizons: myopic agents do not need to learn world models.
@jonathanrichens
Jon Richens
1 month
Specifically, we show it’s possible to recover a bounded-error approximation of the environment transition function from any goal-conditioned policy that satisfies a regret bound across a wide enough set of simple goals, like steering the environment into a desired state.
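In schematic form (the notation here is mine, and the error term ε is left abstract rather than being the paper's explicit bound):

```latex
% Schematic: low regret on enough simple goals => bounded-error world model.
\[
  \mathrm{Regret}(\pi, g) \le \delta \;\; \forall g \in \mathcal{G}_n
  \quad\Longrightarrow\quad
  \bigl|\hat{P}(s' \mid s, a) - P(s' \mid s, a)\bigr| \le \varepsilon(\delta, n),
\]
% where \mathcal{G}_n is a set of goals over horizons up to n, \hat{P} is the
% transition function extracted from \pi, and \varepsilon(\delta, n) shrinks
% as the agent becomes more competent (smaller \delta) and more general
% (larger n).
```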
@jonathanrichens
Jon Richens
1 month
Turns out there’s a neat answer to this question. We prove that any agent capable of generalizing to a broad range of simple goal-directed tasks must have learned a predictive model capable of simulating its environment. And this model can always be recovered from the agent.
@jonathanrichens
Jon Richens
1 month
World models are foundational to goal-directedness in humans, but are hard to learn in messy open worlds. We're now seeing generalist, model-free agents (Gato, PaLM-E, Pi-0…). Do these agents learn implicit world models, or have they found another way to generalize to new tasks?
@jonathanrichens
Jon Richens
3 months
RT @tom4everitt: What if LLMs are sometimes capable of doing a task but don't try hard enough to do it? In a new paper, we use subtasks to…
@jonathanrichens
Jon Richens
9 months
RT @RichardMCNgo: If I talk to one more person who says “but even if this research direction led to a massive breakthrough in our scientifi…
@jonathanrichens
Jon Richens
1 year
RT @joftius: This year #ICML started a "position paper" track aimed at stimulating discussions. Reader, I chose violence. 𝗧𝗵𝗲 𝗖𝗮𝘂𝘀𝗮𝗹 𝗥𝗲𝘃…
@jonathanrichens
Jon Richens
1 year
RT @logangraham: I’m hiring ambitious Research Scientists at @AnthropicAI to measure and prepare for models acting autonomously in the wor….
@jonathanrichens
Jon Richens
1 year
RT @IasonGabriel: How should we understand A.I. agents? This blog by @tom4everitt provides one of the clearest and most complete accounts…