Jon Richens Profile
Jon Richens

@jonathanrichens

1K Followers · 237 Following · 12 Media · 127 Statuses

Research scientist in AI safety @GoogleDeepMind

Joined August 2020
@jonathanrichens
Jon Richens
1 month
Are world models necessary to achieve human-level agents, or is there a model-free shortcut? Our new #ICML2025 paper tackles this question from first principles, and finds a surprising answer: agents _are_ world models… 🧵
@jonathanrichens
Jon Richens
26 days
RT @richardcsuwandi: 2 years ago, @ilyasut made a bold prediction that large neural networks are learning world models through text. Recen…
@jonathanrichens
Jon Richens
29 days
RT @alexis_bellot_: Can we trust a black-box system, when all we know is its past behaviour? 🤖🤔 In a new #ICML2025 paper we derive fundamen…
@jonathanrichens
Jon Richens
1 month
… and many more! Check out our paper or come chat to me at #ICML2025. Joint work @GoogleDeepMind with @dabelcs, @alexis_bellot_, @tom4everitt.
@jonathanrichens
Jon Richens
1 month
Causality. In previous work we showed that a causal world model is needed for robustness. It turns out task generalization requires less causal knowledge of the environment than robustness does. There is a causal hierarchy, but for agency and agent capabilities rather than for inference!
@jonathanrichens
Jon Richens
1 month
Emergent capabilities. To minimize training loss across many goals, agents must learn a world model, and that model can be used to solve tasks the agent was never explicitly trained on. Simple goal-directedness gives rise to many capabilities (social cognition, reasoning about uncertainty, intent…).
@jonathanrichens
Jon Richens
1 month
Safety. Several approaches to AI safety require accurate world models, but agent capabilities could outpace our ability to build them. Our work gives a theoretical guarantee: we can extract world models from agents, and the model fidelity increases with the agent's capabilities.
@jonathanrichens
Jon Richens
1 month
Extracting world knowledge from agents. We derive algorithms that recover a world model given the agent’s policy and goal (policy + goal -> world model). These algorithms complete the triptych of planning (world model + goal -> policy) and IRL (world model + policy -> goal).
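Below is a minimal runnable sketch of that policy + goal -> world model direction. It is my own toy reconstruction, not the paper's algorithm: the oracle agent, the goal format ("take this action; succeed if the next state is the target"), and the baseline-lottery bisection are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP, hidden from the extraction routine: P[s, a, s'] = P(s' | s, a).
n_states, n_actions = 4, 3
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

def agent(state, action, target, p_baseline):
    """Stand-in for a trained goal-conditioned agent (hypothetical).
    Goal: 'take `action` now; succeed if the next state is `target`',
    offered against a baseline lottery that succeeds with probability
    p_baseline. A regret-minimizing agent pursues the goal iff it beats
    the baseline."""
    return P[state, action, target] >= p_baseline

def recover(state, action, target, tol=1e-6):
    """Bisect on the baseline probability to locate the agent's
    indifference point; that threshold equals P(target | state, action)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if agent(state, action, target, mid):
            lo = mid  # agent still prefers the goal: true prob >= mid
        else:
            hi = mid  # agent defers to the baseline: true prob < mid
    return 0.5 * (lo + hi)

# Rebuild the full transition tensor from policy queries alone.
P_hat = np.array([[[recover(s, a, t) for t in range(n_states)]
                   for a in range(n_actions)]
                  for s in range(n_states)])
print("max recovery error:", np.abs(P_hat - P).max())  # ~tol
```

Note that every query asks only what the agent would do, never what the environment would do; the dynamics fall out of the agent's choices.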
@jonathanrichens
Jon Richens
1 month
Fundamental limitations on agency. In environments where the dynamics are provably hard to learn, or where long-horizon prediction is infeasible, the capabilities of agents are fundamentally bounded.
@jonathanrichens
Jon Richens
1 month
No model-free path. If you want to train an agent capable of a wide range of goal-directed tasks, you can’t avoid the challenge of learning a world model. And to improve performance or generality, agents need to learn increasingly accurate and detailed world models.
@jonathanrichens
Jon Richens
1 month
These results have several interesting consequences, from emergent capabilities to AI safety… 👇
@jonathanrichens
Jon Richens
1 month
And to achieve lower regret, or more complex goals, agents must learn increasingly accurate world models. Goal-conditioned policies are informationally equivalent to world models! But this holds only for goals over multi-step horizons: myopic agents do not need to learn world models.
@jonathanrichens
Jon Richens
1 month
Specifically, we show it’s possible to recover a bounded-error approximation of the environment transition function from any goal-conditioned policy that satisfies a regret bound across a wide enough set of simple goals, like steering the environment into a desired state.
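In schematic form (the notation here is mine, and the error term ε is left abstract rather than being the paper's explicit bound):

```latex
% Schematic: low regret on enough simple goals => bounded-error world model.
\[
  \mathrm{Regret}(\pi, g) \le \delta \;\; \forall g \in \mathcal{G}_n
  \quad\Longrightarrow\quad
  \bigl|\hat{P}(s' \mid s, a) - P(s' \mid s, a)\bigr| \le \varepsilon(\delta, n),
\]
% where \mathcal{G}_n is a set of goals over horizons up to n, \hat{P} is the
% transition function extracted from \pi, and \varepsilon(\delta, n) shrinks
% as the agent becomes more competent (smaller \delta) and more general
% (larger n).
```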
@jonathanrichens
Jon Richens
1 month
Turns out there’s a neat answer to this question. We prove that any agent capable of generalizing to a broad range of simple goal-directed tasks must have learned a predictive model capable of simulating its environment. And this model can always be recovered from the agent.
@jonathanrichens
Jon Richens
1 month
World models are foundational to goal-directedness in humans, but are hard to learn in messy open worlds. We're now seeing generalist, model-free agents (Gato, PaLM-E, Pi-0…). Do these agents learn implicit world models, or have they found another way to generalize to new tasks?
@jonathanrichens
Jon Richens
3 months
RT @tom4everitt: What if LLMs are sometimes capable of doing a task but don't try hard enough to do it? In a new paper, we use subtasks to…
@jonathanrichens
Jon Richens
9 months
RT @RichardMCNgo: If I talk to one more person who says “but even if this research direction led to a massive breakthrough in our scientifi…
@jonathanrichens
Jon Richens
1 year
RT @joftius: This year #ICML started a "position paper" track aimed at stimulating discussions. Reader, I chose violence. 𝗧𝗵𝗲 𝗖𝗮𝘂𝘀𝗮𝗹 𝗥𝗲𝘃…
@jonathanrichens
Jon Richens
1 year
RT @logangraham: I’m hiring ambitious Research Scientists at @AnthropicAI to measure and prepare for models acting autonomously in the wor….
@jonathanrichens
Jon Richens
1 year
RT @IasonGabriel: How should we understand A.I. agents? This blog by @tom4everitt provides one of the clearest and most complete accounts…