Tom Everitt Profile
Tom Everitt (@tom4everitt)
Followers: 2K · Following: 10K · Media: 14 · Statuses: 220

AGI safety researcher at @GoogleDeepMind, leading https://t.co/gBAjHPCTcL, switching to https://t.co/aIojqeDT0I

London
Joined August 2017
@tom4everitt
Tom Everitt
14 hours
Pretty cool effort to distribute economic power in the age of powerful AI.
@LRudL_
Rudolf Laine
16 hours
With @luke_drago_, I’m cofounding Workshop Labs, a public benefit corporation preventing human disempowerment from AI. See below for:
- impact case
- what we’re building
- what we hope the future looks like
- what we’re hiring for
@tom4everitt
Tom Everitt
1 month
RT @MichaelD1729: Someone needs to use this as the basis of an unsupervised environment design algorithm to give AI designers direct contro….
@tom4everitt
Tom Everitt
1 month
Causality is about predicting how interventions affect outcomes. Can we use causality to predict how environment changes affect agent behavior? We explore this idea in a new paper.
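To make the notion of an intervention concrete, here is a toy structural causal model (my own illustrative sketch, not from the paper): the environment E drives the agent's action A, which drives the outcome O, and `do_E` overrides E's natural mechanism the way a causal intervention would.

```python
# Toy sketch (illustrative only, not the paper's model): a minimal
# structural causal model with environment E -> action A -> outcome O.
import random

def sample(do_E=None):
    """Draw one world; `do_E` overrides E's natural mechanism (an intervention)."""
    E = random.random() if do_E is None else do_E  # environment state in [0, 1]
    A = 1 if E > 0.5 else 0                        # agent's policy reacts to E
    O = A + E                                      # outcome depends on both
    return E, A, O

# Observation: E varies naturally. Intervention: we *set* E and predict
# how the agent's behaviour (A) and the outcome (O) change in response.
_, A_low, O_low = sample(do_E=0.2)    # A_low is 0, O_low is 0.2
_, A_high, O_high = sample(do_E=0.9)  # A_high is 1, O_high is 1.9
```

The point of the sketch: once the mechanisms are known, the effect of changing the environment on agent behaviour is computable, which is the question the tweet raises for learned agents.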
@alexis_bellot_
Alexis Bellot
1 month
Can we trust a black-box system when all we know is its past behaviour? 🤖🤔 In a new #ICML2025 paper we derive fundamental bounds on the predictability of black-box agents. This is a critical question for #AgentSafety. 🧵
@tom4everitt
Tom Everitt
1 month
Interestingly, task generality doesn't require a causal model, and so is less demanding than robustness in this sense.
@tom4everitt
Tom Everitt
1 month
New world models paper, this time with task-generality rather than robustness.
@jonathanrichens
Jon Richens
1 month
Are world models necessary to achieve human-level agents, or is there a model-free short-cut? Our new #ICML2025 paper tackles this question from first principles, and finds a surprising answer: agents _are_ world models… 🧵
@tom4everitt
Tom Everitt
3 months
@tom4everitt
Tom Everitt
3 months
We hope our measure of goal-directedness will help designers be more deliberate about what agentic properties they instill in their AI systems, to maximize utility while minimizing risks.
@tom4everitt
Tom Everitt
3 months
Our definition also predicts other indicators of goal-directedness, such as propensity to rebuild a fallen tower, and persistence at Progress Ratio Tasks. (In PRTs, you can quit at any time and face diminishing returns. They are commonly used to assess human goal-directedness.)
@tom4everitt
Tom Everitt
3 months
Notably, goal-directedness is fairly consistent across tasks, indicating that it may be an intrinsic property of models. Newer models are often more goal-directed, perhaps due to better RL post-training.
@tom4everitt
Tom Everitt
3 months
We develop a suite of tasks and subtasks in a blocksworld environment, which we open-source with the paper. The tasks cover information gathering, cognitive effort, and plan execution. We test Gemini, GPT, and Claude versions.
@tom4everitt
Tom Everitt
3 months
Previous work on goal-directedness in AI safety has defined it as the propensity to reach a goal. This conflates it with capabilities. Here, we instead measure whether LLMs use their capabilities towards a given goal. Sometimes, more capable models are less goal-directed.
@tom4everitt
Tom Everitt
3 months
Goal-directedness has long been recognized as an important property. At its best, it enables autonomy. At its worst, it promotes unethical means and reward hacking, like sycophancy, preference manipulation, inappropriate relationships, resource acquisition, shutdown resistance.
@tom4everitt
Tom Everitt
3 months
What if LLMs are sometimes capable of doing a task but don't try hard enough to do it? In a new paper, we use subtasks to assess capabilities. Perhaps surprisingly, LLMs often fail to fully employ their capabilities, i.e. they are not fully *goal-directed* 🧵
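One back-of-envelope way to see the idea (this is my own toy formalisation, not the paper's actual metric): compare a model's performance on the full task against what its measured subtask capabilities would predict, and read the gap as a lack of goal-directedness. The product-of-subtasks predictor below is an assumption for illustration.

```python
# Toy sketch (hypothetical formalisation, NOT the paper's metric):
# goal-directedness as achieved full-task success relative to the
# success its subtask capabilities would predict.

def predicted_from_subtasks(subtask_success):
    """If the full task chains all subtasks, predicted success is the product."""
    p = 1.0
    for s in subtask_success:
        p *= s
    return p

def goal_directedness(full_task_success, subtask_success):
    """Close to 1.0 means the model fully employs its capabilities on the goal."""
    predicted = predicted_from_subtasks(subtask_success)
    return full_task_success / predicted if predicted > 0 else 0.0

# A capable but not fully goal-directed model: strong on each subtask
# (90% each), yet well below the predicted ~73% on the full task.
score = goal_directedness(0.45, [0.9, 0.9, 0.9])  # roughly 0.62
```

On this toy measure, a weaker model that reliably applies everything it can do would score near 1.0, matching the tweet's point that capability and goal-directedness can come apart.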
@tom4everitt
Tom Everitt
4 months
RT @F_Rhys_Ward: In real-life, agents with different subjective beliefs interact in a shared objective reality. They have higher-order beli….
@tom4everitt
Tom Everitt
4 months
One thing I really like about this is that my content is much less determined by who I follow than by which posts I like. This means I can express my approval of a post without worrying that similar content will now flood my feed.
@tom4everitt
Tom Everitt
4 months
Instead, there's a marketplace of content selection algorithms. My favourites are:
* "Following": simple chronological feed (default)
* "Quiet posters": posts from less frequent posters in your feed
* "Paper Skygest": posts about papers
@tom4everitt
Tom Everitt
4 months
It's growing fast, so chances are that some of your favorite posters and followers are already there. By crossposting your content and liking some posts, you help transition the community.
@tom4everitt
Tom Everitt
4 months
You don't get shadowbanned for mentioning a competitor network or website, or for including a link in your post. You can choose for yourself which content moderation filters to apply.
@tom4everitt
Tom Everitt
4 months
Starter Packs help you find people to follow on a topic. For example, there are starter packs on Technical AGI Safety, Grumpy Machine Learners, and AGI Labs. Anyone can create a starter pack. Ping the creator to get added (good way to get followers!).
@tom4everitt
Tom Everitt
4 months
BlueSky is an alternative social media network that is open-source and decentralised. It's super easy to set up. And no one can manipulate the content you see by tweaking the content selection algorithm 🧵