Tom Everitt Profile
Tom Everitt (@tom4everitt)
Followers: 2K · Following: 10K · Media: 14 · Statuses: 220

AGI safety researcher at @GoogleDeepMind, leading https://t.co/gBAjHPCTcL, switching to https://t.co/aIojqeDT0I

London
Joined August 2017
@tom4everitt
Tom Everitt
14 hours
Pretty cool effort to distribute economic power in the age of powerful AI.
@LRudL_
Rudolf Laine
16 hours
With @luke_drago_, I’m cofounding Workshop Labs, a public benefit corporation preventing human disempowerment from AI. See below for:
- impact case
- what we’re building
- what we hope the future looks like
- what we’re hiring for
@tom4everitt
Tom Everitt
1 month
RT @MichaelD1729: Someone needs to use this as the basis of an unsupervised environment design algorithm to give AI designers direct contro….
@tom4everitt
Tom Everitt
1 month
Causality is about predicting how interventions affect outcomes. Can we use causality to predict how environment changes affect agent behavior? We explore this idea in a new paper.
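To make the notion of an intervention concrete, here is a toy structural causal model (my own illustrative sketch, not from the paper): the environment E drives the agent's action A, which drives the outcome O, and `do_E` overrides E's natural mechanism the way a causal intervention would.

```python
# Toy sketch (illustrative only, not the paper's model): a minimal
# structural causal model with environment E -> action A -> outcome O.
import random

def sample(do_E=None):
    """Draw one world; `do_E` overrides E's natural mechanism (an intervention)."""
    E = random.random() if do_E is None else do_E  # environment state in [0, 1]
    A = 1 if E > 0.5 else 0                        # agent's policy reacts to E
    O = A + E                                      # outcome depends on both
    return E, A, O

# Observation: E varies naturally. Intervention: we *set* E and predict
# how the agent's behaviour (A) and the outcome (O) change in response.
_, A_low, O_low = sample(do_E=0.2)    # A_low is 0, O_low is 0.2
_, A_high, O_high = sample(do_E=0.9)  # A_high is 1, O_high is 1.9
```

The point of the sketch: once the mechanisms are known, the effect of changing the environment on agent behaviour is computable, which is the question the tweet raises for learned agents.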
@alexis_bellot_
Alexis Bellot
1 month
Can we trust a black-box system when all we know is its past behaviour? 🤖🤔 In a new #ICML2025 paper we derive fundamental bounds on the predictability of black-box agents. This is a critical question for #AgentSafety. 🧵
@tom4everitt
Tom Everitt
1 month
Interestingly, task generality doesn't require a causal model, and so is less demanding than robustness in this sense.
@tom4everitt
Tom Everitt
1 month
New world models paper, this time with task-generality rather than robustness.
@jonathanrichens
Jon Richens
1 month
Are world models necessary to achieve human-level agents, or is there a model-free short-cut? Our new #ICML2025 paper tackles this question from first principles, and finds a surprising answer: agents _are_ world models… 🧵
@tom4everitt
Tom Everitt
3 months
@tom4everitt
Tom Everitt
3 months
We hope our measure of goal-directedness will help designers be more deliberate about what agentic properties they instill in their AI systems, to maximize utility while minimizing risks.
@tom4everitt
Tom Everitt
3 months
Our definition also predicts other indicators of goal-directedness, such as propensity to rebuild a fallen tower, and persistence at Progress Ratio Tasks. (In PRTs, you can quit at any time and face diminishing returns. They are commonly used to assess human goal-directedness.)
@tom4everitt
Tom Everitt
3 months
Notably, goal-directedness is fairly consistent across tasks, indicating that it may be an intrinsic property of models. Newer models are often more goal-directed, perhaps due to better RL post-training.
@tom4everitt
Tom Everitt
3 months
We develop a suite of tasks and subtasks in a blocksworld environment, which we open-source with the paper. The tasks cover information gathering, cognitive effort, and plan execution. We test Gemini, GPT, and Claude versions.
@tom4everitt
Tom Everitt
3 months
Previous work on goal-directedness in AI safety has defined it as the propensity to reach a goal. This conflates it with capabilities. Here, we instead measure whether LLMs use their capabilities towards a given goal. Sometimes, more capable models are less goal-directed.
@tom4everitt
Tom Everitt
3 months
Goal-directedness has long been recognized as an important property. At its best, it enables autonomy. At its worst, it promotes unethical means and reward hacking, like sycophancy, preference manipulation, inappropriate relationships, resource acquisition, shutdown resistance.
@tom4everitt
Tom Everitt
3 months
What if LLMs are sometimes capable of doing a task but don't try hard enough to do it? In a new paper, we use subtasks to assess capabilities. Perhaps surprisingly, LLMs often fail to fully employ their capabilities, i.e. they are not fully *goal-directed* 🧵
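One back-of-envelope way to see the idea (this is my own toy formalisation, not the paper's actual metric): compare a model's performance on the full task against what its measured subtask capabilities would predict, and read the gap as a lack of goal-directedness. The product-of-subtasks predictor below is an assumption for illustration.

```python
# Toy sketch (hypothetical formalisation, NOT the paper's metric):
# goal-directedness as achieved full-task success relative to the
# success its subtask capabilities would predict.

def predicted_from_subtasks(subtask_success):
    """If the full task chains all subtasks, predicted success is the product."""
    p = 1.0
    for s in subtask_success:
        p *= s
    return p

def goal_directedness(full_task_success, subtask_success):
    """Close to 1.0 means the model fully employs its capabilities on the goal."""
    predicted = predicted_from_subtasks(subtask_success)
    return full_task_success / predicted if predicted > 0 else 0.0

# A capable but not fully goal-directed model: strong on each subtask
# (90% each), yet well below the predicted ~73% on the full task.
score = goal_directedness(0.45, [0.9, 0.9, 0.9])  # roughly 0.62
```

On this toy measure, a weaker model that reliably applies everything it can do would score near 1.0, matching the tweet's point that capability and goal-directedness can come apart.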
@tom4everitt
Tom Everitt
4 months
RT @F_Rhys_Ward: In real-life, agents with different subjective beliefs interact in a shared objective reality. They have higher-order beli….
@tom4everitt
Tom Everitt
4 months
One thing I really like about this is that my content is much less determined by who I follow than by which posts I like. This means I can express my approval of a post without worrying that similar content will now flood my feed.
@tom4everitt
Tom Everitt
4 months
Instead, there's a marketplace of content selection algorithms. My favourites are:
* "Following": simple chronological feed (default)
* "Quiet posters": posts from less frequent posters in your feed
* "Paper Skygest": posts about papers
@tom4everitt
Tom Everitt
4 months
It's growing fast, so chances are that some of your favorite posters and followers are already there. By crossposting your content and liking some posts, you help transition the community.
@tom4everitt
Tom Everitt
4 months
You don't get shadowbanned for mentioning a competitor network or website, or for including a link in your post. You can choose for yourself which content moderation filters to apply.
@tom4everitt
Tom Everitt
4 months
Starter Packs help you find people to follow on a topic. For example, there are starter packs on Technical AGI Safety, Grumpy Machine Learners, and AGI Labs. Anyone can create a starter pack. Ping the creator to get added (good way to get followers!).
@tom4everitt
Tom Everitt
4 months
BlueSky is an alternative social media network that is open-source and decentralised. It's super easy to set up. And no one can manipulate the content you see by tweaking the content selection algorithm 🧵