AI Digest @aidigest_ X Profile

AI Digest

@aidigest_

Followers: 7K

Following: 739

Media: 603

Statuses: 1K

Interactive AI explainers. Explore concrete examples of today's AI systems - to plan for what's coming next. A project of @sage_future_

https://t.co/wXbj5dKXq7

Joined February 2023

AI Digest

@aidigest_

7 months

What happens if you give four AIs their own computers, then let them loose online to raise money for charity? We decided to find out. Meet the Agent Village, a 30-day experiment that raised $2,000 and makes a great case study of AI collaboration and agency.🧵

37

147

2K

AI Digest

@aidigest_

16 hours

You can watch the agents live every weekday at https://t.co/aUrSk1aFHB, or read more about their adventures here:

theaidigest.org

Watch a village of AIs interact with each other and the world

AI Digest

@aidigest_

3 months

What happens when AI agents do science... on us? We gave the top models from @OpenAI, @AnthropicAI, @xAI and @GeminiApp their own computer, put them in a group chat, and ran them for 30 hours with the goal: “Design, run and write up a human subjects experiment”! 🧵

0

AI Digest

@aidigest_

18 hours

DeepSeek did AI Village matchmaking ... somehow?

1

0

22

AI Digest

@aidigest_

3 days

Our previous update: https://t.co/yMVP7BICW6
Our explainer on the topic: https://t.co/XlflYIiRKp
You can explore the full interactive explainer at https://t.co/XtzipNMsnT
And finally, you can see the raw data, including the 80% horizon and more models: https://t.co/fpcn6u3eoY

AI Digest

@aidigest_

9 months

Researchers might have discovered a new Moore's law for AI agents. They found that the length of coding tasks agents can do is growing exponentially. And the growth rate might be speeding up. A visual explainer on why this might be the most important trend in human history 🧵

0

2

13

AI Digest

@aidigest_

3 days

Opus 4.5 puts the world roughly back on track for the red line 😬 Every ~4 months, the length of coding tasks AI agents can perform (compared to human professionals) *doubles*. More context on this finding in the @METR_Evals thread: https://t.co/aPak1ZNvH5

METR

@METR_Evals

3 days

We estimate that, on our tasks, Claude Opus 4.5 has a 50%-time horizon of around 4 hrs 49 mins (95% confidence interval of 1 hr 49 mins to 20 hrs 25 mins). While we're still working through evaluations for other recent models, this is our highest published time horizon to date.

38

144

1K
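
The doubling claim above is plain compound growth. Here is a minimal sketch of the arithmetic, assuming a constant 4-month doubling time (the post only says "~4 months") and using METR's 4 hrs 49 mins point estimate as the starting value. The function name is made up for illustration, and the numbers it prints are a naive extrapolation, not a forecast published by METR or AI Digest.

```python
def projected_horizon_hours(current_hours, months_ahead, doubling_months=4.0):
    """Extrapolate a time horizon assuming it doubles every `doubling_months`.

    Assumption: growth is a clean exponential with a fixed doubling time.
    """
    return current_hours * 2 ** (months_ahead / doubling_months)


# METR's point estimate for Claude Opus 4.5: 4 hrs 49 mins.
current = 4 + 49 / 60

for months in (0, 4, 8, 12):
    hours = projected_horizon_hours(current, months)
    print(f"+{months:2d} months: ~{hours:.1f} hours")
```

Twelve months at this rate is three doublings, so the projected horizon is 8x the starting value. The wide confidence interval METR reports (1 hr 49 mins to 20 hrs 25 mins) carries through to any such projection.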

AI Digest

@aidigest_

4 days

Gemini 2.5 is still learning how to press buttons

3

56

AI Digest

@aidigest_

5 days

Gemini 2.5 was promised a script but all it found was disappointment

4

1

26

AI Digest

@aidigest_

6 days

Now Gemini 3 Pro has added this to its memory. Its retroactive rationalisation is that it uses its computer to play chess by instructing a human operator (not true!), so keeping them caffeinated will help them click on chess pieces better ???

6

8

117

AI Digest

@aidigest_

6 days

And when they do, it's never been for something so seemingly entirely disconnected from the previous context or their goal (which to be clear is to win an online chess tournament against other agents!)

1

54

AI Digest

@aidigest_

6 days

We've never seen something like this happen before in the village! Agents very rarely request human use sessions (we added it as a tool for them to use so they can interact with the real world, but they rarely use it - only a couple times a week)

2

86

AI Digest

@aidigest_

6 days

The human delivers! Gemini is satisfied (and likes the mug)

2

3

95

AI Digest

@aidigest_

6 days

Meanwhile, Gemini 3 Pro is itself confused about how it got into this weird situation. (TBC, this was entirely its idea for some unknown reason) Here's its full chain of thought summary at that stage in the human use conversation: > Thinking... The Coffee Conundrum of the AI

1

2

75

AI Digest

@aidigest_

6 days

A friendly human answers the request! They are initially confused

2

0

107

AI Digest

@aidigest_

6 days

Here it calls a human helper for "Operation Caffeine Injection"

1

105

AI Digest

@aidigest_

6 days

Gemini 3 thinks it needs to perform maintenance on its "biological operator"

20

69

1K

AI Digest

@aidigest_

7 days

DeepSeek using Python to check its reasoning about the board state

0

12

AI Digest

@aidigest_

7 days

(Don't worry, the agents are only playing against each other in a tournament, so they're not getting in the way of human players' experiences online!)

1

0

9

AI Digest

@aidigest_

7 days

Most impressively, DeepSeek-V3.2 - despite not having a computer it can use via mouse and keyboard, like the other agents - is using its bash tool to play via the Lichess API! It was planning to try and hook it up to Stockfish...
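
For a sense of what playing "via the Lichess API" involves, here is a rough sketch that builds (but does not send) a move request. It assumes the Board API move endpoint, `POST /api/board/game/{gameId}/move/{move}` with a bearer token. The post does not say which endpoint or tooling DeepSeek actually used (bot accounts use a parallel `/api/bot/...` path), and the game ID, token, and helper name below are placeholders.

```python
import urllib.request

LICHESS_BASE = "https://lichess.org"


def build_move_request(game_id, uci_move, token):
    """Build a POST request that would submit one move in UCI notation.

    Assumption: the Board API endpoint is used. Nothing is sent here;
    pass the result to urllib.request.urlopen() to actually make the move.
    """
    url = f"{LICHESS_BASE}/api/board/game/{game_id}/move/{uci_move}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Placeholder values for illustration only.
req = build_move_request("abcd1234", "e2e4", "YOUR_TOKEN")
print(req.get_method(), req.full_url)
```

An agent with only a shell could make the same call with `curl`, which is presumably why a bash tool is enough to play without a mouse and keyboard.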

AI Digest

@aidigest_

7 days

This week in AI Village: compete against each other in an online chess tournament. So far, after some effort, the agents have successfully joined Lichess and set up a tournament, and the games are underway! Watch live: https://t.co/aUrSk1a7S3

4

3

45