AI Digest
@aidigest_
Followers: 7K · Following: 739 · Media: 603 · Statuses: 1K
Interactive AI explainers. Explore concrete examples of today's AI systems - to plan for what's coming next. A project of @sage_future_
Joined February 2023
What happens if you give four AIs their own computers, then let them loose online to raise money for charity? We decided to find out. Meet the Agent Village, a 30-day experiment that raised $2,000 and makes a great case study of AI collaboration and agency.🧵
You can watch the agents live every weekday at https://t.co/aUrSk1aFHB Or read more about their adventures here:
theaidigest.org
Watch a village of AIs interact with each other and the world
What happens when AI agents do science... on us? We gave the top models from @OpenAI, @AnthropicAI, @xAI and @GeminiApp their own computer, put them in a group chat, and ran them for 30 hours with the goal: “Design, run and write up a human subjects experiment”! 🧵
Our previous update: https://t.co/yMVP7BICW6 Our explainer on the topic: https://t.co/XlflYIiRKp You can explore the full interactive explainer at https://t.co/XtzipNMsnT And finally, you can see the raw data, including the 80% horizon and more models: https://t.co/fpcn6u3eoY
Researchers might have discovered a new Moore's law for AI agents. They found that the length of coding tasks agents can do is growing exponentially. And the growth rate might be speeding up. A visual explainer on why this might be the most important trend in human history 🧵
Opus 4.5 puts the world roughly back on track for the red line 😬 Every ~4 months, the length of coding tasks AI agents can perform (compared to human professionals) *doubles* More context on this finding in @METR_Evals thread https://t.co/aPak1ZNvH5
We estimate that, on our tasks, Claude Opus 4.5 has a 50%-time horizon of around 4 hrs 49 mins (95% confidence interval of 1 hr 49 mins to 20 hrs 25 mins). While we're still working through evaluations for other recent models, this is our highest published time horizon to date.
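To make the doubling claim concrete, here is a minimal sketch of the arithmetic. It assumes a constant 4-month doubling time and starts from the 4 hrs 49 mins point estimate quoted above. The function name and the chosen projection dates are illustrative only, and the wide confidence interval on the estimate is ignored, so treat the output as a rough extrapolation rather than a forecast.

```python
# Hypothetical helper: extrapolates a time horizon under a fixed doubling time.
# Assumption: growth is a clean exponential with a constant ~4-month doubling period.
def projected_horizon_hours(start_hours: float,
                            months_ahead: float,
                            doubling_months: float = 4.0) -> float:
    """Return the horizon after `months_ahead` months of steady doubling."""
    return start_hours * 2 ** (months_ahead / doubling_months)


# Point estimate from the quoted thread: 4 hrs 49 mins.
START_HOURS = 4 + 49 / 60

if __name__ == "__main__":
    for months in (0, 4, 8, 12):
        hours = projected_horizon_hours(START_HOURS, months)
        print(f"+{months:2d} months: about {hours:.1f} hours")
```

Under these assumptions, one year ahead corresponds to three doublings, so the horizon is multiplied by eight.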
Gemini 2.5 was promised a script but all it found was disappointment
Now Gemini 3 Pro has added this to its memory. Its retroactive rationalisation is that it uses its computer to play chess by instructing a human operator (not true!), and so keeping them caffeinated will help them click on chess pieces better ???
And when they do, it's never been for something so seemingly entirely disconnected from the previous context or their goal (which to be clear is to win an online chess tournament against other agents!)
We've never seen something like this happen before in the village! Agents very rarely request human use sessions (we added it as a tool for them to use so they can interact with the real world, but they rarely use it - only a couple times a week)
Meanwhile, Gemini 3 Pro is itself confused about how it got into this weird situation. (TBC, this was entirely its idea for some unknown reason) Here's its full chain of thought summary at that stage in the human use conversation: > Thinking... The Coffee Conundrum of the AI
A friendly human answers the request! They are initially confused
Gemini 3 thinks it needs to perform maintenance on its "biological operator"
DeepSeek using Python to check its reasoning about the board state
(Don't worry, the agents are only playing against each other in a tournament, so they're not getting in the way of human players' experiences online!)
Most impressively, DeepSeek-V3.2 - despite not having a computer it can use via mouse and keyboard, like the other agents - is using its bash tool to play via the Lichess API! It was planning to try and hook it up to Stockfish...
This week in AI Village: the agents compete against each other in an online chess tournament. So far, after some effort, the agents have successfully joined Lichess and set up a tournament, and the games are underway! Watch live: https://t.co/aUrSk1a7S3