Epoch AI
@EpochAIResearch
24K Followers · 417 Following · 354 Media · 1K Statuses

Investigating the trajectory of AI for the benefit of society.

Joined May 2022
Epoch AI (@EpochAIResearch) · 1 month
Our AI Supercomputers database is now live! Explore specs for 700+ training & inference clusters—compute, chip types, power draw, ownership, location, and more.
[Image]
4 replies · 53 reposts · 292 likes
Epoch AI (@EpochAIResearch) · 3 days
You can see the results for Gemini 2.5 Pro and other models in our benchmarking dashboard, and read more about our Gemini 2.5 evaluation protocol in the methodology section.
0 replies · 0 reposts · 11 likes
Epoch AI (@EpochAIResearch) · 3 days
We are now evaluating Gemini 2.5 models with custom scoring rules: requests are retried up to 10 times, with exponential backoff. If all 10 retries fail, the corresponding sample is marked as incorrect. 7–9% of FrontierMath questions were marked incorrect this way.
[Image]
1 reply · 0 reposts · 17 likes
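For readers who want the mechanics, a minimal sketch of the retry-and-score rule described above might look like the following Python. It is illustrative only: the `call_gemini` callable, the grading callback, the delay constants, and the exception handling are assumptions, not the actual evaluation code.

```python
import random
import time

MAX_RETRIES = 10
BASE_DELAY = 1.0  # seconds; illustrative starting delay for the backoff


def query_with_backoff(call_gemini, prompt):
    """Retry a flaky API call up to MAX_RETRIES times with exponential backoff.

    Returns the response text, or None if every attempt fails. `call_gemini`
    is a hypothetical client callable that raises on HTTP 500s or empty
    responses.
    """
    for attempt in range(MAX_RETRIES):
        try:
            return call_gemini(prompt)
        except Exception:
            # Exponential backoff with jitter: roughly 1s, 2s, 4s, ... between tries.
            time.sleep(BASE_DELAY * 2 ** attempt + random.uniform(0, 1))
    return None  # all 10 attempts failed


def score_sample(call_gemini, question, grade):
    """Score one benchmark question under the rule described in the tweet:
    a sample whose request never succeeds is simply marked incorrect."""
    response = query_with_backoff(call_gemini, question)
    if response is None:
        return False  # counted as incorrect after 10 failed attempts
    return grade(response)  # grade() checks the answer against the reference
```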
Epoch AI (@EpochAIResearch) · 3 days
Between March and June, we faced issues with the Gemini 2.5 API. Some requests were systematically failing, either returning a status code of 500 or failing to send any data at all. In June, Google staff recommended that we find a workaround for the evaluation.
1 reply · 0 reposts · 11 likes
Epoch AI (@EpochAIResearch) · 3 days
We’ve adopted special scoring rules to evaluate the Gemini 2.5 models, addressing the API issues we faced in previous attempts. Under this protocol, Gemini 2.5 Pro solves 11% (±2%) of FrontierMath, making it the best non-OpenAI model we’ve evaluated so far.
[Image]
4 replies · 5 reposts · 106 likes
Epoch AI (@EpochAIResearch) · 4 days
Read @ardenaberg and @ansonwhho’s full analysis of a possible AI Manhattan Project.
0 replies · 1 repost · 16 likes
Epoch AI (@EpochAIResearch) · 4 days
Nonetheless, the authors conclude that a meaningful acceleration due to nationalization would be unwise to write off.
1 reply · 2 reposts · 15 likes
Epoch AI (@EpochAIResearch) · 4 days
All of this is far from inevitable, especially because of serial time bottlenecks and the need for strong political buy-in. For example, government inefficiencies could substantially increase costs, reducing the size of the largest training run achievable at Manhattan Project-sized budgets.
1 reply · 1 repost · 11 likes
Epoch AI (@EpochAIResearch) · 4 days
This much compute in 2027 would require 7.4 GW, more than the average power usage of New York City. A Manhattan Project could likely support this by gathering already-planned new capacity, especially given Defense Production Act (DPA) authority.
[Image]
1 reply · 1 repost · 14 likes
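A rough way to sanity-check the 7.4 GW figure is to combine it with the 3e29 FLOP training run discussed in this thread and back out the implied hardware energy efficiency. The sketch below assumes a training duration, which is not stated in the thread; the H100 comparison numbers are approximate public specs.

```python
# Back out what the 7.4 GW figure implies about hardware efficiency,
# taking the thread's numbers (3e29 FLOP, 7.4 GW) as given.
# The training duration is an assumption for illustration.

TOTAL_FLOP = 3e29                  # training compute from the thread
POWER_W = 7.4e9                    # 7.4 GW, from the thread
ASSUMED_DURATION_S = 300 * 86_400  # assume a roughly 10-month training run

energy_joules = POWER_W * ASSUMED_DURATION_S
implied_flop_per_joule = TOTAL_FLOP / energy_joules

# For comparison: an H100's dense BF16 peak is ~1e15 FLOP/s at ~700 W,
# i.e. ~1.4e12 FLOP/J before utilization and datacenter overhead, so the
# implied figure points to more efficient 2027-era hardware.
print(f"Implied fleet-wide efficiency: {implied_flop_per_joule:.2e} FLOP/J")
```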
Epoch AI (@EpochAIResearch) · 4 days
With this much investment, the project could support a training run of 3e29 FLOP in 2027. This is roughly a 1000x increase relative to the largest models of today, and 2 years earlier than existing trends indicate.
[Image]
1 reply · 2 reposts · 21 likes
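The ~1000x figure is easy to check against an assumed size for today's largest training runs, which public estimates generally put at a few times 10^26 FLOP; that denominator is an assumption here, not a number from the thread.

```python
# Quick check of the ~1000x claim. The "largest current run" figure is an
# assumed order-of-magnitude estimate, not an official number.
PROJECT_RUN_FLOP = 3e29
LARGEST_CURRENT_RUN_FLOP = 3e26   # assumed: a few times 10^26 FLOP

scaleup = PROJECT_RUN_FLOP / LARGEST_CURRENT_RUN_FLOP
print(f"~{scaleup:,.0f}x")        # ~1,000x
```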
Epoch AI (@EpochAIResearch) · 4 days
Previous national projects, at their peaks, spent a fraction of GDP equivalent to $120B-$250B today. The authors find that such a budget could centralize most NVIDIA compute in the US.
[Image]
2 replies · 1 repost · 20 likes
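The dollar range can be roughly reproduced from peak spending shares of past national projects. The GDP shares and the current US GDP figure below are approximate assumptions for illustration, not the authors' exact inputs.

```python
# Rough reproduction of the $120B-$250B range from peak GDP shares.
# The GDP shares and current-GDP figure are approximate assumptions.

US_GDP_TODAY = 29e12   # roughly $29 trillion, approximate 2025 US GDP

peak_gdp_shares = {
    "Manhattan Project (peak year)": 0.004,  # ~0.4% of GDP, approximate
    "Apollo program (peak year)":    0.008,  # ~0.8% of GDP, approximate
}

for project, share in peak_gdp_shares.items():
    budget = share * US_GDP_TODAY
    print(f"{project}: ~${budget / 1e9:.0f}B per year today")
# Output spans roughly $115B to $230B, in line with the quoted range.
```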
Epoch AI (@EpochAIResearch) · 4 days
A national AI project has become more and more of a possibility over the last year, with one being the top recommendation of a US-China congressional commission.
[Image]
1 reply · 2 reposts · 18 likes
Epoch AI (@EpochAIResearch) · 4 days
What would a Manhattan Project for AI look like? @ansonwhho and @ardenaberg argue that if one reaches the scale of previous national projects, an AI Manhattan Project could result in a ~1000x compute scaleup by 2027.
[Image]
9 replies · 36 reposts · 206 likes
Epoch AI (@EpochAIResearch) · 5 days
Found a model we don't have yet? Reply below or email us at data@epoch.ai!
0 replies · 0 reposts · 3 likes
Epoch AI (@EpochAIResearch) · 5 days
The UK has fallen behind, while China has rapidly caught up 🇨🇳🇬🇧. London-based DeepMind was an early leader and developed several landmark models. Since its merger with Google Brain, the UK's lead has vanished, with only 4 new large-scale models primarily developed there.
[Image]
1 reply · 3 reposts · 15 likes
Epoch AI (@EpochAIResearch) · 5 days
We're constantly adding more models to our database, including 10-20 new large-scale models per month. Check out the latest data:
1 reply · 0 reposts · 6 likes
Epoch AI (@EpochAIResearch) · 5 days
These 418 models include 241 with known or estimated compute. It's worth noting that there is considerable uncertainty: for 177 others, we don't have compute estimates, but we do have evidence suggesting their compute exceeds 10^23 FLOP.
1 reply · 0 reposts · 5 likes
Epoch AI (@EpochAIResearch) · 5 days
Which models are included in our data? We track models found in Google searches, press releases, arenas, and leaderboards, as well as from machine learning papers with over 1000 citations. Citations have a lag, so it's likely that many more models were released in 2024 and 2025 than we have catalogued so far.
1 reply · 0 reposts · 9 likes
Epoch AI (@EpochAIResearch) · 5 days
Given their small size, we've concluded it's unlikely that the mini versions of o1, o3, and o4 were trained with >10^25 FLOP. We've now removed those models from our list. So far, we've found another 31 models where we can't rule out a 10^25 FLOP training run.
2 replies · 0 reposts · 10 likes
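The reasoning for ruling out the mini models can be illustrated with the standard compute approximation C ≈ 6 × parameters × training tokens. The parameter and token counts below are hypothetical, since OpenAI has not disclosed them; the point is only that plausibly "mini"-sized numbers land well below 10^25 FLOP.

```python
# Illustrate why a small model is unlikely to cross 10^25 FLOP, using the
# standard approximation  training_flop ≈ 6 * parameters * training_tokens.
# The parameter/token counts are hypothetical; OpenAI has not disclosed them.

def training_flop(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

THRESHOLD = 1e25

# A hypothetical "mini"-sized model: ~10B parameters, ~15T training tokens.
flop = training_flop(10e9, 15e12)
print(f"{flop:.1e} FLOP, above 1e25: {flop > THRESHOLD}")   # 9.0e+23, False
```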
Epoch AI (@EpochAIResearch) · 5 days
In March 2023, GPT-4 was released as the first model trained on over 10^25 floating-point operations of compute. Two years later, we have identified 33 AI models trained at this scale or greater.
[Image]
1 reply · 2 reposts · 14 likes
Epoch AI (@EpochAIResearch) · 5 days
China, on the other hand, is now in clear second place. Of these 418 models, China has released 151. Perhaps most notable was DeepSeek V3, which outperformed Llama 3.1 405B despite using roughly one-tenth the training compute.
[Image]
2 replies · 0 reposts · 14 likes
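The compute gap can be roughly reconstructed with the same C ≈ 6 × N × D approximation as above, using the publicly reported token counts and, for DeepSeek V3, the activated parameters per token. Treat the result as a back-of-the-envelope figure rather than Epoch's published estimate.

```python
# Rough reconstruction of the compute gap with  training_flop ≈ 6 * N * D,
# where N is (activated) parameters and D is training tokens.
# Figures are the publicly reported ones; the result is an approximation,
# not Epoch's published estimate.

def training_flop(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

llama_405b = training_flop(405e9, 15.6e12)  # dense model, 15.6T tokens
deepseek_v3 = training_flop(37e9, 14.8e12)  # 37B activated params (MoE), 14.8T tokens

print(f"Llama 3.1 405B: {llama_405b:.1e} FLOP")    # ~3.8e25
print(f"DeepSeek V3:    {deepseek_v3:.1e} FLOP")   # ~3.3e24
print(f"Ratio: ~{llama_405b / deepseek_v3:.0f}x")  # ~12x
```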