Alexander Wei
@alexwei_
25K Followers · 627 Following · 16 Media · 76 Statuses
Reasoning @OpenAI. Co-built CICERO @MetaAI | @Berkeley_AI PhD '23 | @Harvard '20
San Francisco, CA
Joined March 2022
It's often overlooked that building evals is some of the deepest, most foundational work in AI research. Congrats to @tejalpatwardhan and team!! Here's my favorite plot from the paper; it brings into focus the current pace of progress:
Understanding the capabilities of AI models is important to me. To forecast how AI models might affect labor, we need methods to measure their real-world work abilities. That’s why we created GDPval.
1 reply · 3 reposts · 42 likes
Congrats to the team on another 🥇—with a perfect score! A fitting way to close a chapter where intellectual competitions defined the frontier. Today, new horizons beckon. I'm glad our ✨experimental reasoning model✨ (same one from IMO/IOI) got one last golden run!
1/n I’m really excited to share that our @OpenAI reasoning system got a perfect score of 12/12 during the 2025 ICPC World Finals, the premier collegiate programming competition where top university teams from around the world solve complex algorithmic problems. This would have …
12 replies · 18 reposts · 365 likes
1/5 In 2015, I won the ICPC World Finals as a member of the ITMO University team. It was the only time in Finals history when a team solved all the problems before the contest ended.
73 replies · 147 reposts · 2K likes
My first project at OpenAI involved teaching our models to reason and use tools by improving their competitive programming skills. Back then, GPT-4 struggled with even the simplest Codeforces problems, often OOM-ing (running out of memory) in the sandbox. It's incredible to see that just 2.5 years …
1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants! 👨‍💻👨‍💻
48 replies · 34 reposts · 591 likes
5/ Congrats to all IOI 2025 contestants on your milestone, and thank you to the committee and volunteers for an amazing IOI! And huge shoutout to @SherylHsu02—she is a rising 🌟. Here's us in Bolivia, watching the AI submissions roll in during dinner.
4 replies · 0 reposts · 98 likes
4/ This arc—from last summer’s near-miss to this year's gold—underscores the pace of progress. I’m proud of our GPT-5 launch last week, but I’m even more excited about all the RL and intelligence advances that we haven’t shipped yet!
3 replies · 5 reposts · 135 likes
3/ We’ve come a long way since last summer. Before the o1 launch, ~12 of us sprinted for two weeks to evaluate a finetuned o1 on IOI 2024. Despite a specialized scaffold with synthetic test cases, adaptive submission, and hand-engineered features, we fell short of a medal.
1 reply · 1 repost · 79 likes
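The scaffolding described in 3/ lends itself to a concrete sketch. Below is a minimal illustration of what a "synthetic test cases + adaptive submission" loop could look like; every name here (generate_candidates, run_synthetic_tests, judge_submit) and the budget value are my own assumptions for illustration, not OpenAI's actual pipeline.

```python
from typing import Callable

# Hypothetical sketch of an adaptive-submission scaffold: sample many
# candidate programs, rank them on synthetic tests, and spend a limited
# official-submission budget on the strongest survivors. All names are
# illustrative stand-ins, not OpenAI's actual pipeline.
def adaptive_submit(
    generate_candidates: Callable[[], list[str]],  # model sampling
    run_synthetic_tests: Callable[[str], float],   # pass rate on generated tests
    judge_submit: Callable[[str], int],            # official score, 0-100
    submission_budget: int = 50,                   # cap on official submissions
) -> int:
    best_official = 0
    ranked = sorted(generate_candidates(), key=run_synthetic_tests, reverse=True)
    for program in ranked[:submission_budget]:
        best_official = max(best_official, judge_submit(program))
        if best_official == 100:  # full marks: stop spending budget
            break
    return best_official
```

The "adaptive" part here is simply that official verdicts feed back into how the remaining budget is spent; the tweet does not specify the actual mechanism.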
2/ I was impressed by how our AI handled the 4/6 tasks this year with non-standard formats: interactive, output-only, constructive, communication. These tasks are tough to prep for and especially demand outside-the-box thinking. Our models generalized well to these unfamiliar task types.
1 reply · 2 reposts · 80 likes
1/ I competed for Team USA at IOI in 2015, so this achievement hits home for me. The biggest highlight: we *did not* train a model specifically for IOI. Our IMO gold model actually set a new state of the art in our internal competitive programming evals. Reasoning generalizes!
1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s top programming competitions - the 2025 International Olympiad in Informatics (IOI) - placing first among AI participants! 👨‍💻👨‍💻
38 replies · 62 reposts · 913 likes
On IMO P6 (without going into too much detail about our setup), the model "knew" it didn't have a correct solution. The model knowing when it didn't know was one of the early signs of life that made us excited about the underlying research direction!
One piece of info that seems important to me in terms of forecasting usefulness of new AI models for mathematics: did the gold-medal-winning models, which did not solve IMO problem 6, submit incorrect answers for it? 🧵
78 replies · 163 reposts · 2K likes
Congrats to GDM on their concurrent result 🎉 Noam shared some further thoughts here. It's exciting to be part of a field that is progressing so quickly!
Congrats to the GDM team on their IMO result! I think their parallel success highlights how fast AI progress is. Their approach was a bit different than ours, but I think that shows there are many research directions for further progress. Some thoughts on our model and results 🧵
5 replies · 8 reposts · 292 likes
10/N If you want to take a look, here are the model’s solutions to the 2025 IMO problems! The model solved P1 through P5; it did not produce a solution for P6. (Apologies in advance for its … distinct style—it is very much an experimental model 😅) https://t.co/Pm3qd8BXQs
github.com/aw31/openai-imo-2025-proofs
18 replies · 50 reposts · 815 likes
9/N Still—this underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold.
6 replies · 51 reposts · 774 likes
8/N Btw, we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months.
43 replies · 199 reposts · 2K likes
7/N HUGE congratulations to the team—@SherylHsu02, @polynoamial, and the many giants whose shoulders we stood on—for turning this crazy dream into reality! I am lucky I get to spend late nights and early mornings working alongside the very best.
7 replies · 15 reposts · 662 likes
6/N In our evaluation, the model solved 5 of the 6 problems on the 2025 IMO. For each problem, three former IMO medalists independently graded the model’s submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold! 🥇
7 replies · 28 reposts · 744 likes
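A minimal sketch of the grading rule in 6/N, purely as illustration (the helper below and its three-grader signature are my reading of the tweet, not OpenAI's actual grading tooling):

```python
def finalize(independent_marks: list[int]) -> int | None:
    """Return a problem's score once all three graders agree, else None."""
    assert len(independent_marks) == 3  # three former IMO medalists
    return independent_marks[0] if len(set(independent_marks)) == 1 else None

# Per-problem marks consistent with the thread: full credit (7 points)
# on P1 through P5, no solution submitted for P6.
marks = [7, 7, 7, 7, 7, 0]
print(sum(marks))  # 35 of 42 points, enough for gold
```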
5/N Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.
11 replies · 62 reposts · 1K likes
4/N Second, IMO submissions are hard-to-verify, multi-page proofs. Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians.
9 replies · 43 reposts · 919 likes
3/N Why is this a big deal? First, IMO problems demand a new level of sustained creative thinking compared to past benchmarks. In reasoning time horizon, we’ve now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins).
3 replies · 34 reposts · 806 likes
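As a back-of-envelope check on the progression in 3/N, the cited human time horizons are evenly spaced on a log scale, one order of magnitude per benchmark (the numbers come from the tweet; the framing below is mine):

```python
import math

# Approximate minutes a top human spends per problem, as cited above.
horizons = {"GSM8K": 0.1, "MATH": 1.0, "AIME": 10.0, "IMO": 100.0}

for name, minutes in horizons.items():
    # log10 spacing: -1, 0, 1, 2 -> one order of magnitude per step.
    print(f"{name:>5}: ~{minutes:g} min = 10^{math.log10(minutes):.0f} min")
```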