Ankesh Anand Profile
Ankesh Anand

@ankesh_anand

Followers: 5K · Following: 6K · Media: 112 · Statuses: 1K

Research scientist @googledeepmind (Gemini Thinking & Post-Training), prev phd @milamontreal. RL for Gemini 2.5 and Project Mariner. Opinions are my own.

London, England
Joined December 2011
@ankesh_anand
Ankesh Anand
3 months
2.5 Pro is our new frontier model: fresh big model smell with extremely strong reasoning / thinking capabilities. We report single attempt / pass@1 scores for clean comparisons.
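For context, "pass@1" is the standard unbiased pass@k estimator from the HumanEval paper evaluated at k=1, i.e. the score from a single sampled attempt per task. A minimal sketch of the estimator (illustrative only, not the team's actual eval code):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n attempts of which c are
    correct, solves the task (Chen et al., 2021)."""
    if n - c < k:
        return 1.0  # fewer failures than samples: a success is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

At k=1 this reduces to the plain success rate c/n, so reporting pass@1 just means scoring each task on one sampled attempt rather than taking the best of many.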
5 replies · 7 reposts · 112 likes
@ankesh_anand
Ankesh Anand
23 days
this post is now more than 3 years old, i was getting a bit disillusioned with my own RL for Atari research work, had just read the WebGPT work and was like "yep, this is it!" very happy with how all 3 intuitions behind why RLFT makes sense turned out to be true. now we are in
2 replies · 3 reposts · 33 likes
@ankesh_anand
Ankesh Anand
28 days
Here we go! A new 2.5 Pro with all-around capability improvements compared to previous versions.
- Much better at code editing now, SOTA on Aider (82.2); try out this model on Cursor!
- #1 on webdev-arena (surpassing Opus 4)
- Supports budgets now (128 to 32k)
- Much better at
2 replies · 5 reposts · 115 likes
@ankesh_anand
Ankesh Anand
3 months
📈📈📈
@mbalunovic
Mislav Balunović
3 months
Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve non-trivial amount of points (24.4%). The speed of progress is really mind-blowing.
12 replies · 28 reposts · 352 likes
@ankesh_anand
Ankesh Anand
3 months
MathArena results for gemini-2.5-pro
11 replies · 52 reposts · 620 likes
@ankesh_anand
Ankesh Anand
3 months
shoutout to the believers!
46 replies · 71 reposts · 2K likes
@ankesh_anand
Ankesh Anand
5 months
4. Overall, massive congrats to the DeepSeek team on releasing R1! Making proper RL work on frontier models happened kinda simultaneously at multiple labs in 2024, and now we are in a new and fun paradigm. I hope their openness incentivizes other labs to release more stuff in the
6 replies · 4 reposts · 168 likes
@ankesh_anand
Ankesh Anand
5 months
3. "The aha moment": I think the o1 team deserves some credit here. In this launch video, they mention emergent reasoning patterns during o1 training runs without seeded human CoT data, and refer to it as the "aha moment" as well. It's definitely easy to spot things when you
4 replies · 12 reposts · 149 likes
@ankesh_anand
Ankesh Anand
5 months
2. The $5.5M figure is entirely believable for "one final run", and a massive feat. It is, however, only surprising if you compare it with Llama 3 training costs, which were a couple generations behind, as Wenfeng mentions here. Obviously, the total R&D costs are much higher.
2 replies · 8 reposts · 129 likes
@ankesh_anand
Ankesh Anand
5 months
1. Re Distillation claims: DeepSeekCoder-V2 [1] was released in June 2024, and they had RL on verifiable rewards working back then with great success. They were the only team outside of Gemini and OpenAI I knew of that were RL-pilled. 6 months from then on, I place a very low
4 replies · 11 reposts · 123 likes
@ankesh_anand
Ankesh Anand
5 months
The DeepSeek discourse is simultaneously under-crediting and over-crediting them for what they achieved. So, some quick thoughts:
3 replies · 65 reposts · 674 likes
@ankesh_anand
Ankesh Anand
5 months
The whole surprise over $5.5M was because everyone is anchored to Llama 3's compute efficiency. Wenfeng himself said it's about two generations behind frontier lab numbers. Sonnet costs "tens of millions" of dollars; I hope we release the 2.0 Flash / Flash Thinking numbers as
@DarioAmodei
Dario Amodei
5 months
My thoughts on China, export controls and two possible futures
2 replies · 5 reposts · 59 likes
@ankesh_anand
Ankesh Anand
5 months
Try it out at supports 1M context and code execution.
0 replies · 0 reposts · 3 likes
@ankesh_anand
Ankesh Anand
5 months
The lines keep going up and to the right, so we have a new version of gemini flash thinking out a month later!
1 reply · 3 reposts · 47 likes
@ankesh_anand
Ankesh Anand
7 months
RT @jamesjyan117153: So proud to be one of the main core contributors on this effort! Really enjoyed making Gemini smarter!!!
0 replies · 14 reposts · 0 likes
@ankesh_anand
Ankesh Anand
7 months
The model is live in the UI and API and you can play with it on
0 replies · 0 reposts · 8 likes
@ankesh_anand
Ankesh Anand
7 months
Excited to share an early preview of our Gemini 2.0 Flash Thinking model with all its raw thoughts visible. Here's the model trying to solve a Putnam 2024 problem with multiple approaches, then verifying that its answer was correct.
12 replies · 31 reposts · 372 likes