Ankesh Anand Profile
Ankesh Anand

@ankesh_anand

Followers: 5K · Following: 6K · Media: 112 · Statuses: 1K

Research scientist @googledeepmind (Gemini Thinking & Post-Training), prev phd @milamontreal. RL for Gemini 2.5 and Project Mariner. Opinions are my own.

London, England
Joined December 2011
@ankesh_anand
Ankesh Anand
3 months
2.5 Pro is our new frontier model: fresh big model smell with extremely strong reasoning / thinking capabilities. We report single attempt / pass@1 scores for clean comparisons.
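For context, "pass@1" is the standard unbiased pass@k estimator from the HumanEval paper evaluated at k=1, i.e. the score from a single sampled attempt per task. A minimal sketch of the estimator (illustrative only, not the team's actual eval code):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n attempts of which c are
    correct, solves the task (Chen et al., 2021)."""
    if n - c < k:
        return 1.0  # fewer failures than samples: a success is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

At k=1 this reduces to the plain success rate c/n, so reporting pass@1 just means scoring each task on one sampled attempt rather than taking the best of many.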
5 replies · 7 reposts · 112 likes
@ankesh_anand
Ankesh Anand
23 days
this post is now more than 3 years old, i was getting a bit disillusioned with my own RL for Atari research work, had just read the WebGPT work and was like "yep, this is it!" very happy with how all 3 intuitions behind why RLFT makes sense turned out to be true. now we are in
2 replies · 3 reposts · 33 likes
@ankesh_anand
Ankesh Anand
28 days
Here we go! A new 2.5 Pro with all-around capability improvements compared to previous versions.
- Much better at code editing now, SOTA on Aider (82.2); try out this model on Cursor!
- #1 on webdev-arena (surpassing Opus 4)
- Supports budgets now (128 to 32k)
- Much better at
2 replies · 5 reposts · 115 likes
@ankesh_anand
Ankesh Anand
3 months
📈📈📈
@mbalunovic
Mislav Balunović
3 months
Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve non-trivial amount of points (24.4%). The speed of progress is really mind-blowing.
12 replies · 28 reposts · 352 likes
@ankesh_anand
Ankesh Anand
3 months
MathArena results for gemini-2.5-pro
11 replies · 52 reposts · 620 likes
@ankesh_anand
Ankesh Anand
3 months
shoutout to the believers!
46 replies · 71 reposts · 2K likes
@ankesh_anand
Ankesh Anand
5 months
4. Overall, massive congrats to the DeepSeek team on releasing R1! Making proper RL work on frontier models happened kinda simultaneously at multiple labs in 2024, and now we are in a new and fun paradigm. I hope their openness incentivizes other labs to release more stuff in the
6 replies · 4 reposts · 168 likes
@ankesh_anand
Ankesh Anand
5 months
3. "The aha moment": I think the o1 team deserves some credit here. In this launch video, they mention emergent reasoning patterns during o1 training runs without seeded human CoT data, and refer to it as the "aha moment" as well. It's definitely easy to spot things when you
4 replies · 12 reposts · 149 likes
@ankesh_anand
Ankesh Anand
5 months
2. The $5.5M figure is entirely believable for "one final run", and a massive feat. It is, however, only surprising if you compare it with Llama 3 training costs, which were a couple generations behind, as Wenfeng mentions here. Obviously, the total R&D costs are much higher.
2 replies · 8 reposts · 129 likes
@ankesh_anand
Ankesh Anand
5 months
1. Re Distillation claims: DeepSeekCoder-V2 [1] was released in June 2024, and they had RL on verifiable rewards working back then with great success. They were the only team outside of Gemini and OpenAI I knew of that were RL-pilled. 6 months from then on, I place a very low
4 replies · 11 reposts · 123 likes
@ankesh_anand
Ankesh Anand
5 months
The DeepSeek discourse is simultaneously under-crediting and over-crediting them for what they achieved. So, some quick thoughts:
3 replies · 65 reposts · 674 likes
@ankesh_anand
Ankesh Anand
5 months
The whole surprise over $5.5M was because everyone is anchored to Llama 3's compute efficiency. Wenfeng himself said it's about two generations behind frontier lab numbers. Sonnet costs "tens of millions" of dollars; I hope we release the 2.0 Flash / Flash Thinking numbers as
@DarioAmodei
Dario Amodei
5 months
My thoughts on China, export controls and two possible futures
2 replies · 5 reposts · 59 likes
@ankesh_anand
Ankesh Anand
5 months
Try it out at supports 1M context and code execution.
0 replies · 0 reposts · 3 likes
@ankesh_anand
Ankesh Anand
5 months
The lines keep going up and to the right, so we have a new version of gemini flash thinking out a month later!
1 reply · 3 reposts · 47 likes
@ankesh_anand
Ankesh Anand
7 months
RT @jamesjyan117153: So proud to be one of the main core contributors on this effort! Really enjoyed making Gemini smarter!!!
0 replies · 14 reposts · 0 likes
@ankesh_anand
Ankesh Anand
7 months
The model is live in the UI and API and you can play with it on
0 replies · 0 reposts · 8 likes
@ankesh_anand
Ankesh Anand
7 months
Excited to share an early preview of our Gemini 2.0 Flash Thinking model with all its raw thoughts visible. Here's the model trying to solve a Putnam 2024 problem with multiple approaches, then verifying that its answer was correct.
12 replies · 31 reposts · 372 likes