Rishi Mehta @rishicomplex X Profile

Rishi Mehta

@rishicomplex

Followers

3K

Following

4K

Media

37

Statuses

254

Solve i̶n̶t̶e̶l̶l̶i̶g̶e̶n̶c̶e̶ ̶ coding, use it to solve everything else | Research @AnthropicAI | Past: RL @GoogleDeepmind: AlphaProof co-lead, Gemini.

https://t.co/TFwzc3Sq2V

San Francisco, CA

Joined July 2009

Rishi Mehta

@rishicomplex

19 days

Btw one of the biggest perks of working at A\ is unlimited Claude Code :)

5

0

42

Rishi Mehta

@rishicomplex

19 days

We're hiring on the Code RL team at Anthropic! Small, fast-moving team. Low ego, high impact. If you're a star engineer/researcher excited to push the frontier of AI-powered SWE, there's nowhere better to be. We care about getting this right. DM or apply here!

job-boards.greenhouse.io

San Francisco, CA | New York City, NY

Claude

@claudeai

19 days

Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.

20

510

Rishi Mehta

@rishicomplex

19 days

We reported a broader suite of SWE evals with Opus 4.5 (SWE-bench, SWE-bench Pro, SWE-bench Multilingual, Aider). But as always, actually using the model is the real eval.

Ariel

@redtachyon

19 days

lfg we got new SOTA on django

3

0

29

Rishi Mehta

@rishicomplex

19 days

A litmus test I have when I work with a model is "frequency at which I feel the urge to swear". Haven't felt it yet with Opus 4.5.

0

1

Rishi Mehta

@rishicomplex

19 days

We launched a new Opus! Try coding with it. Besides being a jump on all the benchmarks, this model feels competent in a way I haven't felt with any other model.

Claude

@claudeai

19 days

Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.

1

54

Rishi Mehta

@rishicomplex

20 days

The number of papers I read is proportional to the number of hours I spend in flights

1

0

5

Rishi Mehta

@rishicomplex

20 days

In the ARC-AGI-2 human baseline of 60%, it looks like humans were given 2 attempts, with the second attempt conditioned on knowing the first attempt failed. In the AI attempts, the two attempts are independent. Seems like an important disparity? @fchollet @mikeknoop

3

0

28

Rishi Mehta

@rishicomplex

22 days

I'm in

Bryan Johnson

@bryan_johnson

22 days

ok, you've had a week to think about it. who's in? no sugar nov 21 - jan 2 eat all you want: blueberries, blackberries, raspberries, strawberries, cherries, apples, kiwi, pomegranate, etc. avoid cookies, pies, cakes, sweet drinks, chocolates, cocktails, anything with added

0

Rishi Mehta

@rishicomplex

22 days

Oh.

Andrej Karpathy

@karpathy

22 days

@TheVixhal your post challenged me. every one of your points is wrong but i had to think about each for a while :)

0

9

Rishi Mehta

@rishicomplex

23 days

London folks check this out

sam mcallister

@sammcallister

23 days

I feel like we should do this again This weekend only for the last time? Air Mail Chiltern Street, London First come, first served This weekend only

2

0

12

Rishi Mehta

@rishicomplex

23 days

Asked nano banana pro to make a comic about Suppandi (a simpleton Indian comic book character I grew up on). Pretty good - this is legit something I could imagine being published in Tinkle lol

1

0

5

Rishi Mehta

@rishicomplex

23 days

Note how the other models don't even get the board size right, let alone understand chess

1

0

Rishi Mehta

@rishicomplex

23 days

And this is nano banana (non-pro)

2

0

Rishi Mehta

@rishicomplex

23 days

This is ChatGPT for the same prompt

2

0

Rishi Mehta

@rishicomplex

23 days

Wow, nano banana pro is amazing. Prompt: Make an image of an interesting mate-in-1 chess puzzle. Make the aspect ratio such that it's easy to view on a phone.

8

1

8

Rishi Mehta

@rishicomplex

26 days

This had better be good

Demis Hassabis

@demishassabis

26 days

It's nearly 3 here, my favourite part of the night shift… locked in... 💪🚀

2

0

11

Rishi Mehta

@rishicomplex

28 days

Looking to move to a new share house in SF in early Jan, DM me if you have any leads!

1

8

15

Rishi Mehta

@rishicomplex

29 days

Btw it's hidden away in the appendix of the AlphaProof paper but we solved miniF2F-valid (canonical theorem proving benchmark) last year! This doesn't happen with most benchmarks even after they're saturated, because a few problems in the end prove too hard / ambiguous / unsolvable.

2

6

26

Rishi Mehta

@rishicomplex

1 month

"my role is to be on the bottom of things" beautiful

Gautam Kamath

@thegautamkamath

1 month

I admire Don Knuth for many things. But perhaps the most admirable is quitting email before I was even born.

0

2

Rishi Mehta

@rishicomplex

1 month

Nice article that reports the results of mathematicians trying AlphaProof on some real problems, and sometimes finding it useful

Talia Ringer 🕊

@TaliaRinger

1 month

New from me, in @Nature: Mathematicians put AI model AlphaProof to the test. A solicited News & Views article about @GoogleDeepMind AlphaProof that was an absolute joy to write! https://t.co/HBnQg22MEP

0

1

8