Rishi Mehta
@rishicomplex
Followers
3K
Following
4K
Media
37
Statuses
254
Solve i̶n̶t̶e̶l̶l̶i̶g̶e̶n̶c̶e̶ ̶ coding, use it to solve everything else | Research @AnthropicAI | Past: RL @GoogleDeepmind: AlphaProof co-lead, Gemini.
San Francisco, CA
Joined July 2009
Btw one of the biggest perks of working at A\ is unlimited Claude Code :)
5
0
42
We're hiring on the Code RL team at Anthropic! Small, fast-moving team. Low ego, high impact. If you're a star engineer/researcher excited to push the frontier of AI-powered SWE, there's nowhere better to be. We care about getting this right. DM or apply here!
job-boards.greenhouse.io
San Francisco, CA | New York City, NY
Introducing Claude Opus 4.5: the best model in the world for coding, agents, and computer use. Opus 4.5 is a step forward in what AI systems can do, and a preview of larger changes to how work gets done.
20
20
510
We reported a broader suite of SWE evals with Opus 4.5 (SWE-bench, SWE-bench Pro, SWE-bench Multilingual, Aider). But as always, actually using the model is the real eval.
3
0
29
A litmus test I have when I work with a model is "frequency at which I feel the urge to swear". Haven't felt it yet with Opus 4.5.
0
1
1
We launched a new Opus! Try coding with it. Besides being a jump on all the benchmarks, this model feels competent in a way I haven't felt with any other model.
1
1
54
The number of papers I read is proportional to the number of hours I spend in flights
1
0
5
In the ARC-AGI-2 human baseline of 60%, it looks like humans were given 2 attempts, with the second attempt made knowing that the first had failed. In the AI runs, the two attempts are independent. Seems like an important disparity? @fchollet @mikeknoop
3
0
28
Oh.
@TheVixhal your post challenged me. every one of your points is wrong but i had to think about each for a while :)
0
0
9
Asked Nano Banana Pro to make a comic about Suppandi (simpleton Indian comic book character I grew up on). Pretty good - this is legit something I could imagine being published in Tinkle lol
1
0
5
Note how the other models don't even get the board size right, let alone understand chess
1
0
0
Wow, Nano Banana Pro is amazing. Prompt: Make an image of an interesting mate-in-1 chess puzzle. Make the aspect ratio such that it's easy to view on a phone.
8
1
8
Looking to move to a new share house in sf in early jan, dm me if you have any leads!
1
8
15
Btw it's hidden away in the appendix of the AlphaProof paper but we fully solved miniF2F-valid (canonical theorem proving benchmark) last year! That doesn't happen with most benchmarks even after they're saturated, because the last few problems prove too hard / ambiguous / unsolvable.
2
6
26
Nice article that reports the results of mathematicians trying AlphaProof on some real problems, and sometimes finding it useful
New from me, in @Nature: Mathematicians put AI model AlphaProof to the test. A solicited News & Views article about @GoogleDeepMind AlphaProof that was an absolute joy to write! https://t.co/HBnQg22MEP
0
1
8