Paul Gauthier @paulgauthier X Profile

Paul Gauthier

@paulgauthier

Followers

9K

Following

1K

Media

132

Statuses

470

Entrepreneur, investor, advisor

Southern California

Joined April 2009

Paul Gauthier

@paulgauthier

12 days

Aider v0.85.0 is out.

- Support for Responses API models like o3-pro and o1-pro.
- New Gemini 2.5 Pro models.
- Updated costs for o3.
- Repo-map & linting support for Clojure and MATLAB.
- Aider wrote 21% of the code in this release.

Full release notes:.

5

10

182

Paul Gauthier

@paulgauthier

9 days

OpenAI's o3-pro set a new SOTA of 85% on the aider polyglot coding benchmark, running with "high" reasoning effort. Full leaderboard:.

41

38

561

Paul Gauthier

@paulgauthier

12 days

Costs for o3 (high) + gpt-4.1 as architect+editor have also been updated now.

1

0

50

Paul Gauthier

@paulgauthier

12 days

The aider polyglot leaderboard has been updated to reflect the new, much lower o3 pricing.

15

28

374

Paul Gauthier

@paulgauthier

1 month

DeepSeek R1 0528 scored 71% on the aider polyglot coding benchmark. This is a significant increase over the prior release of R1. Full leaderboard:.

39

62

684

Paul Gauthier

@paulgauthier

1 month

Gemini 2.5 Pro 06-05 has set a new SOTA on the aider polyglot coding benchmark, scoring 83% with 32k thinking tokens. The default thinking mode, where Gemini self-determines the thinking budget, scored 79%. Full leaderboard:.

23

53

666

Paul Gauthier

@paulgauthier

1 month

Aider v0.84.0 is out with support for Claude 4 Opus and Sonnet and Gemini 2.5 Flash Preview 05-20. Aider wrote 79% of the code in this release. Full release notes:.

7

9

200

Paul Gauthier

@paulgauthier

1 month

Gemini 2.5 Flash 05-20 with 23k thinking tokens scored 55% on the aider polyglot coding benchmark. Without thinking, it scored 44%. Full leaderboard:.

18

12

259

Paul Gauthier

@paulgauthier

1 month

Claude 4 Opus scored 72% on the aider polyglot coding benchmark. Claude 4 Sonnet scored 61%. Both of those are with 32k think tokens. Sonnet 4 seems to have underperformed 3.7. Full leaderboard:.

63

640

Paul Gauthier

@paulgauthier

2 months

Aider just passed 1000000000000000 GitHub Stars! That's 2^15 or 32,768 stars in decimal.

11

240
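The star count in the post above is written as a binary numeral. A minimal sketch to check the arithmetic, assuming the 16-digit string is read as base 2 (the variable names are illustrative, not from the post):

```python
# The star count exactly as written in the post, read as a binary numeral.
binary_stars = "1000000000000000"

# Convert from base 2 to an ordinary integer.
stars = int(binary_stars, 2)

# A 1 followed by n zeros in binary equals 2**n.
exponent = len(binary_stars) - 1

# Confirm the post's claim: 2^15, or 32,768 in decimal.
assert stars == 2 ** exponent == 32768
print(f"2^{exponent} = {stars:,}")
```
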

Paul Gauthier

@paulgauthier

2 months

@OpenRouterAI

Paul Gauthier

@paulgauthier

2 months

I was able to benchmark Qwen3 235B A22B via the official API. It scored 60% using diff and 62% using the whole edit format. The leaderboard and Qwen3 article have both been updated.

0

15

Paul Gauthier

@paulgauthier

2 months

Aider v0.83.0 is out with support for Qwen3 and Gemini 2.5 Pro Preview 05-06. A huge number of QOL features, many from contributors. Thanks! Aider wrote 55% of the code in this release. Full release notes:.

9

11

171

Paul Gauthier

@paulgauthier

2 months

Gemini Pro is quite good at unified diffs. Not good enough to apply literally with patch, but aider has a very flexible udiff backend. I mostly use Gemini like: aider --model gemini --edit-format udiff-simple. Benchmarks a bit worse, so I'm reluctant to make it default.

8

4

163

Paul Gauthier

@paulgauthier

2 months

@OpenRouterAI See this Qwen3 article for additional aider polyglot benchmark results. Scores vary significantly depending on provider, inference settings, think/nothink, etc. Will update as new results become available.

2

4

31

Paul Gauthier

@paulgauthier

2 months

Qwen3 235B A22B scored 50% on the aider polyglot benchmark and Qwen3 32B scored 40%. Accessed via @OpenRouterAI. There are reports of higher scores, but I am unable to reproduce. Full leaderboard:.

16

15

241

Paul Gauthier

@paulgauthier

2 months

Gemini 2.5 Pro Preview 05-06 scored 77% on the leaderboard, coming in 2nd place close behind o3 (high). Full leaderboard:.

16

40

342

Paul Gauthier

@paulgauthier

2 months

The $6.32 benchmark cost for Gemini 2.5 Pro Preview 03-25 was incorrect. The true cost was higher, possibly significantly so. Unfortunately 03-25 is no longer available to re-run. The new 05-06 version costs $37 to run the benchmark. Root cause analysis:.

29

27

472

Paul Gauthier

@paulgauthier

2 months

RT @amirpc: The new way I learn how to use a new tool is to clone its repo and ask questions of the code base with aider. Ramp to power use….

0

3

0

Paul Gauthier

@paulgauthier

2 months

I vibed this AI SVG generating app in a few hours yesterday. SVGs can sometimes be preferred over pixel images. Smaller, cleaner, scalable, easier to touch-up and post-process. Aider built the whole thing, handled Heroku deploy, etc.

10

6

141