Paul Gauthier

@paulgauthier

Followers: 9K | Following: 1K | Media: 132 | Statuses: 470

Entrepreneur, investor, advisor

Southern California
Joined April 2009
@paulgauthier
Paul Gauthier
12 days
Aider v0.85.0 is out.
- Support for Responses API models like o3-pro and o1-pro.
- New Gemini 2.5 Pro models.
- Updated costs for o3.
- Repo-map & linting support for Clojure and MATLAB.
- Aider wrote 21% of the code in this release.
Full release notes:.
5
10
182
@paulgauthier
Paul Gauthier
9 days
OpenAI's o3-pro set a new SOTA of 85% on the aider polyglot coding benchmark, running with "high" reasoning effort. Full leaderboard:.
41
38
561
@paulgauthier
Paul Gauthier
12 days
Costs for o3 (high) + gpt-4.1 as architect+editor have also been updated now.
1
0
50
@paulgauthier
Paul Gauthier
12 days
The aider polyglot leaderboard has been updated to reflect the new, much lower o3 pricing.
15
28
374
@paulgauthier
Paul Gauthier
1 month
DeepSeek R1 0528 scored 71% on the aider polyglot coding benchmark. This is a significant increase over the prior release of R1. Full leaderboard:.
39
62
684
@paulgauthier
Paul Gauthier
1 month
Gemini 2.5 Pro 06-05 has set a new SOTA on the aider polyglot coding benchmark, scoring 83% with 32k thinking tokens. The default thinking mode, where Gemini self-determines the thinking budget, scored 79%. Full leaderboard:.
23
53
666
@paulgauthier
Paul Gauthier
1 month
Aider v0.84.0 is out with support for Claude 4 Opus and Sonnet, and Gemini 2.5 Flash Preview 05-20. Aider wrote 79% of the code in this release. Full release notes:.
7
9
200
@paulgauthier
Paul Gauthier
1 month
Gemini 2.5 Flash 05-20 with 23k thinking tokens scored 55% on the aider polyglot coding benchmark. Without thinking, it scored 44%. Full leaderboard:.
18
12
259
@paulgauthier
Paul Gauthier
1 month
Claude 4 Opus scored 72% on the aider polyglot coding benchmark. Claude 4 Sonnet scored 61%. Both of those are with 32k thinking tokens. Sonnet 4 seems to have underperformed Sonnet 3.7. Full leaderboard:.
63
63
640
@paulgauthier
Paul Gauthier
2 months
Aider just passed 1000000000000000 GitHub stars! That's 2^15, or 32,768 stars in decimal.
11
11
240
@paulgauthier
Paul Gauthier
2 months
I was able to benchmark Qwen3 235B A22B via the official API. It scored 60% using diff and 62% using the whole edit format. The leaderboard and Qwen3 article have both been updated.
12
12
178
@paulgauthier
Paul Gauthier
2 months
Aider v0.83.0 is out with support for Qwen3 and Gemini 2.5 Pro Preview 05-06. It also adds a huge number of QOL features, many from contributors. Thanks! Aider wrote 55% of the code in this release. Full release notes:.
9
11
171
@paulgauthier
Paul Gauthier
2 months
Gemini Pro is quite good at unified diffs. Not good enough to apply literally with patch, but aider has a very flexible udiff backend. I mostly use Gemini like this:

aider --model gemini --edit-format udiff-simple

It benchmarks a bit worse, so I'm reluctant to make it the default.
8
4
163
@paulgauthier
Paul Gauthier
2 months
@OpenRouterAI See this Qwen3 article for additional aider polyglot benchmark results. Scores vary significantly depending on provider, inference settings, think/nothink, etc. Will update as new results become available.
2
4
31
@paulgauthier
Paul Gauthier
2 months
Qwen3 235B A22B scored 50% on the aider polyglot benchmark and Qwen3 32B scored 40%. Accessed via @OpenRouterAI. There are reports of higher scores, but I am unable to reproduce them. Full leaderboard:.
16
15
241
@paulgauthier
Paul Gauthier
2 months
Gemini 2.5 Pro Preview 05-06 scored 77% on the aider polyglot coding benchmark, taking 2nd place close behind o3 (high). Full leaderboard:.
16
40
342
@paulgauthier
Paul Gauthier
2 months
The $6.32 benchmark cost for Gemini 2.5 Pro Preview 03-25 was incorrect. The true cost was higher, possibly significantly so. Unfortunately 03-25 is no longer available to re-run. The new 05-06 version costs $37 to run the benchmark. Root cause analysis:.
29
27
472
@paulgauthier
Paul Gauthier
2 months
RT @amirpc: The new way I learn how to use a new tool is to clone its repo and ask questions of the code base with aider. Ramp to power use….
0
3
0
@paulgauthier
Paul Gauthier
2 months
I vibe coded this AI SVG-generating app in a few hours yesterday. SVGs are sometimes preferable to pixel images: they're smaller, cleaner, scalable, and easier to touch up and post-process. Aider built the whole thing, handled the Heroku deploy, etc.
10
6
141