Codebuff @CodebuffAI X Profile

Codebuff

@CodebuffAI

Followers

2K

Following

1K

Media

11

Statuses

281

Make your terminal write code for you: npm i -g codebuff

https://t.co/TCIoa94Z3X

Your terminal

Joined October 2024


James Grugett

@jahooma

16 hours

We run multiple models in parallel and have a selector agent choose the best code output. However, Opus has a strong bias toward choosing code generated by Opus. So we stopped including other models, because Opus is racist 😂

James Grugett

@jahooma

16 hours

In the next iteration of @CodebuffAI's best-of-n:
- Selector agent sees the alternative implementations by git patch lines added/removed instead of str_replace tool calls
- Selector agent picks the best, but passes on suggested improvements based on the other impls

1

5
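The best-of-n flow described in the tweet above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Codebuff's implementation: `Candidate`, `as_patch`, `changed_lines`, and `select_best` are hypothetical names, the patch is produced with Python's standard `difflib` rather than git, and the default scoring rule (prefer the smallest patch) is a placeholder for the selector agent's judgment.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Candidate:
    model: str     # hypothetical label for the model that produced the edit
    new_text: str  # full file contents after that model's edit


def as_patch(original: str, candidate: Candidate, path: str = "file.py") -> str:
    """Render a candidate as a unified diff, i.e. lines added/removed."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        candidate.new_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(diff)


def changed_lines(patch: str) -> int:
    """Count added/removed lines, skipping the two ---/+++ header lines."""
    body = patch.splitlines()[2:]
    return sum(1 for line in body if line.startswith(("+", "-")))


def select_best(original: str, candidates: list, score=None):
    """Pick one candidate and keep the other patches as improvement hints.

    `score` stands in for the selector agent. The default (smallest patch
    wins) is a placeholder assumption, not a real selection policy.
    """
    patches = {c.model: as_patch(original, c) for c in candidates}
    score = score or (lambda patch: -changed_lines(patch))
    best = max(candidates, key=lambda c: score(patches[c.model]))
    # Patches from the non-selected candidates, to mine for suggestions.
    suggestions = [patches[c.model] for c in candidates if c is not best]
    return best, suggestions


if __name__ == "__main__":
    original = "def add(a, b):\n    return a - b\n"
    candidates = [
        Candidate("model_a", "def add(a, b):\n    return a + b\n"),
        Candidate("model_b", "def add(a, b):\n    # sum\n    result = a + b\n    return result\n"),
    ]
    best, suggestions = select_best(original, candidates)
    print(best.model)
    print(suggestions[0])
```

In a real system the `score` callable would be replaced by a call to the selector model, which would read the patches rather than raw tool calls.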

James Grugett

@jahooma

5 days

Our old coding agent evals ("BuffBench") are too easy for Opus 4.5. So I'm generating new ones. We do this entirely autonomously:
- An agent looks through open source commits
- Picks all the big beefy ones requiring skill to replicate
- For each commit, generates a high-level

2

1

21

James Grugett

@jahooma

6 days

Absolutely beautiful when an agent wraps up its task by:
- spawning 4 (!) parallel agents
- running typechecks and tests
- also doing a mini code review
And the code reviewer spots a bug, and the orchestrator fixes it! And all that took like 7 seconds.

11

14

251

James Grugett

@jahooma

19 days

We're testing which models to spawn for our best-of-n editing: On 62 tasks, Opus never chose Gemini's edits. It always chose edits by Opus or GPT-5 as the best. In contrast, for our normal mode using Sonnet and Gemini, Gemini's edits were chosen the most by Sonnet.

1

2

11
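A selector-bias check like the one in the tweet above comes down to tallying whose edit was picked on each task. The sketch below is an assumption about how such a tally might look, not the team's tooling: `selection_rates` is a hypothetical helper, and the sample picks are made-up toy data, not the 62-task results.

```python
from collections import Counter


def selection_rates(picks: list, models: list) -> dict:
    """Fraction of tasks on which each model's edit was chosen.

    `picks` holds one model name per task: the model whose edit the
    selector chose. Models that were never chosen get a rate of 0.0.
    """
    counts = Counter(picks)
    total = len(picks)
    return {m: (counts[m] / total if total else 0.0) for m in models}


if __name__ == "__main__":
    # Toy data for illustration only.
    toy_picks = ["opus", "gpt-5", "opus", "opus"]
    print(selection_rates(toy_picks, ["opus", "gpt-5", "gemini"]))
```

Running the same tally with a different selector model on the same tasks is one way to separate selector bias from real quality differences.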

James Grugett

@jahooma

22 days

It's a good time to retry @CodebuffAI 🔥

4

1

10

James Grugett

@jahooma

22 days

@desugar_64 @CodebuffAI @opencode @charmcli Codebuff crushes (pun intended!) these alternate coding agents on almost any eval. We've geared it to be as performant as possible.
1. Best code output
2. Faster
3. Comes with a composable agent framework, so you can create subagents for Codebuff to use

1

James Grugett

@jahooma

23 days

We've integrated Gemini 3 into Codebuff! It now appears as one of the best-of-n editors, and it has taken over code review duty! Gemini 3 is very smart, but because it thinks before each step, it's slower at gathering context, and separately, doesn't choose to read as many

9

4

68

James Grugett

@jahooma

23 days

The new Codebuff is truly cooking. We're relaunching soon, with the most performant coding agent! npm i -g codebuff

19

7

122

Camel Roxo

@camelroxo1

1 month

wtf @CodebuffAI is crazy good bro, what did these guys do to be so good? lol, it had been a long time since I'd felt so impressed with the capabilities of an AI tool. even though the price is a bit steep for me being poor, it's worth it.

0

1

2

Codebuff

@CodebuffAI

1 month

💪

AICodeKing

@aicodeking

2 months

MiniMax M2 + Claude Code on KingBench Agentic Evaluations: It now scores #2 on my Agentic Evaluations beating GLM-4.6 by a wide margin. It seems to work much better with Claude Code's Tools. Really great model and it's my daily driver now. I haven't tested GLM with CC yet.

2

11

James Grugett

@jahooma

2 months

Funny enough, we built a benchmark with an agent doing back-and-forth prompting for a coding agent. But I'm rewriting it as we speak to remove this prompting agent haha. Main reason is efficiency: we're on a budget so we need the most signal per dollar spent on evals (we

dax

@thdxr

2 months

one challenging thing with llm benchmarks is they're single prompt
in real life you go back and forth with LLM, guide it, work together
but this is hard to turn into an automated benchmark

3

16

James Grugett

@jahooma

2 months

Try `codebuff --experimental` for our cutting-edge agent!

Bessi

@aeitroc

2 months

@jahooma @brandonkachen codebuff --experimental: the agents have been coding a pretty difficult multi-layer plan for 30min. I'm mindblown by the details presented by the agents while they are implementing it... and this is not even the final version of codebuff.

1

9

James Grugett

@jahooma

2 months

Finally got the "Orchestrator" pattern to produce higher results on our evals! The world doesn't know what's coming...

James Grugett

@jahooma

2 months

Most YC startups solve an immediate business need and climb a gradient based on customer feedback. We're not really doing that. Instead, we're building our vision of the future. It's a much worse strategy -- most of the time. Occasionally, it really works.

6

3

35

James Grugett

@jahooma

3 months

Sonnet 4.5 is live in @CodebuffAI!
- It's much more concise
- It's good at doing multi-file edits in a single response, making it faster & cheaper
- Early eval results show a bump -- more on this soon!

0

1

16

James Grugett

@jahooma

3 months

I just searched "codebuff". And... Claude Code is paying for the first result? Wow.

12

2

57

dex

@dexhorthy

3 months

kinda unhinged that none of the agent frameworks that were hot a year ago are used in any of the top coding agents and now @opencode @CodebuffAI @claude_code are all adding agent-framework-y things... at sprout we used to call this "refactor AFTER patterns emerge". its a good

23

15

244

James Grugett

@jahooma

3 months

Announcing the Agent Store! It's npm for agents:
- Publish specialized agents
- Run them by CLI or SDK
- Compose them to build higher level agents
This is the end game for agents 🧠

10

11

212

James Grugett

@jahooma

3 months

We just made @CodebuffAI ~30% cheaper and ~9% faster with ~no decline in performance. (According to BuffBench 💪) All we did was fix some bugs 🪲 Enjoy!

2

18

Debsourya Datta

@debsourya005

3 months

This new open source coding agent is beating claude code like crazy. Everyone is saying the responses are OP🔥, gonna try this out. @CodebuffAI

0

1

6

James Grugett

@jahooma

3 months

Announcing our weekend competition to create and publish @CodebuffAI agents! Help us fill out our Agent Store before we launch it on Monday! For the best agents that you all publish we'll be offering $200 of credit prizes for first, second, third place, and Community choice.

2

3

26