Codebuff

@CodebuffAI

Followers
2K
Following
1K
Media
11
Statuses
281

Make your terminal write code for you: npm i -g codebuff

Your terminal
Joined October 2024
@jahooma
James Grugett
16 hours
We run multiple models in parallel and have a selector agent choose the best code output. However, Opus has a strong bias toward choosing code generated by Opus. So we stopped including other models, because Opus is racist 😂
@jahooma
James Grugett
16 hours
In the next iteration of @CodebuffAI's best-of-n:
- Selector agent sees the alternative implementations by git patch lines added/removed instead of str_replace tool calls
- Selector agent picks the best, but passes on suggested improvements based on the other impls
1
1
5
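The best-of-n idea in the post above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Codebuff's implementation: `Candidate`, `patch_lines`, `select_best`, and `smallest_patch` are hypothetical names, the "selector" is a plain function standing in for a selector agent, and Python's `difflib` stands in for git patches.

```python
import difflib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    model: str       # hypothetical label for whichever model produced the edit
    new_source: str  # full file contents after the candidate edit


def patch_lines(original: str, edited: str) -> List[str]:
    """Return only the added/removed lines of a unified diff (no headers, no context)."""
    diff = list(
        difflib.unified_diff(
            original.splitlines(), edited.splitlines(), lineterm="", n=0
        )
    )
    # The first two entries are the '---' / '+++' file headers; hunk markers start with '@'.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]


def select_best(
    original: str,
    candidates: List[Candidate],
    selector: Callable[[List[List[str]]], int],
) -> Tuple[Candidate, List[List[str]]]:
    """Show every candidate to the selector as a patch, then return the winner
    plus the losing patches, which could be mined for suggested improvements."""
    patches = [patch_lines(original, c.new_source) for c in candidates]
    winner = selector(patches)
    alternatives = [p for i, p in enumerate(patches) if i != winner]
    return candidates[winner], alternatives


def smallest_patch(patches: List[List[str]]) -> int:
    """Toy selector (an assumption): prefer the candidate that changes the fewest lines."""
    return min(range(len(patches)), key=lambda i: len(patches[i]))
```

Calling `select_best(original, candidates, smallest_patch)` returns the chosen candidate together with the patches that were not chosen; in a real system the selector would be a model call rather than a line count.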
@jahooma
James Grugett
5 days
Our old coding agent evals ("BuffBench") are too easy for Opus 4.5. So I'm generating new ones. We do this entirely autonomously:
- An agent looks through open source commits
- Picks all the big beefy ones requiring skill to replicate
- For each commit, generates a high-level
2
1
21
@jahooma
James Grugett
6 days
Absolutely beautiful when an agent wraps up its task by:
- spawning 4 (!) parallel agents
- runs typechecks and tests
- also does a mini-code review
And, the code reviewer spots a bug, and the orchestrator fixes it! And all that took like 7 seconds.
11
14
251
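The wrap-up step described in the post above (checks fanned out in parallel, findings handed back to an orchestrator) might look something like this. It is a sketch only: the check functions, their names, and the sample finding are invented for illustration, and `asyncio` coroutines stand in for spawned agents.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# A "check" is any coroutine function that returns a list of issues (empty means clean).
Check = Callable[[], Awaitable[List[str]]]


async def run_checks(checks: Dict[str, Check]) -> Dict[str, List[str]]:
    """Run every check concurrently and collect the issues under each check's name."""
    names = list(checks)
    results = await asyncio.gather(*(checks[name]() for name in names))
    return dict(zip(names, results))


def findings_to_fix(report: Dict[str, List[str]]) -> List[str]:
    """Flatten the report into the list an orchestrator would act on next."""
    return [f"{name}: {issue}" for name, issues in report.items() for issue in issues]


# Hypothetical stand-ins for the spawned agents.
async def typecheck() -> List[str]:
    await asyncio.sleep(0)
    return []


async def tests() -> List[str]:
    await asyncio.sleep(0)
    return []


async def review() -> List[str]:
    await asyncio.sleep(0)
    return ["possible off-by-one in loop bound"]
```

Running `asyncio.run(run_checks({"typecheck": typecheck, "tests": tests, "review": review}))` and passing the result to `findings_to_fix` yields the single review finding, which is the point where an orchestrator would apply a fix.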
@jahooma
James Grugett
19 days
We're testing which models to spawn for our best-of-n editing: On 62 tasks, Opus never chose Gemini's edits. It always chose edits by Opus or GPT-5 as the best. In contrast, for our normal mode using Sonnet and Gemini, Gemini's edits were chosen the most by Sonnet.
1
2
11
@jahooma
James Grugett
22 days
It's a good time to retry @CodebuffAI 🔥
4
1
10
@jahooma
James Grugett
22 days
@desugar_64 @CodebuffAI @opencode @charmcli Codebuff crushes (pun intended!) these alternate coding agents on almost any eval. We've geared it to be as performant as possible.
1. Best code output
2. Faster
3. Comes with a composable agent framework, so you can create subagents for Codebuff to use
1
1
1
@jahooma
James Grugett
23 days
We've integrated Gemini 3 into Codebuff! It now appears as one of the best-of-n editors, and it has taken over code review duty! Gemini 3 is very smart, but because it thinks before each step, it's slower at gathering context, and separately, doesn't choose to read as many
9
4
68
@jahooma
James Grugett
23 days
The new Codebuff is truly cooking. We're relaunching soon, with the most performant coding agent! npm i -g codebuff
19
7
122
@camelroxo1
Camel Roxo
1 month
wtf @CodebuffAI is crazy good bro, what did these guys do to be so good? lol, it had been a long time since I'd felt so impressed with the capabilities of an AI tool. even though the price is a bit steep for me being poor, it's worth it.
0
1
2
@CodebuffAI
Codebuff
1 month
💪
@aicodeking
AICodeKing
2 months
MiniMax M2 + Claude Code on KingBench Agentic Evaluations: It now scores #2 on my Agentic Evaluations beating GLM-4.6 by a wide margin. It seems to work much better with Claude Code's Tools. Really great model and it's my daily driver now. I haven't tested GLM with CC yet.
2
2
11
@jahooma
James Grugett
2 months
Funny enough, we built a benchmark with an agent doing back-and-forth prompting for a coding agent. But I'm rewriting it as we speak to remove this prompting agent haha Main reason is efficiency: we're on a budget so we need the most signal per dollar spent on evals (we
@thdxr
dax
2 months
one challenging thing with llm benchmarks is they're single prompt in real life you go back and forth with LLM, guide it, work together but this is hard to turn into an automated benchmark
3
3
16
@jahooma
James Grugett
2 months
Try `codebuff --experimental` for our cutting-edge agent!
@aeitroc
Bessi
2 months
@jahooma @brandonkachen codebuff --experimental: the agents have been coding a pretty difficult multi-layer plan for 30min. I'm mindblown by the details presented by the agents while they are implementing it... and this is not even the final version of codebuff.
1
1
9
@jahooma
James Grugett
2 months
Finally got the "Orchestrator" pattern to produce higher results on our evals! The world doesn't know what's coming...
@jahooma
James Grugett
2 months
Most YC startups solve an immediate business need and climb a gradient based on customer feedback. We're not really doing that. Instead, we're building our vision of the future. It's a much worse strategy -- most of the time. Occasionally, it really works.
6
3
35
@jahooma
James Grugett
3 months
Sonnet 4.5 is live in @CodebuffAI!
- It's much more concise
- It's good at doing multi-file edits in a single response, making it faster & cheaper
- Early eval results show a bump -- more on this soon!
0
1
16
@jahooma
James Grugett
3 months
I just searched "codebuff". And... Claude Code is paying for the first result? Wow.
12
2
57
@dexhorthy
dex
3 months
kinda unhinged that none of the agent frameworks that were hot a year ago are used in any of the top coding agents and now @opencode @CodebuffAI @claude_code are all adding agent-framework-y things... at sprout we used to call this "refactor AFTER patterns emerge". its a good
23
15
244
@jahooma
James Grugett
3 months
Announcing the Agent Store! It's npm for agents:
- Publish specialized agents
- Run them by CLI or SDK
- Compose them to build higher level agents
This is the end game for agents 🧠
10
11
212
@jahooma
James Grugett
3 months
We just made @CodebuffAI ~30% cheaper and ~9% faster with ~no decline in performance. (According to BuffBench 💪) All we did was fix some bugs 🪲 Enjoy!
2
2
18
@debsourya005
Debsourya Datta
3 months
This new open source coding agent is beating claude code like crazy. Everyone is saying the responses are OP🔥, gonna try this out. @CodebuffAI
0
1
6
@jahooma
James Grugett
3 months
Announcing our weekend competition to create and publish @CodebuffAI agents! Help us fill out our Agent Store before we launch it on Monday! For the best agents that you all publish we'll be offering $200 of credit prizes for first, second, third place, and Community choice.
2
3
26