Freddie Vargus

@freddie_v4

Followers: 1K
Following: 11K
Media: 42
Statuses: 721

cto & co-founder @quotientai Research @cohere_labs — past: evals @github Copilot, data @quantopian — Tico 🇨🇷🇺🇸

Boston
Joined June 2012
@freddie_v4
Freddie Vargus
10 days
introducing the Quotient MCP Server, our entrypoint for in-the-loop steering of agents. agents can receive information about what kinds of errors they're making in between steps, get feedback from specialized models, and correct themselves
3
9
19
@freddie_v4
Freddie Vargus
3 days
simple prompt still. no adjustments on thinking here. probs would be good to have error bars.
0
0
0
@freddie_v4
Freddie Vargus
3 days
will eventually update this bench with more mons & more models, but truly the models need some work on pokeshadowbench. added GPT5 series + Claude 4.1
[image]
@freddie_v4
Freddie Vargus
4 days
time to update the bench.
1
1
4
@freddie_v4
Freddie Vargus
4 days
time to update the bench.
@freddie_v4
Freddie Vargus
3 months
Quick weekend project: how good are LLMs at "Who's That Pokémon?" answer: not great! I tested some of the best models on a simple game segment from the show with a small benchmark I call PokeShadowBench. some results below
[image]
1
0
2
@freddie_v4
Freddie Vargus
10 days
our MCP server is just one component in Limbic, our system which captures and processes agent behavior, helps you understand it, and automatically improves your agents for you. reach out to me or @JuliaANeagu if this is something you're interested in 🙂 quotientai dot co.
0
0
1
@freddie_v4
Freddie Vargus
10 days
we have guides for integrating with @cursor_ai, @claudeai Code, @AmpCode, and @claudeai Desktop. you can find docs here
1
1
2
@freddie_v4
Freddie Vargus
10 days
our MCP server currently provides a tool for evaluating tool calls, backed by our limbic-tool-use model. more tools are in the works!
@freddie_v4
Freddie Vargus
20 days
today we're releasing a new small model (0.5B) for detecting problems with tool usage in agents, trained on 50M tokens from publicly available MCP server tools. it's great at picking up on tool accuracy issues and outperforms larger models
1
0
3
@freddie_v4
Freddie Vargus
11 days
RT @jxnlco: notes from @JuliaANeagu's talk on hallucinations
[image]
0
4
0
@freddie_v4
Freddie Vargus
11 days
RT @SunejaLuv: Tool use hallucinations are real and often ignored. When an agent:
- Invents a function name
- Uses incorrect parameters…
0
2
0
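The two failure modes named in that post (an invented function name, incorrect parameters) can be checked mechanically against a tool registry. Below is a minimal sketch, assuming a hypothetical registry `TOOLS` and a hypothetical helper `check_tool_call`; neither comes from the original posts, and this is not how the limbic-tool-use model works.

```python
from typing import Any

# Hypothetical tool registry: tool name -> {parameter name: expected Python type}.
TOOLS: dict[str, dict[str, type]] = {
    "get_weather": {"city": str, "days": int},
}


def check_tool_call(name: str, args: dict[str, Any]) -> list[str]:
    """Return the problems found in one tool call (empty list if none)."""
    # An invented function name: the tool is not in the registry at all.
    if name not in TOOLS:
        return [f"unknown tool: {name}"]

    schema = TOOLS[name]
    problems: list[str] = []

    # Parameters the agent supplied that the tool does not define.
    for param in sorted(set(args) - set(schema)):
        problems.append(f"unexpected parameter: {param}")

    # Parameters the tool defines that are missing or have the wrong type.
    for param, expected in schema.items():
        if param not in args:
            problems.append(f"missing parameter: {param}")
        elif not isinstance(args[param], expected):
            problems.append(f"wrong type for {param}: expected {expected.__name__}")

    return problems
```

For example, `check_tool_call("get_forecast", {})` flags an unknown tool, while `check_tool_call("get_weather", {"city": "Boston", "days": "3"})` flags the string passed where an integer is expected. A rule-based check like this only catches schema-level mistakes; it cannot tell whether the right tool was chosen for the task.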
@freddie_v4
Freddie Vargus
13 days
I’ve been moving more and more of my coding off of Cursor and on to Sculptor btw. the vibes are good, and the experience has been pretty nice.
@imbue_ai
Imbue
13 days
Writing code is just the start. To move beyond prototypes, we need agents that plan, write specs, run tests, follow style guides, and catch bugs before you do. @JoshAlbrecht shared how we're tackling this with Sculptor at @aiDotEngineer World’s Fair:
1
2
15
@freddie_v4
Freddie Vargus
13 days
RT @JuliaANeagu: Our talk with @Tavily is now live — part of the new @aiDotEngineer Retrieval & Search track. We share a practical frame…
0
7
0
@freddie_v4
Freddie Vargus
13 days
RT @JuliaANeagu: 2⃣ years ago, I convinced @freddie_v4 to take the plunge and start @QuotientAI with me. two years into the crazy ride, we'…
0
2
0
@freddie_v4
Freddie Vargus
13 days
RT @ToolUseAI: 🔥The best AI advice for 2025🔥 Two dozen of the top minds in the AI space share their top advice and lessons learned from th…
0
8
0
@freddie_v4
Freddie Vargus
16 days
RT @JnBrymn: I'm doing AI research - comparing it to how humans think. Think quick, simulate a coin flip in your head. What is the result,…
0
1
0
@freddie_v4
Freddie Vargus
17 days
RT @code_star: the dawg abides
[image]
0
1
0
@freddie_v4
Freddie Vargus
17 days
@code_star can i get a vibe check lord Data Dawg?
1
0
3
@freddie_v4
Freddie Vargus
17 days
there are a lot of other ideas we had along the way, and more we're going to do (additional announcements next week), but if you want to chat more about this, message us (freddie or julia at quotientai dot co) or find us on Discord
blog.quotientai.co
Despite widespread adoption of tool use, there has been no dedicated model for evaluating tool-use accuracy—until now.
0
0
2
@freddie_v4
Freddie Vargus
17 days
we assembled everything back into one dataset (train/val/test, etc), and ended up fine-tuning using @UnslothAI and a single GPU on @modal_labs for 0.5B, 3B, and 7B models. if you want to try out the model, you can find it on hugging face here
huggingface.co
1
0
3
@freddie_v4
Freddie Vargus
17 days
then to add some more robustness, we injected errors to simulate different kinds of mistakes -- wrong tool calls, hallucinated tools, missing parameters, incorrect types, invalid structures and malformed values. this helped us develop different kinds of reasoning for failures.
1
0
3
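The error-injection step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function `inject_error`, the `{"name": ..., "arguments": {...}}` call shape, and the error kind labels are all hypothetical, and only four of the listed mistake types are covered.

```python
import copy
import random
from typing import Any


def inject_error(call: dict[str, Any], kind: str, rng: random.Random) -> dict[str, Any]:
    """Return a corrupted copy of a tool call shaped like
    {"name": str, "arguments": dict}. The input is never modified."""
    bad = copy.deepcopy(call)
    args = bad["arguments"]

    if kind == "hallucinated_tool":
        # Rename the tool to something that does not exist in the registry.
        bad["name"] = bad["name"] + "_v2"
    elif kind == "missing_parameter" and args:
        # Drop one argument, chosen at random.
        del args[rng.choice(sorted(args))]
    elif kind == "incorrect_type" and args:
        # Wrap one value in a list so its type no longer matches the schema.
        key = rng.choice(sorted(args))
        args[key] = [args[key]]
    elif kind == "invalid_structure":
        # Replace the arguments object with a plain string.
        bad["arguments"] = str(args)
    else:
        raise ValueError(f"cannot apply error kind {kind!r} to this call")

    return bad
```

Pairing each corrupted call with the `kind` that produced it gives labeled negative examples, which is one plausible way to get the per-failure reasoning mentioned in the post. Seeding `random.Random` keeps the generated dataset reproducible.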