Freddie Vargus @freddie_v4 X Profile

Freddie Vargus

@freddie_v4

Followers

1K

Following

11K

Media

42

Statuses

721

cto & co-founder @quotientai Research @cohere_labs — past: evals @github Copilot, data @quantopian — Tico 🇨🇷🇺🇸

Boston

Joined June 2012

Freddie Vargus

@freddie_v4

10 days

introducing the Quotient MCP Server, our entrypoint for in-the-loop steering of agents. agents can receive information about what kinds of errors they're making in between steps, get feedback from specialized models, and correct themselves

3

9

19

Freddie Vargus

@freddie_v4

3 days

simple prompt still. no adjustments on thinking here. probs would be good to have error bars.

0

Freddie Vargus

@freddie_v4

3 days

will eventually update this bench with more mons & more models, but truly the models need some work on pokeshadowbench. added GPT5 series + Claude 4.1

Freddie Vargus

@freddie_v4

4 days

time to update the bench.

1

4

Freddie Vargus

@freddie_v4

3 months

Quick weekend project: how good are LLMs at "Who's That Pokémon?" answer: not great! I tested some of the best models on a simple game segment from the show with a small benchmark I call PokeShadowBench. some results below

1

0

2

Freddie Vargus

@freddie_v4

10 days

our MCP server is just one component in Limbic, our system which captures and processes agent behavior, helps you understand it, and automatically improves your agents for you. reach out to me or @JuliaANeagu if this is something you're interested in 🙂 quotientai dot co.

0

1

Freddie Vargus

@freddie_v4

10 days

and the server is open source and can be found here

github.com

A Model Context Protocol (MCP) server for evaluating tool calls and AI agent interactions. - quotient-ai/quotient-mcp

1

0

2

Freddie Vargus

@freddie_v4

10 days

we have guides for integrating with @cursor_ai @claudeai Code, @AmpCode , and @claudeai Desktop. you can find docs here

1

2

Freddie Vargus

@freddie_v4

10 days

our MCP server currently provides a tool for evaluating tool calls, backed by our limbic-tool-use model. more tools are in the works!

Freddie Vargus

@freddie_v4

20 days

today we're releasing a new small model (0.5B) for detecting problems with tool usage in agents, trained on 50M tokens from publicly available MCP server tools. it's great at picking up on tool accuracy issues and outperforms larger models

1

0

3

Freddie Vargus

@freddie_v4

11 days

RT @jxnlco: notes from @JuliaANeagu's talk on hallucinations

0

4

0

Freddie Vargus

@freddie_v4

11 days

RT @SunejaLuv: Tool use hallucinations are real and often ignored. When an agent: - Invents a function name - Uses incorrect parameters…

0

2

0
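The two failure modes named in the retweet above (an invented function name, incorrect parameters) can be illustrated with a simple rule-based check. This is only a minimal sketch under stated assumptions: the `SCHEMAS` registry, the `get_weather` tool, and `check_tool_call` are hypothetical names made up for illustration, and this is not how the limbic-tool-use model works, since that is a trained model rather than a set of rules.

```python
# Hypothetical registry of known tools and their parameter types (illustration only).
SCHEMAS = {
    "get_weather": {"city": str, "days": int},
}

def check_tool_call(name, arguments):
    """Return a list of problems found in a single tool call (empty if none)."""
    # An invented function name: the tool is not in the registry at all.
    if name not in SCHEMAS:
        return [f"unknown tool: {name}"]

    schema = SCHEMAS[name]
    problems = []

    # Parameters the tool does not define.
    for param in arguments:
        if param not in schema:
            problems.append(f"unexpected parameter: {param}")

    # Parameters that are missing or carry the wrong type.
    for param, expected_type in schema.items():
        if param not in arguments:
            problems.append(f"missing parameter: {param}")
        elif not isinstance(arguments[param], expected_type):
            problems.append(f"wrong type for {param}")

    return problems
```

A check like this only catches structural mistakes. It cannot tell whether a well-formed call was the right one to make, which is the kind of judgment a dedicated evaluation model is meant to supply.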

Freddie Vargus

@freddie_v4

13 days

I’ve been moving more and more of my coding off of Cursor and on to Sculptor btw. the vibes are good, and the experience has been pretty nice.

Imbue

@imbue_ai

13 days

Writing code is just the start. To move beyond prototypes, we need agents that plan, write specs, run tests, follow style guides, and catch bugs before you do. @JoshAlbrecht shared how we're tackling this with Sculptor at @aiDotEngineer World’s Fair:

1

2

15

Freddie Vargus

@freddie_v4

13 days

RT @JuliaANeagu: Our talk with @Tavily is now live — part of the new @aiDotEngineer Retrieval & Search track. We share a practical frame…

0

7

0

Freddie Vargus

@freddie_v4

13 days

RT @JuliaANeagu: 2⃣ years ago, I convinced @freddie_v4 to take the plunge and start @QuotientAI with me. two years into the crazy ride, we'…

0

2

0

Freddie Vargus

@freddie_v4

13 days

RT @ToolUseAI: 🔥The best AI advice for 2025🔥. Two dozen of the top minds in the AI space share their top advice and lessons learned from th…

0

8

0

Freddie Vargus

@freddie_v4

16 days

RT @JnBrymn: I'm doing AI research - comparing it to how humans think. Think quick, simulate a coin flip in your head. What is the result,…

0

1

0

Freddie Vargus

@freddie_v4

17 days

RT @code_star: the dawg abides

0

1

0

Freddie Vargus

@freddie_v4

17 days

@code_star can i get a vibe check lord Data Dawg?

1

0

3

Freddie Vargus

@freddie_v4

17 days

there are a lot of other ideas we had along the way, and more we're going to do (additional announcements next week), but if you want to chat more about this, message us (freddie or julia at quotientai dot co) or on Discord

blog.quotientai.co

Despite widespread adoption of tool use, there has been no dedicated model for evaluating tool-use accuracy—until now.

0

2

Freddie Vargus

@freddie_v4

17 days

we assembled everything back into one dataset (train/val/test, etc), and ended up fine-tuning using @UnslothAI and a single GPU on @modal_labs for 0.5B, 3B, and 7B models. if you want to try out the model, you can find it on hugging face here

huggingface.co

1

0

3

Freddie Vargus

@freddie_v4

17 days

then to add some more robustness, we injected errors to simulate different kinds of mistakes -- wrong tool calls, hallucinated tools, missing parameters, incorrect types, invalid structures and malformed values. this helped us develop different kinds of reasoning for failures.

1

0

3
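The error-injection step described in the post above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual pipeline: the `TOOLS` table, the call format (a dict with `name` and `arguments`), and the `inject_error` helper are all hypothetical, and only four of the listed error kinds are covered.

```python
import copy
import random

# Hypothetical tool table and call format, for illustration only.
TOOLS = {
    "get_weather": {"city": str, "days": int},
    "search_docs": {"query": str},
}

def inject_error(call, kind, rng):
    """Return a corrupted copy of `call` and the label of the injected error."""
    bad = copy.deepcopy(call)  # never mutate the clean example
    if kind == "wrong_tool":
        # Swap in a different tool that really exists.
        others = sorted(t for t in TOOLS if t != call["name"])
        bad["name"] = rng.choice(others)
    elif kind == "hallucinated_tool":
        # Invent a tool name that is not in the table.
        bad["name"] = call["name"] + "_v2"
    elif kind == "missing_parameter":
        # Drop one argument at random.
        key = rng.choice(sorted(bad["arguments"]))
        del bad["arguments"][key]
    elif kind == "incorrect_type":
        # Wrap one value in a list so its type no longer matches.
        key = rng.choice(sorted(bad["arguments"]))
        bad["arguments"][key] = [bad["arguments"][key]]
    else:
        raise ValueError(f"unsupported error kind: {kind}")
    return bad, kind

if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the corruption is reproducible
    clean = {"name": "get_weather", "arguments": {"city": "Boston", "days": 3}}
    for kind in ["wrong_tool", "hallucinated_tool", "missing_parameter", "incorrect_type"]:
        print(inject_error(clean, kind, rng))
```

Keeping the label next to each corrupted call gives every negative example a known failure type, which is one plausible way to pair each failure with a matching explanation during training.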