Freddie Vargus

@freddie_v4

Followers
1K
Following
11K
Media
38
Statuses
669

cto & co-founder @quotientai Research @cohere_labs — past: evals @github Copilot, data @quantopian — Tico 🇨🇷🇺🇸

Boston
Joined June 2012
@freddie_v4
Freddie Vargus
2 months
Quick weekend project: how good are LLMs at "Who's That Pokémon?" answer: not great! I tested some of the best models on a simple game segment from the show with a small benchmark I call PokeShadowBench. some results below
3
2
33
@freddie_v4
Freddie Vargus
1 day
didn't expect this to get as much attention as it did! we're working on more exciting things, so if you're looking to understand problems with your agents or want to make them better, message me or @JuliaANeagu and let us know 🙂 also try Quotient.
@freddie_v4
Freddie Vargus
1 day
today we're releasing a new small model (0.5B) for detecting problems with tool usage in agents, trained on 50M tokens from publicly available MCP server tools. it's great at picking up on tool accuracy issues and outperforms larger models
1
1
9
@freddie_v4
Freddie Vargus
1 day
blog post later this week with more details! if you're having problems with tool use in your agents, what are your biggest issues right now? where do you see things go wrong? let me know how limbic can help or what's missing.
1
0
11
@freddie_v4
Freddie Vargus
1 day
this is part of our new system we're building called limbic, which captures and processes agent behavior, helps you understand it, and automatically improves your agents for you. we'll be adding a more accurate model to Detections (our async processing component) soon.
2
1
17
@freddie_v4
Freddie Vargus
1 day
if you want to try the model out, check out the huggingface link. alternatively, you can send requests to a deployed version up on @modal_labs. gist here: trained with @unslothai btw
1
1
20
@freddie_v4
Freddie Vargus
1 day
of course, this wouldn't be complete without some eval dataset, so we're also releasing the test set on huggingface with info on our training data curation pipeline.
huggingface.co
1
0
19
@freddie_v4
Freddie Vargus
1 day
it's open weights and available on @huggingface.
huggingface.co
1
0
38
@freddie_v4
Freddie Vargus
1 day
today we're releasing a new small model (0.5B) for detecting problems with tool usage in agents, trained on 50M tokens from publicly available MCP server tools. it's great at picking up on tool accuracy issues and outperforms larger models
13
89
857
@freddie_v4
Freddie Vargus
5 days
RT @JuliaANeagu: Just dropped: three new cookbooks for building AI research agents with @ExaAILabs, @LangChainAI, @OpenAI, and @AnthropicA…
0
6
0
@freddie_v4
Freddie Vargus
7 days
this book is actually a very good read about context engineering. chapters 5 and 6 are all about building context, and chapter 8 is all about building sources for context.
@JnBrymn
John Berryman
7 days
In light of all the attention that context engineering is getting, today I proudly introduce the second book that Albert Ziegler and I have written together: Context Engineering for LLM Applications.
0
1
4
@freddie_v4
Freddie Vargus
7 days
RT @JnBrymn: In light of all the attention that context engineering is getting, today I proudly introduce the second book that Albert Ziegl…
0
6
0
@freddie_v4
Freddie Vargus
12 days
RT @JuliaANeagu: If you're shipping LLMs to production and still finding out about critical from your users, this course is for you. Real-…
0
3
0
@freddie_v4
Freddie Vargus
14 days
RT @code_star: Should I change my bio?
0
2
0
@freddie_v4
Freddie Vargus
27 days
still meeting plenty of people who haven’t even heard of ChatGPT.
@JuliaANeagu
Julia Neagu
28 days
@HanchungLee you have to remember we’re in a bubble within a bubble within a bubble right now and some of my engineer friends in Europe discovered Cursor last week.
0
0
2
@freddie_v4
Freddie Vargus
28 days
John is a great person to chat with and learn from — check this out 👀!
@JnBrymn
John Berryman
28 days
I'm super excited to present to @7ctos tomorrow on "Wrestling with AI Evaluations". Generative AI has made it easier than ever for companies to build products quickly. However, LLMs are inherently nondeterministic and unpredictable. Integrating LLMs into products demands an.
0
0
4
@freddie_v4
Freddie Vargus
28 days
if you’re at @vercel Ship ask Hai about . have used it a bunch for understanding new and trending github repos it is nice 👌🏼.
@haithehuman
Hai The Dude
28 days
0
0
5
@freddie_v4
Freddie Vargus
1 month
RT @JuliaANeagu: “You want your model hitting milestones, not minefields.” Most AI eval talk is hand-wavy. This isn’t. @freddie_v4 (@Quot…
0
5
0
@freddie_v4
Freddie Vargus
1 month
it’s good everyone here can share the misery of the internet being broken. hopefully this doesn’t jinx it.
0
0
1
@freddie_v4
Freddie Vargus
1 month
so who’s the bad node in the dependency graph right now? supabase, aws, gcp, cursor, cloudflare, azure… surely something upstream?
2
0
3