
Freddie Vargus
@freddie_v4
Followers: 1K
Following: 11K
Media: 38
Statuses: 669
cto & co-founder @quotientai Research @cohere_labs — past: evals @github Copilot, data @quantopian — Tico 🇨🇷🇺🇸
Boston
Joined June 2012
Quick weekend project: how good are LLMs at "Who's That Pokémon?" answer: not great! I tested some of the best models on a simple game segment from the show with a small benchmark I call PokeShadowBench. some results below
3
2
33
didn't expect this to get as much attention as it did! we're working on more exciting things, so if you're looking to understand problems with your agents or want to make them better, message me or @JuliaANeagu and let us know 🙂 also try Quotient.
today we're releasing a new small model (0.5B) for detecting problems with tool usage in agents, trained on 50M tokens from publicly available MCP server tools. it's great at picking up on tool accuracy issues and outperforms larger models
1
1
9
if you want to try the model out, check out the huggingface link. alternatively, you can send requests to a deployed version up on @modal_labs. gist here: trained with @unslothai btw
1
1
20
of course, this wouldn't be complete without some eval dataset, so we're also releasing the test set on huggingface with info on our training data curation pipeline.
huggingface.co
1
0
19
RT @jxnlco: how you can catch hallucinations in production with @QuotientAI. sign up for study notes and recordings afterwards even if you…
maven.com
Production AI systems fail silently—hallucinating facts, missing key information, or breaking entirely. Without real-time monitoring, these failures reach users and damage trust. This lesson teaches...
0
2
0
RT @JuliaANeagu: Just dropped: three new cookbooks for building AI research agents with @ExaAILabs, @LangChainAI, @OpenAI, and @AnthropicA…
0
6
0
this book is actually a very good read about context engineering. chapters 5 and 6 are all about building context, and chapter 8 is all about building sources for context.
In light of all the attention that context engineering is getting, today I proudly introduce the second book that Albert Ziegler and I have written together: Context Engineering for LLM Applications.
0
1
4
RT @JnBrymn: In light of all the attention that context engineering is getting, today I proudly introduce the second book that Albert Ziegl…
0
6
0
RT @JuliaANeagu: If you're shipping LLMs to production and still finding out about critical from your users, this course is for you. Real-…
0
3
0
still meeting plenty of people who haven’t even heard of ChatGPT.
@HanchungLee you have to remember we’re in a bubble within a bubble within a bubble right now and some of my engineer friends in Europe discovered Cursor last week.
0
0
2
John is a great person to chat with and learn from — check this out 👀!
I'm super excited to present to @7ctos tomorrow on "Wrestling with AI Evaluations". Generative AI has made it easier than ever for companies to build products quickly. However, LLMs are inherently nondeterministic and unpredictable. Integrating LLMs into products demands an…
0
0
4
if you’re at @vercel Ship, ask Hai about … have used it a bunch for understanding new and trending github repos. it is nice 👌🏼
0
0
5
RT @JuliaANeagu: “You want your model hitting milestones, not minefields.” Most AI eval talk is hand-wavy. This isn’t. @freddie_v4 (@Quot…
0
5
0