Julia Neagu
@JuliaANeagu
Followers: 2K
Following: 3K
Media: 165
Statuses: 2K
building @QuotientAI ✨ formerly @GitHub @GitHubCopilot 🤖 reformed physicist 👩‍🔬 ~ opinions are my own ~
boston | (sf)
Joined March 2013
We benchmarked how well open language models handle tool calls and found some clear patterns:
- 1 in 6 calls use the wrong tool
- 2–3% have parameter name mismatches
- 1–2% pass values in the wrong format
Most tool use issues come from unclear schemas, overlapping tool names, or
4
6
25
AI products today are shipped on intuition, a sense of what feels right more than what’s been measured. Production is a whole other ballgame. Next week at @lisbonai_, I’ll be joining @cramforce, @noahgsolomon, and @elliotnorrevik to talk about what it really takes to ship fast
0
3
7
I preferred using Perplexity because they seemed to have solved the problem of hallucination. Well, not anymore: Perplexity started to make up sources that do not exist. Hallucinations seem to be something that always stays with LLMs and LLM products... never fully trust LLMs
69
34
518
Watch our module: https://t.co/FU77wL8J45 Or take the full course:
qdrant.tech
Learn hybrid search, multivectors, and production deployment in 7 days. Build and ship a docs search engine.
0
0
0
Qdrant just launched an in-depth course on mastering production-grade vector search, and we’re part of it! We teamed up to show how to analyze and debug AI systems using @QuotientAI + @qdrant_engine. Our session covers real-world AI monitoring workflows w/ hallucination
1
1
6
You suspect you’re reading someone’s vibe code. You are now:
A: personally offended because you didn’t get some handcrafted artisan goodness
B: genuinely pleased you’re working with a 10x vibecoder engineer
C: an AI Code Reviewer🤘
0
0
3
It was a fantastic experience for @evilmartians to build the open source agent observability library for brilliant @QuotientAI and for all products that help people build agentic workflows. More apps are gonna need this, and Quotient with Martians are leading the way!
Agentic traces contain the complete story of an agent’s process: how it plans, reasons, chooses tools, and reacts when things go wrong. But making sense of those traces is slow and frustrating without the right UX. We’re changing that. AgentPrism is now live in @QuotientAI: a
1
6
17
it's probably necessary a lot of the time, but seeing it makes me suspicious
0
0
4
Agentic traces contain perfect information about an agent’s behavior with every plan, action, and retry. But that information gets lost in a sea of JSON. So we built AgentPrism: open source React components that turn traces into visual diagrams for debugging AI agents. You can
15
83
806
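To illustrate why a tree view helps with trace JSON, here is a minimal Python sketch that nests a flat list of spans under their parents and renders them as indented lines. The span fields (`id`, `parent_id`, `name`) and the sample data are hypothetical, chosen only for illustration; this is not AgentPrism's API or data model.

```python
from collections import defaultdict

# Hypothetical flat trace: each span has an id, a parent id (None for the root), and a name.
SPANS = [
    {"id": "1", "parent_id": None, "name": "agent.run"},
    {"id": "2", "parent_id": "1", "name": "plan"},
    {"id": "3", "parent_id": "1", "name": "tool.search"},
    {"id": "4", "parent_id": "3", "name": "retry"},
]


def render_tree(spans):
    """Return the spans as indented lines, with children nested under their parents."""
    # Group spans by parent so each level can be looked up directly.
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        # Depth-first traversal keeps each child directly below its parent.
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span["name"])
            walk(span["id"], depth + 1)

    walk(None, 0)
    return lines


print("\n".join(render_tree(SPANS)))
```

Even in this toy form, the retry shows up nested under the tool call that triggered it, which is hard to see in the flat list.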
AgentPrism is now live in @QuotientAI and on @github:
- Start tracing your agent w/ Quotient and AgentPrism by @evilmartians: https://t.co/od6626fQDW
- Add AgentPrism to your homegrown agents: https://t.co/I869qyOAS2
0
0
7
it was the context all along? https://t.co/qmuyoWOKKM
Hallucinations are still an open problem in real-world AI systems. Not because models make things up, but because they fail at reasoning over the messy context they’re given: missing docs, conflicting snippets, noisy data. I pulled together what we know so far from research and
0
0
3
models are now great at reasoning over decent context and mediocre at reasoning over garbage context. it’s not the models’ fault believe it or not
today's AI feels smart enough for most tasks of up to a few minutes in duration, and when it can't get the job done, it's often because it lacks sufficient background context for even a very capable human to succeed
2
1
7
We also benchmarked Qwen-235B and Kimi-K2 on @togethercompute. Common failure patterns:
- Tool selection between overlapping functions
- Parameter values like date="next Tuesday" instead of YYYY-MM-DD
- Parameter names like query vs search_term
1
1
5
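The parameter name and value format failures above can be caught before a tool runs. A minimal Python sketch follows, assuming a hypothetical `search_docs` tool with `query` and `date` parameters; the registry layout and function names are illustrative and are not taken from the benchmark.

```python
from datetime import datetime

# Hypothetical tool registry: tool name -> {parameter name: expected format}.
TOOLS = {
    "search_docs": {"query": "string", "date": "YYYY-MM-DD"},
}


def is_iso_date(value):
    """Return True if value is a string in zero-padded YYYY-MM-DD form."""
    if not isinstance(value, str):
        return False
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # Round-trip so loosely padded input such as "2025-1-7" is rejected.
    return parsed.strftime("%Y-%m-%d") == value


def validate_call(name, args):
    """Return a list of problems found in a tool call; an empty list means it passed."""
    if name not in TOOLS:
        return [f"unknown tool: {name}"]
    schema = TOOLS[name]
    problems = []
    # Names the schema does not define, e.g. search_term instead of query.
    for param in args:
        if param not in schema:
            problems.append(f"unexpected parameter: {param}")
    # Missing names and wrongly formatted values.
    for param, fmt in schema.items():
        if param not in args:
            problems.append(f"missing parameter: {param}")
        elif fmt == "YYYY-MM-DD" and not is_iso_date(args[param]):
            problems.append(f"bad date format for {param}: {args[param]!r}")
        elif fmt == "string" and not isinstance(args[param], str):
            problems.append(f"expected a string for {param}")
    return problems


print(validate_call("search_docs", {"search_term": "refunds", "date": "next Tuesday"}))
```

A check like this flags name and format mismatches only. Choosing the wrong tool among overlapping functions still produces a well-formed call, so detecting it requires a labeled expected tool for each test case.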
it’s also hard to sell third-party (not foundational, not oss, not custom) embedding models without extensive enterprise services. you have to go to an enterprise customer and show that you’re outperforming OAI embeddings on their particular domain, which is time-consuming and
Jina finding a home in Elastic marks the third acquisition/pivot in this space just this year. Many of these «embedding model» companies have presented themselves as that, but in practice have been selling «enterprise search» and consulting for years.
1
0
5