Julia Neagu @JuliaANeagu X Profile

Followers: 2K
Following: 3K
Media: 165
Statuses: 2K

building @QuotientAI ✨ formerly @GitHub @GitHubCopilot 🤖 reformed physicist 👩‍🔬 ~ opinions are my own ~

https://t.co/dzNL3gyG8R

boston | (sf)

Joined March 2013

Julia Neagu

@JuliaANeagu

18 days

We benchmarked how well open language models handle tool calls and found some clear patterns:
- 1 in 6 calls use the wrong tool
- 2–3% have parameter name mismatches
- 1–2% pass values in the wrong format
Most tool use issues come from unclear schemas, overlapping tool names, or

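The three failure categories in the benchmark above (wrong tool, parameter name mismatch, wrong value format) can be checked mechanically by comparing each call against the tool's schema. The Python below is a minimal sketch under stated assumptions, not Quotient's evaluator or the benchmark's actual method: the `TOOLS` registry, the tool names `search_docs` and `get_calendar`, and the `classify_call` helper are hypothetical, the expected tool for each call is assumed to be known from a labeled test set, and every schema parameter is treated as required.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value) -> bool:
    """True only for strings shaped YYYY-MM-DD that name a real calendar date."""
    if not isinstance(value, str) or not ISO_DATE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates like 2025-02-30
    except ValueError:
        return False
    return True


# Hypothetical registry: tool name -> {parameter name: format check}.
TOOLS = {
    "search_docs": {"query": lambda v: isinstance(v, str) and v.strip() != ""},
    "get_calendar": {"date": is_iso_date},
}


def classify_call(call: dict, expected_tool: str) -> list[str]:
    """Return the failure categories found in one tool call; [] means none found."""
    errors = []
    name = call.get("name")
    args = call.get("arguments", {})

    if name != expected_tool:
        errors.append("wrong_tool")

    schema = TOOLS.get(name)
    if schema is None:
        return errors  # unknown tool: no schema to check parameters against

    # Any difference between sent and defined names counts (all params assumed required).
    if set(args) != set(schema):
        errors.append("param_name_mismatch")

    # Right parameter name, wrong value format.
    if any(key in args and not check(args[key]) for key, check in schema.items()):
        errors.append("wrong_format")

    return errors


print(classify_call({"name": "get_calendar", "arguments": {"date": "next Tuesday"}}, "get_calendar"))  # ['wrong_format']
print(classify_call({"name": "search_docs", "arguments": {"search_term": "refunds"}}, "search_docs"))  # ['param_name_mismatch']
```

Running a checker like this over a labeled set of calls and counting each category is one way to arrive at rates of the kind quoted above.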

Julia Neagu

@JuliaANeagu

8 hours

AI products today are shipped on intuition, a sense of what feels right more than what’s been measured. Production is a whole other ballgame. Next week at @lisbonai_, I’ll be joining @cramforce, @noahgsolomon, and @elliotnorrevik to talk about what it really takes to ship fast

Gergely Orosz

@GergelyOrosz

2 days

I preferred using Perplexity because they seemed to have solved the problem of hallucination. Well, not anymore: Perplexity started to make up sources that do not exist. Hallucinations seem to be something that always stays with LLMs and LLM products... never fully trust LLMs

Julia Neagu

@JuliaANeagu

5 days

Watch our module: https://t.co/FU77wL8J45 Or take the full course:

qdrant.tech

Learn hybrid search, multivectors, and production deployment in 7 days. Build and ship a docs search engine.

Julia Neagu

@JuliaANeagu

5 days

Qdrant just launched an in-depth course on mastering production-grade vector search, and we’re part of it! We teamed up to show how to analyze and debug AI systems using @QuotientAI + @qdrant_engine. Our session covers real-world AI monitoring workflows w/ hallucination

Julia Neagu

@JuliaANeagu

8 days

Reserve your spot now: https://t.co/PkIblLFU7N

maven.com

AI agents fail most often through silent tool-call errors—wrong tools or bad parameters—not bad responses. Learning to detect and fix these failures is essential for building production-ready agents....

Julia Neagu

@JuliaANeagu

8 days

Most agent failures are silent: wrong tools, bad parameters, and subtle breakdowns in how agents use them. @jxnlco and I break down how to trace, detect and fix tool-call errors in AI agents in our next @mavenhq course. We’ll go deep on evaluators like limbic-tool-use,

Julia Neagu

@JuliaANeagu

12 days

no-one likes 10x engineers? 🤯

Julia Neagu

@JuliaANeagu

13 days

You suspect you’re reading someone’s vibe code. You are now:
A: personally offended because you didn’t get some handcrafted artisan goodness
B: genuinely pleased you’re working with a 10x vibecoder engineer
C: an AI Code Reviewer🤘

Irina Nazarova

@inazarova

13 days

It was a fantastic experience for @evilmartians to build the open source agent observability library for brilliant @QuotientAI and for all products that help people build agentic workflows. More apps are gonna need this, and Quotient with Martians are leading the way!

Julia Neagu

@JuliaANeagu

13 days

Agentic traces contain the complete story of an agent’s process: how it plans, reasons, chooses tools, and reacts when things go wrong. But making sense of those traces is slow and frustrating without the right UX. We’re changing that. AgentPrism is now live in @QuotientAI: a

Julia Neagu

@JuliaANeagu

13 days

Was talking to a friend at a vibe-coded B2B SaaS startup and their codebase is 90% try/except blocks for every error they've ever seen. Code runs flawlessly by some metrics.

Julia Neagu

@JuliaANeagu

13 days

"try/except" is the em-dash of vibe code

Julia Neagu

@JuliaANeagu

13 days

it's probably necessary a lot of the time, but seeing it makes me suspicious

Evil Martians

@evilmartians

13 days

Agentic traces contain perfect information about an agent’s behavior with every plan, action, and retry. But that information gets lost in a sea of JSON. So we built AgentPrism: open source React components that turn traces into visual diagrams for debugging AI agents. You can

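The idea behind turning traces into diagrams is that each span records its parent, so a flat list of spans can be rebuilt into the tree of plans, tool calls, and retries. AgentPrism itself ships React components; the Python below is only a sketch of that reconstruction step, not AgentPrism's or Quotient's API. The `Span` fields (`span_id`, `parent_id`, `name`, `status`) are a hypothetical, simplified format loosely modeled on OpenTelemetry-style parent/child spans.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One step of an agent run (hypothetical, simplified span format)."""
    span_id: str
    parent_id: Optional[str]  # None marks a root span
    name: str
    status: str = "ok"


def render_tree(spans: list[Span]) -> str:
    """Rebuild the parent/child hierarchy and return it as indented text."""
    # Group spans by parent id, preserving the order they were recorded in.
    children: dict[Optional[str], list[Span]] = {}
    for span in spans:
        children.setdefault(span.parent_id, []).append(span)

    lines: list[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for span in children.get(parent_id, []):
            # Flag non-ok spans so failures stand out instead of hiding in JSON.
            marker = "" if span.status == "ok" else f"  [{span.status}]"
            lines.append("  " * depth + span.name + marker)
            walk(span.span_id, depth + 1)

    walk(None, 0)  # start from root spans
    return "\n".join(lines)


spans = [
    Span("1", None, "agent.run"),
    Span("2", "1", "plan"),
    Span("3", "1", "tool:search_docs", status="error"),
    Span("4", "1", "tool:search_docs (retry)"),
    Span("5", "1", "respond"),
]
print(render_tree(spans))
```

Spans whose parent is missing from the list are not rendered in this sketch; a real viewer would need to surface those orphans rather than drop them.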

Julia Neagu

@JuliaANeagu

13 days

AgentPrism is now live in @QuotientAI and on @github:
- Start tracing your agent w/ Quotient and AgentPrism by @evilmartians: https://t.co/od6626fQDW
- Add AgentPrism to your homegrown agents: https://t.co/I869qyOAS2

Julia Neagu

@JuliaANeagu

16 days

it was the context all along? https://t.co/qmuyoWOKKM

Julia Neagu

@JuliaANeagu

1 month

Hallucinations are still an open problem in real-world AI systems. Not because models make things up, but because they fail at reasoning over the messy context they’re given: missing docs, conflicting snippets, noisy data. I pulled together what we know so far from research and

Julia Neagu

@JuliaANeagu

16 days

models are now great at reasoning over decent context and mediocre at reasoning over garbage context. it’s not the models’ fault, believe it or not

Greg Brockman

@gdb

16 days

today's AI feels smart enough for most tasks of up to a few minutes in duration, and when it can't get the job done, it's often because it lacks sufficient background context for even a very capable human to succeed

Julia Neagu

@JuliaANeagu

18 days

We also benchmarked Qwen-235B and Kimi-K2 on @togethercompute. Common failure patterns:
- Tool selection between overlapping functions
- Parameter values like date="next Tuesday" instead of YYYY-MM-DD
- Parameter names like query vs search_term

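Tool selection between overlapping functions is often a schema problem that can be caught before any model runs. One hedged way to surface candidates is to compare tool names and descriptions lexically; the sketch below uses Jaccard similarity over word tokens, a deliberately simple heuristic chosen for illustration (embedding similarity would catch paraphrases it misses). The tool names, descriptions, stopword list, and the 0.3 threshold are all hypothetical.

```python
import re
from itertools import combinations

# Hypothetical stopword list; filler words would otherwise inflate every score.
STOPWORDS = {"a", "an", "the", "for", "of", "to", "and", "in", "on"}


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; '_' acts as a separator, so 'search_docs' -> {'search', 'docs'}."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def overlapping_tools(tools: dict[str, str], threshold: float = 0.3) -> list[tuple[str, str, float]]:
    """Return tool pairs whose name+description tokens have Jaccard similarity >= threshold."""
    token_sets = {name: tokens(f"{name} {desc}") for name, desc in tools.items()}
    flagged = []
    for a, b in combinations(sorted(token_sets), 2):
        union = token_sets[a] | token_sets[b]
        score = len(token_sets[a] & token_sets[b]) / len(union) if union else 0.0
        if score >= threshold:
            flagged.append((a, b, round(score, 2)))
    return flagged


tools = {
    "search_docs": "Search the documentation for a query",
    "search_knowledge_base": "Search the knowledge base for a query",
    "get_weather": "Get the current weather for a city",
}
print(overlapping_tools(tools))  # [('search_docs', 'search_knowledge_base', 0.33)]
```

On this example registry, `search_docs` and `search_knowledge_base` share `search` and `query` and score about 0.33, while `get_weather` shares no tokens with either. A flagged pair is a prompt to rename, merge, or sharpen the descriptions of those tools, not proof that a model will confuse them.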

Julia Neagu

@JuliaANeagu

18 days

it’s also hard to sell third-party (not foundational, not oss, not custom) embedding models without extensive enterprise services. you have to go to an enterprise customer and show that you’re outperforming OAI embeddings on their particular domain, which is time consuming and

Jo Kristian Bergum

@jobergum

19 days

Jina finding a home in Elastic marks the third acquisition/pivot in this space just this year. Many of these «embedding model» companies have presented themselves as that, but in practice have been selling «enterprise search» and consulting for years.
