PatronusAI

@PatronusAI

Followers: 2K
Following: 622
Media: 85
Statuses: 351

powerful ai evaluation and optimization 🦄 sign up: https://t.co/VtSC9S9ueQ

Joined July 2023
@PatronusAI
PatronusAI
2 months
1/ 🔥🔥 Big news: We’re launching Percival, the first AI agent that can evaluate and fix other AI agents! 🤖 Percival is an evaluation agent that doesn’t just detect failures in agent traces; it can also fix them. Percival outperformed SOTA LLMs by 2.9x on the TRAIL dataset,
15
37
122
@PatronusAI
PatronusAI
8 days
Meet @snigdhabanda, our FDE Lead! 🎉 Snigdha has been on our team since this past November and is a valued leader, teammate, and friend. Today, we wanted to highlight her work as a Forward-Deployed Engineer, a role that is becoming one of the most sought-after jobs in tech, with
0
0
0
@PatronusAI
PatronusAI
9 days
At @PatronusAI, we're excited to publish a new article on best practices for agentic workflows. 🚀 In this article, you will learn about agentic workflows, which involve specialized AI agents collaborating to solve complex problems without human intervention, and their.
0
1
6
@PatronusAI
PatronusAI
19 days
Using your best AI debugger just got easier -- spotlighting our Percival Integrations! 🛡️ Percival is a highly intelligent agent and AI debugger, capable of detecting 20+ failure modes in agentic traces and suggesting optimizations. Generally, these agentic systems run into
1
0
2
@PatronusAI
PatronusAI
19 days
RT @MLCommons: Today, MLCommons is announcing a new collaboration with contributors from across academia, civil society, and industry to co…
0
1
0
@PatronusAI
PatronusAI
23 days
We’re up to exciting things here at Patronus AI, working at the forefront of AI optimization and evaluation! Recently, we launched Percival, a SOTA AI Agent debugger, and have previously released industry-standard benchmarks for agents like TRAIL and BLUR, as well as.
0
1
5
@PatronusAI
PatronusAI
25 days
Thank you, Professor @Zhou_Yu_AI and @bklsummithouse, for the AI Agents in Action: Industry × Academia Exchange! @rebeccatqian, our CTO, was on a panel with Vinay Rao (Advisor at @AnthropicAI), @ShunyuYao12 (Research Scientist at @OpenAI), Robert Parker (Founder of Perceptix),
0
2
10
@PatronusAI
PatronusAI
1 month
Welcome Peng Wang to the team! 🎉 Peng joins Patronus AI as Head of Applied Research. Previously, he was Head of Research at Grammarly, Head of AI at AlphaSense, and an ML Engineer at Google Research. Peng’s research interests include LLM personalization and contextualization,
0
2
7
@PatronusAI
PatronusAI
1 month
Spotlighting our newest open-source agent benchmark: TRAIL 🥳 Grounded in multi-step evals and real-world agentic issues, we created a novel taxonomy containing 20+ agentic errors spanning reasoning, planning, and system execution errors. Following this taxonomy, we build on
0
1
6
@PatronusAI
PatronusAI
2 months
Today, we are proud to share our latest customer story: CARIAD of Volkswagen Group. @cariad_tech, @VW's software organization, leverages @PatronusAI's advanced evaluation tools to continuously enhance its in-vehicle AI assistants. When we first met the AI team at CARIAD, we
0
2
4
@PatronusAI
PatronusAI
2 months
Welcome Hersh Mehta to the team! 🎉 Hersh joins us as a Staff Research Engineer. Previously, he led research engineering efforts across companies like @Meta, Cruise, and @Uber. This included zero-to-one AI/ML research projects in self-driving, virtual reality, and other
0
1
5
@PatronusAI
PatronusAI
2 months
RT @clefourrier: To make sure your AI agent is not bullshitting you, you need to evaluate its reasoning. But to do so automatically, you…
0
8
0
@PatronusAI
PatronusAI
2 months
RT @CShorten30: What if you had an Agent tailor-made for debugging and improving your Agent? I am SUPER EXCITED to publish our newest We…
0
16
0
@PatronusAI
PatronusAI
2 months
RT @clefourrier: Check out the very cool work from our friends @PatronusAI 🔥 work here!
0
7
0
@PatronusAI
PatronusAI
2 months
7/ Percival is now live! We’re partnering with world-class teams working on AI agents. Reach out to us if you have an interesting agent use case. ⚡ Try out Percival in Patronus: Docs: Case Studies: Paper:
0
2
14
@PatronusAI
PatronusAI
2 months
6/ Percival isn’t just an LLM judge: it’s an evaluation agent that fixes your AI workflow. Integrate it into your dev loop to find broken traces, debug failures, and continuously fine-tune your agent system. 🔁 Percival integrates with a wide variety of agent frameworks and.
1
1
10
@PatronusAI
PatronusAI
2 months
5/ Percival is the first agent that consistently matches or exceeds human evaluators on trace error localization. We’ve seen up to a 60x productivity increase in engineering teams integrating Percival into their development cycles. 🚀🚢 Read our case studies to see how Percival.
1
1
10
@PatronusAI
PatronusAI
2 months
4/ Percival outperforms every baseline on TRAIL by a wide margin in both error detection and localization accuracy, averaging a 2.9x increase in joint accuracy. 🏆
Joint Accuracy on TRAIL (avg. GAIA + SWE splits):
- Percival: 0.163
- Gemini-2.5-Pro: 0.117
- OpenAI o3: 0.092
-
1
1
11
@PatronusAI
PatronusAI
2 months
3/ How well does it work? To evaluate Percival, we created TRAIL (Trace Reasoning and Agentic Issue Localization), the first trace analysis benchmark for dynamic multi-agent workflows. 🕵️🔍 TRAIL contains 148 long-context, real-world traces with detailed human annotations and a
1
1
10
@PatronusAI
PatronusAI
2 months
2/ Agentic workflows are tricky to debug due to long contexts, noisy tool outputs, and non-deterministic behaviors. Manually debugging agents is slow, brittle, and doesn’t scale. 🙅‍♂️ Percival is trained to catch 20+ failure modes in agentic systems, from incorrect tool use to
1
3
13