
PatronusAI (@PatronusAI)
2K Followers · 630 Following · 94 Media · 365 Statuses
powerful ai evaluation and optimization 🦄 sign up: https://t.co/VtSC9S9ueQ
Joined July 2023
1/ 🔥🔥 Big news: We’re launching Percival, the first AI agent that can evaluate and fix other AI agents! 🤖 Percival is an evaluation agent that doesn’t just detect failures in agent traces — it can fix them. Percival outperformed SOTA LLMs by 2.9x on the TRAIL dataset,
At @PatronusAI, we're excited to publish a new article on the best practices for LLM Observability. 🚀 In this article, you will learn how LLM observability empowers engineering teams by capturing and analyzing various aspects of LLM-based applications—like prompts, responses,
Welcome Josh Weimer to the team! 🎉 Josh joins @PatronusAI as a Forward Deployed Engineer. Previously, he worked in the GovTech space, where he supported agencies including the Department of Defense, Department of Justice, Office of Personnel Management, and Department of the
Evaluators are at the heart of the Patronus AI platform, and clients across industries have found them helpful in evaluating context and answer relevance, detecting hallucinations, and analyzing multimodal content! If you’re new to evaluators, this blog post will give a quick
This past weekend, our Head of Applied Research, Peng Wang, was invited to give a talk on AI Oversight at Scale: Navigating the Challenges of Evaluating LLMs and Agents. Peng shared our vision of scalable AI oversight being the biggest problem facing widespread societal
Introducing Prompt Tester on the Patronus AI platform! Prompt Tester allows you to build more robust prompts across different types of contexts to test out their effectiveness. Prompt. Test. Evaluate. Iterate. Read more here: https://t.co/igwYasCSVn Give it a try and let us
At @PatronusAI, we're excited to publish a new article on the best practices for custom optimization tools for LLMs. 🚀 In this article, you will learn how large language models (LLMs) are being integrated into application development, with an overview of the tools and
Thank you, @BerkeleyRDI, for hosting the Agentic AI Summit and having us! @getdarshan, one of our research scientists, who leads agent evaluation here at Patronus, presented at the summit! Here are a few takeaways: * Given context explosion and increasing domain depth and
Unleash the Power of AI Oversight with @PatronusAI x @databricks 🎉 With the Patronus AI integration into Databricks MLflow, you can now trace and transport (via OTel) your logs to the Patronus AI platform backend for detailed analysis. You’ll receive real-time monitoring,
Introducing Prompt Management on the @PatronusAI platform! Prompting is an essential part of the development and evaluation process, allowing for necessary human-in-the-loop exchanges and improvements. However, managing prompts is challenging and messy, with even the smallest
Building reliable agents requires a different tech stack: one that natively supports compound AI systems and evaluates quality along the full trajectory of agent behavior. We teamed up with @PatronusAI to break down what this stack looks like, from infra and models to debuggers.
At @PatronusAI, we're excited to publish a new article on the best practices for using AI agent platforms. 🚀 In this article, you will learn about various AI agent platforms like @n8n_io, @make_hq, @LangChainAI, @crewAIInc, and @huggingface smolagents. The article provides
Meet @snigdhabanda, our FDE Lead! 🎉 Snigdha has been on our team since this past November and is a valued leader, teammate, and friend. Today, we wanted to highlight her work as a Forward-Deployed Engineer, a role that is becoming one of the most sought-after jobs in tech, with
At @PatronusAI, we're excited to publish a new article on the best practices for Agentic Workflow. 🚀 In this article, you will learn about agentic workflows, which involve specialized AI agents collaborating to solve complex problems without human intervention, and their
Using your best AI debugger just got easier -- spotlighting our Percival Integrations! 🛡️ Percival is a highly intelligent agent and AI debugger, capable of detecting 20+ failure modes in agentic traces and suggesting optimizations. Generally, these agentic systems run into
Today, MLCommons is announcing a new collaboration with contributors from across academia, civil society, and industry to co-develop an open agent reliability evaluation standard to operationalize trust in agentic deployments. 1/3 🔗 https://t.co/O04ZCox5yg
We’re up to exciting things here at Patronus AI, working at the forefront of AI optimization and evaluation! Recently, we launched Percival, a SOTA AI Agent debugger, and have previously released industry-standard benchmarks for agents like TRAIL and BLUR, as well as
Thank you, Professor @Zhou_Yu_AI and @bklsummithouse, for the AI Agents in Action: Industry × Academia Exchange! @rebeccatqian, our CTO, was on a panel with Vinay Rao (Advisor at @AnthropicAI), @ShunyuYao12 (Research Scientist at @OpenAI), Robert Parker (Founder of Perceptix),