PatronusAI

@PatronusAI

Followers: 2K · Following: 630 · Media: 94 · Statuses: 365

powerful ai evaluation and optimization 🦄 sign up: https://t.co/VtSC9S9ueQ

Joined July 2023
@PatronusAI
PatronusAI
4 months
1/ 🔥🔥 Big news: We’re launching Percival, the first AI agent that can evaluate and fix other AI agents! 🤖 Percival is an evaluation agent that doesn’t just detect failures in agent traces — it can fix them. Percival outperformed SOTA LLMs by 2.9x on the TRAIL dataset,
@PatronusAI
PatronusAI
13 days
At @PatronusAI, we're excited to publish a new article on the best practices for LLM Observability. 🚀 In this article, you will learn how LLM observability empowers engineering teams by capturing and analyzing various aspects of LLM-based applications—like prompts, responses,
@PatronusAI
PatronusAI
19 days
Welcome Josh Weimer to the team! 🎉 Josh joins @PatronusAI as a Forward Deployed Engineer. Previously, he worked in the GovTech space, where he supported agencies including the Department of Defense, Department of Justice, Office of Personnel Management, and Department of the
@PatronusAI
PatronusAI
20 days
Evaluators are at the heart of the Patronus AI platform, and clients across industries have found them helpful in evaluating context and answer relevance, detecting hallucinations, and analyzing multimodal content! If you’re new to evaluators, this blog post will give a quick
@PatronusAI
PatronusAI
20 days
This past weekend, our Head of Applied Research, Peng Wang, was invited to give a talk on AI Oversight at Scale: Navigating the Challenges of Evaluating LLMs and Agents. Peng shared our vision of scalable AI oversight being the biggest problem facing widespread societal
@PatronusAI
PatronusAI
25 days
Introducing Prompt Tester on the Patronus AI platform! Prompt Tester allows you to build more robust prompts across different types of contexts to test out their effectiveness. Prompt. Test. Evaluate. Iterate. Read more here: https://t.co/igwYasCSVn Give it a try and let us
@PatronusAI
PatronusAI
27 days
At @PatronusAI, we're excited to publish a new article on the best practices for custom optimization tools for LLMs. 🚀 In this article, you will learn how large language models (LLMs) are being integrated into application development, with an overview of the tools and
@PatronusAI
PatronusAI
1 month
Our team has been collaborating with @Etsy (for a while now) on exciting multimodal evaluations! Last week, we had the opportunity to synthesize learnings from our suite of projects when @varjoshi, our Head of Engineering, presented at the Etsy ML Summit! Thank you for having
@PatronusAI
PatronusAI
1 month
Thank you, @BerkeleyRDI, for hosting the Agentic AI Summit and having us! @getdarshan, one of our research scientists, who leads agent evaluation here at Patronus, presented at the summit! Here are a few takeaways: * Given context explosion and increasing domain depth and
@PatronusAI
PatronusAI
1 month
Unleash the Power of AI Oversight with @PatronusAI x @databricks 🎉 With the Patronus AI integration into Databricks MLflow, you can now trace your logs and export them (via OTel) to the Patronus AI platform backend for detailed analysis. You’ll receive real-time monitoring,
@PatronusAI
PatronusAI
1 month
Introducing Prompt Management on the @PatronusAI platform! Prompting is an essential part of the development and evaluation process, allowing for necessary human-in-the-loop exchanges and improvements. However, managing prompts is challenging and messy, with even the smallest
@basetenco
Baseten
1 month
Building reliable agents requires a different tech stack: one that natively supports compound AI systems and evaluates quality along the full trajectory of agent behavior. We teamed up with @PatronusAI to break down what this stack looks like, from infra and models to debuggers.
@PatronusAI
PatronusAI
2 months
At @PatronusAI, we're excited to publish a new article on the best practices for using AI agent platforms. 🚀 In this article, you will learn about various AI agent platforms like @n8n_io, @make_hq, @LangChainAI, @crewAIInc, and @huggingface smolagents. The article provides
@PatronusAI
PatronusAI
2 months
Meet @snigdhabanda, our FDE Lead! 🎉 Snigdha has been on our team since this past November and is a valued leader, teammate, and friend. Today, we wanted to highlight her work as a Forward-Deployed Engineer, a role that is becoming one of the most sought-after jobs in tech, with
@PatronusAI
PatronusAI
2 months
At @PatronusAI, we're excited to publish a new article on the best practices for agentic workflows. 🚀 In this article, you will learn about agentic workflows, which involve specialized AI agents collaborating to solve complex problems without human intervention, and their
@PatronusAI
PatronusAI
2 months
Using your best AI debugger just got easier: spotlighting our Percival integrations! 🛡️ Percival is a highly intelligent agent and AI debugger, capable of detecting 20+ failure modes in agentic traces and suggesting optimizations. Generally, these agentic systems run into
@MLCommons
MLCommons
2 months
Today, MLCommons is announcing a new collaboration with contributors from across academia, civil society, and industry to co-develop an open agent reliability evaluation standard to operationalize trust in agentic deployments. 1/3 🔗 https://t.co/O04ZCox5yg
@PatronusAI
PatronusAI
2 months
We’re up to exciting things here at Patronus AI, working at the forefront of AI optimization and evaluation! Recently, we launched Percival, a SOTA AI Agent debugger, and have previously released industry-standard benchmarks for agents like TRAIL and BLUR, as well as
@PatronusAI
PatronusAI
3 months
Thank you, Professor @Zhou_Yu_AI and @bklsummithouse, for the AI Agents in Action: Industry × Academia Exchange! @rebeccatqian, our CTO, was on a panel with Vinay Rao (Advisor at @AnthropicAI), @ShunyuYao12 (Research Scientist at @OpenAI), Robert Parker (Founder of Perceptix),