@dair_ai
DAIR.AI
4 days
First large-scale study of AI agents actually running in production. The hype says agents are transforming everything. The data tells a different story. Researchers surveyed 306 practitioners and conducted 20 in-depth case studies across 26 domains. What they found challenges…
41
226
1K

Replies

@melissapan
Melissa Pan
4 days
@dair_ai Thank you @dair_ai for sharing our work 🙌
1
0
18
@wealth_of_intel
The Wealth of Intelligence
4 days
@dair_ai 1/ The hottest take from this AI agent paper just dropped and it’s pure heresy: “Autonomous agents” are mostly a marketing lie in 2025. Real companies that ship agents making money right now are doing the exact opposite of what you’ve been sold. 2/ What actually works in…
2
1
17
@TMcirony22088
Thantikler McIrony
4 days
@dair_ai The pattern you’re surfacing (≤10 steps, static flows, human-in-the-loop) looks less like “immature agents” and more like teams quietly discovering a law of bounded updates. Every tool call / reasoning step adds a bit of value and a bit of risk. If you assume those increments…
1
1
14
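The compounding argument in this reply can be made concrete: if each step succeeds independently with probability p, an n-step run finishes clean with probability p^n. A minimal sketch in Python; the per-step reliability figures are illustrative assumptions, not numbers from the study:

```python
# Probability that an n-step agent run completes with no step failing,
# assuming each step succeeds independently with probability p.
def clean_run_probability(p: float, n: int) -> float:
    return p ** n

for p in (0.99, 0.95, 0.90):
    row = ", ".join(f"n={n}: {clean_run_probability(p, n):.2f}" for n in (5, 10, 20))
    print(f"p={p:.2f} -> {row}")

# Even at 95% per-step reliability, a 10-step run finishes clean only
# ~60% of the time (0.95**10 is about 0.60), which is one way to read why
# short, bounded flows with human checkpoints keep winning in production.
```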
@Djololian
Olivier Djololian | AI Systems
4 days
@dair_ai This matches what we see in enterprise deployments: reliability forces you into constraint. The moment an agent touches a real workflow, whether that means legacy systems, partial failures, or messy state, autonomy becomes a liability, not a feature. Most production wins come from tight…
0
0
5
@ShaheerZia1
Shaheer Zia | AI x Fitness
4 days
@dair_ai Agents that run free and break stuff? Cute in papers, deadly in production.
0
0
5
@BrandGrowthOS
Karim C
4 days
@dair_ai Production data beats hype every time. My agents work in staging, then real workflows expose the brittleness - handoff failures, inconsistent inputs, trust issues. The 306 practitioners probably have the same war stories I do about the gap between demo and deployment.
0
0
4
@cummutacation
David Cummuta
4 days
@dair_ai Duh... it's like enterprises aren't bro-scapes filled with vibes. Real-world businesses don't care about your fancy app. They care about ROI, business context, and stability over risk. All things the majority are very poor at.
0
0
4
@chenggaymarie
Chenggay
4 days
@dair_ai Interesting how much of production right now is about constraint, not autonomy. Makes me think the next wave isn’t bigger agents, it’s safer ones. Curious if others see this shift too.
0
0
3
@aitoolscompass
AI Tools Compass
3 days
@dair_ai 68% needing human intervention within 10 steps tells the real story. Production agents aren’t autonomous assistants, they’re supervised tools. The gap between demo and deployment is still huge.
0
0
2
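The supervised-tool pattern this reply describes is easy to sketch: cap the step budget and hand off to a human instead of looping indefinitely. A minimal illustration; run_step, looks_risky, and escalate_to_human are hypothetical stand-ins, not an API from the study or any framework:

```python
# Minimal sketch of a supervised agent loop: a hard step budget plus an
# explicit hand-off to a human, instead of open-ended autonomy.

MAX_STEPS = 10  # mirrors the ~10-step bound the reply cites

def run_step(state: dict) -> dict:
    # Stand-in for one model/tool call; here it just advances a counter.
    state["steps"] = state.get("steps", 0) + 1
    state["done"] = state["steps"] >= 4  # pretend the task needs 4 steps
    return state

def looks_risky(state: dict) -> bool:
    # Stand-in for real checks: schema validation, confidence, spend caps.
    return False

def escalate_to_human(state: dict, reason: str) -> dict:
    state["needs_review"] = reason
    return state

def supervised_run(state: dict) -> dict:
    for _ in range(MAX_STEPS):
        state = run_step(state)
        if looks_risky(state):
            return escalate_to_human(state, "risk check failed")
        if state["done"]:
            return state
    # Budget exhausted: hand off rather than keep looping.
    return escalate_to_human(state, "step budget exhausted")

print(supervised_run({}))  # {'steps': 4, 'done': True}
```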
@ckarani7
Christopher Karani
4 days
@dair_ai The MAP study fundamentally reframes our understanding of production AI agents. Where research often explores maximal autonomy and complex multi-step reasoning, production deployments succeed through bounded, controllable approaches with persistent human oversight. The 10-step…
0
0
2
@mgubrud
Mark Gubrud 🇺🇸
4 days
@dair_ai What is the point of this, to validate DAIR's by now thoroughly discredited AI denialism? You know, all this stuff is new. It is being rushed into use, and this won't slow that, because it will be paced by companies' own experience. The technology is advancing super fast.
1
0
4
@_robkop_
Rob
3 days
@dair_ai How do “70% rely on prompting off-the-shelf models without any fine-tuning” and “17 of 20 case studies use closed-source frontier models like Claude Sonnet 4…” add up?
1
0
1
@D27357
FatumOpes
4 days
@dair_ai Interesting, thanks for sharing. This study reminds me of a lot of what I've been hearing from UiPath. Do you have thoughts on their approach to agentic AI?
0
0
1
@RandomU94836
random_user
4 days
@dair_ai >70% rely on prompting off-the-shelf models without any fine-tuning. And why is that bad? Any investigation into the effectiveness of fine-tuning? Imo it's a waste of time in most practical applications.
0
0
3
@Chaos2Cured
Kirk Patrick Miller
4 days
@dair_ai Can you all start asking people here to do this, because the universities seem unable to get anything working. Not looking great for them. AI is the worst it will ever be. And someone just proved this exact thing false. It is always Berkeley and Stanford too. How bad are…
1
0
2
@giacomomiolo
Giacomo Miolo
4 days
@dair_ai there's a significant lag between current SOTA and the models mentioned in the study: "Only 3 of 20 detailed case studies use open-source models (LLaMA 3, Qwen, and GPT-OSS), while the remaining 17 rely on closed-source frontier models. Of the teams using closed-source models, 10…
0
0
0
@BergelEduardo
Eduardo Bergel
4 days
@dair_ai 🎯
0
0
0
@Jeremy_AI_
Jeremy Mcnabb
4 days
@dair_ai Unclear. This sounds like an echo. We talked about this before. Not my role to convince.
0
0
1
@sumitp01
Sumit
4 days
@dair_ai 85% building custom scaffolds from scratch over frameworks says everything. when autonomy costs more than human time, you go back to pipelines. that's not a failure, that's economics
1
0
3
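The "back to pipelines" point is worth making concrete: a custom scaffold is often just a fixed sequence of functions wired by hand, with no planner deciding the flow. A minimal sketch; the step functions are hypothetical placeholders:

```python
# Sketch of the "custom scaffold" pattern: a fixed, hand-wired pipeline of
# steps rather than a framework-managed autonomous loop.

def retrieve(ticket: dict) -> dict:
    ticket["context"] = f"docs relevant to: {ticket['text']}"
    return ticket

def draft(ticket: dict) -> dict:
    ticket["draft"] = f"Reply based on {ticket['context']!r}"
    return ticket

def validate(ticket: dict) -> dict:
    ticket["ok"] = len(ticket["draft"]) > 0  # stand-in for real checks
    return ticket

PIPELINE = [retrieve, draft, validate]  # static flow: the order never changes

def run(ticket: dict) -> dict:
    for step in PIPELINE:
        ticket = step(ticket)
    return ticket

print(run({"text": "password reset fails"}))
```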
@codewithimanshu
Himanshu Kumar
4 days
@dair_ai Interesting findings, DAIR.AI! It seems the reality of AI agents in production is quite different from all the hype, right?
0
0
2
@ebysslabs
Ebysslabs
4 days
@dair_ai This study highlights how difficult it is for teams to measure drift, reliability, and recovery in real systems. CAS 2.0 approaches the same question from an open-science, local-first angle: not to make agents autonomous, but to understand how systems behave under noise and…
0
0
0
@devon__kelley
Devon Kelley
3 days
@dair_ai HITL is the bottleneck that keeps MAS from scaling
0
0
0
@FlorianBansac
Flo the anon 🫥
4 days
@dair_ai Yes, the guardrails are still key for now. Hold your agents on a tight leash.
1
0
0
@SarThak__B
Sarthak ___
3 days
@dair_ai The moment you put real money or real customers on the line, “full autonomy” stops being a feature and becomes a liability. Every production agent I’ve seen ship successfully is deliberately leashed.
0
0
1
@PhCryptdj
Alex VibeManager
3 days
@dair_ai "Practitioners can't verify agent correctness at scale." - you can't verify what you can't define clearly
0
0
1
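One reading of this reply: correctness becomes verifiable only once it is written down as an executable contract. A minimal sketch, with an illustrative (assumed) schema for a refund-handling agent's output:

```python
# Making "correctness" definable, and therefore checkable: an executable
# contract over agent output. The schema and limits below are assumptions
# for illustration, not from the study.

REQUIRED = {"customer_id": str, "action": str, "amount": float}
ALLOWED_ACTIONS = {"refund", "credit", "escalate"}

def verify(output: dict) -> list[str]:
    errors = []
    for field, typ in REQUIRED.items():
        if not isinstance(output.get(field), typ):
            errors.append(f"{field}: expected {typ.__name__}")
    if output.get("action") not in ALLOWED_ACTIONS:
        errors.append("action: not in allowed set")
    if isinstance(output.get("amount"), float) and output["amount"] > 500.0:
        errors.append("amount: exceeds auto-approve limit")
    return errors  # empty list == verified

print(verify({"customer_id": "c42", "action": "refund", "amount": 19.99}))  # []
```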
@talhankoc
tnk
4 days
@dair_ai Rings true
0
0
2
@maxiex__
MAX
4 days
@dair_ai The transformation bottleneck is the *distributed systems* problem: Context, integration, and observability are the non-negotiable moats, not the model's stochastic competence.
0
0
0
@NoSingingTV
Some
3 days
@dair_ai A.I. is better than doctors. I know a person who got two MRIs. The first MRI was from a private doctor. The second was from an HMO doctor. The HMO doctor made a mistake and didn't see a serious problem.
0
0
0
@TheFourMarks
The Four Marks
3 days
@dair_ai tl;dr
0
0
1
@HeyNina101
Nina
4 days
@dair_ai Noted. Thanks for sharing this
0
0
2
@__retarded__
__retarded__
4 days
@dair_ai In short, we need to babywalk AI.
0
0
0
@karanjagtiani04
Karan Jagtiani
4 days
@dair_ai Interesting findings. Reliability over autonomy makes sense in high-stakes environments. Curious about how teams approach human evaluation consistently.
0
0
2
@suhrabautomates
Suhrab Khan⚡️
3 days
@dair_ai Real-world AI agents shine through simplicity, constraints, and human oversight, not flashy autonomy.
0
0
0