First large-scale study of AI agents actually running in production. The hype says agents are transforming everything. The data tells a different story. Researchers surveyed 306 practitioners and conducted 20 in-depth case studies across 26 domains. What they found challenges
@dair_ai 1/ The hottest take from this AI agent paper just dropped and it’s pure heresy: “Autonomous agents” are mostly a marketing lie in 2025. Real companies that ship agents making money right now are doing the exact opposite of what you’ve been sold. 2/ What actually works in
@dair_ai The pattern you’re surfacing (≤10 steps, static flows, human-in-the-loop) looks less like “immature agents” and more like teams quietly discovering a law of bounded updates. Every tool call / reasoning step adds a bit of value and a bit of risk. If you assume those increments
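The "law of bounded updates" intuition above has a simple arithmetic core: if each tool call or reasoning step succeeds independently with some probability, the chance of a clean end-to-end run decays exponentially with step count. A minimal sketch, assuming a hypothetical 99% per-step success rate and independent steps (illustrative numbers, not figures from the study):

```python
# Illustrative sketch: why per-step risk compounds and teams cap step counts.
# The per-step success rate p and independence are assumptions for illustration.
def run_success_prob(p: float, n_steps: int) -> float:
    """Probability an n-step agent run completes with no step failing,
    assuming independent steps, each succeeding with probability p."""
    return p ** n_steps

for n in (1, 5, 10, 25, 50):
    print(n, round(run_success_prob(0.99, n), 3))
# Even at 99% per step, a 10-step run succeeds only ~90% of the time,
# and a 50-step run ~60% of the time.
```

Under these assumptions, capping flows at roughly 10 steps keeps whole-run reliability in a range where human review can absorb the failures.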
@dair_ai This matches what we see in enterprise deployments: reliability forces you into constraint. The moment an agent touches a real workflow, with its legacy systems, partial failures, and messy state, autonomy becomes a liability, not a feature. Most production wins come from tight
@dair_ai Production data beats hype every time. My agents work in staging, then real workflows expose the brittleness - handoff failures, inconsistent inputs, trust issues. The 306 practitioners probably have the same war stories I do about the gap between demo and deployment.
@dair_ai Duh... it's like enterprises aren't bro-scapes filled with vibes. Real-world businesses don't care about your fancy app. They care about ROI, business context, and stability over risk. All things the majority are very poor at.
@dair_ai Interesting how much of production right now is about constraint, not autonomy. Makes me think the next wave isn’t bigger agents, it’s safer ones. Curious if others see this shift too.
@dair_ai 68% needing human intervention within 10 steps tells the real story. Production agents aren’t autonomous assistants, they’re supervised tools. The gap between demo and deployment is still huge.
@dair_ai The MAP study fundamentally reframes our understanding of production AI agents. Where research often explores maximal autonomy and complex multi-step reasoning, production deployments succeed through bounded, controllable approaches with persistent human oversight. The 10-step
@dair_ai What is the point of this, to validate DAIR's by now thoroughly discredited AI denialism? You know, all this stuff is new. It is being rushed into use, and this won't slow that, because it will be paced by companies' own experience. The technology is advancing super fast.
@dair_ai How does “70% rely on prompting off-the-shelf models without any fine-tuning” and “17 of 20 case studies use closed-source frontier models like Claude Sonnet 4…” add up?
@dair_ai Interesting, thanks for sharing. This study reminds me of a lot of what I've been hearing from UIPath. Do you have thoughts on their approach to agentic AI?
@dair_ai >70% rely on prompting off-the-shelf models without any fine-tuning. And why is that bad? Any investigation into the effectiveness of fine-tuning? IMO it's a waste of time in most practical applications.
@dair_ai Can you all start asking people here to do this, because the universities seem unable to get anything working. Not looking great for them. AI is the worst it will ever be. And someone just proved this exact thing false. It is always Berkeley and Stanford too. How bad are
@dair_ai there's a significant lag between current SOTA and the models mentioned in the study: "Only 3 of 20 detailed case studies use open-source models (LLaMA 3, Qwen, and GPT-OSS), while the remaining 17 rely on closed-source frontier models. Of the teams using closed-source models, 10
@dair_ai Unclear. This sounds like an echo. We talked about this before. Not my role to convince.
@dair_ai 85% building custom scaffolds from scratch over frameworks says everything. when autonomy costs more than human time, you go back to pipelines. that's not a failure, that's economics
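The economics point above can be made concrete with a toy break-even calculation. All dollar figures and the cleanup-cost framing are invented for illustration, not from the study:

```python
# Toy break-even: when does an agent run actually beat paying a human?
# All numbers are hypothetical illustrations, not figures from the study.
def expected_agent_cost(run_cost: float, success_rate: float,
                        cleanup_cost: float) -> float:
    """Expected cost per task: the run itself, plus cleanup when it fails."""
    return run_cost + (1 - success_rate) * cleanup_cost

human_cost = 5.00  # human does the task directly
agent = expected_agent_cost(run_cost=0.50, success_rate=0.8, cleanup_cost=30.0)
print(f"agent: ${agent:.2f}, human: ${human_cost:.2f}")
# With a 20% failure rate and costly cleanup, the "cheap" agent loses: $6.50 > $5.00
```

On these assumptions the cheap-looking agent is dominated by failure-handling cost, which is one way to read the retreat to deterministic pipelines as economics rather than failure.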
@dair_ai Interesting findings, DAIR.AI! It seems the reality of AI agents in production is quite different from all the hype, right?
@dair_ai This study highlights how difficult it is for teams to measure drift, reliability, and recovery in real systems. CAS 2.0 approaches the same question from an open-science, local-first angle — not to make agents autonomous, but to understand how systems behave under noise and
@dair_ai The moment you put real money or real customers on the line, “full autonomy” stops being a feature and becomes a liability. Every production agent I’ve seen ship successfully is deliberately leashed
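One common shape for the "leash" described above is an approval gate in front of any side-effecting action: read-only tools run freely, everything else waits for a human. A minimal sketch, where the action names and allowlist are hypothetical illustrations rather than anything from the study:

```python
# Minimal human-in-the-loop gate: side-effecting actions need explicit approval.
# Action names and the ALLOWLIST are hypothetical illustrations.
ALLOWLIST = {"search_docs", "read_record"}  # safe, read-only actions

def gate(action: str, approve) -> bool:
    """Run allowlisted read-only actions freely; ask a human for anything else."""
    if action in ALLOWLIST:
        return True
    return approve(action)  # callback into a human reviewer / approval queue

# Usage with a demo reviewer that rejects everything risky:
decisions = [gate(a, approve=lambda a: False)
             for a in ("search_docs", "issue_refund")]
print(decisions)  # [True, False]
```

The design choice is that autonomy is opt-in per action rather than the default, which matches the "deliberately leashed" pattern the reply describes.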
@dair_ai "Practitioners can't verify agent correctness at scale." - you can't verify what you can't define clearly
@dair_ai The transformation bottleneck is the *distributed systems* problem: Context, integration, and observability are the non-negotiable moats, not the model's stochastic competence.
@dair_ai A.I. is better than doctors. I know a person who got two MRIs: the first from a private doctor, the second from an HMO doctor. The HMO doctor made a mistake and missed a serious problem.
@dair_ai Interesting findings. Reliability over autonomy makes sense in high-stakes environments. Curious about how teams approach human evaluation consistently.
@dair_ai Real-world AI agents shine through simplicity, constraints, and human oversight, not flashy autonomy.