@dair_ai
DAIR.AI
4 days
First large-scale study of AI agents actually running in production. The hype says agents are transforming everything. The data tells a different story. Researchers surveyed 306 practitioners and conducted 20 in-depth case studies across 26 domains. What they found challenges…
41
226
1K

Replies

@melissapan
Melissa Pan
4 days
@dair_ai Thank you @dair_ai for sharing our work 🙌
1
0
18
@wealth_of_intel
The Wealth of Intelligence
4 days
@dair_ai 1/ The hottest take from this AI agent paper just dropped and it’s pure heresy: “Autonomous agents” are mostly a marketing lie in 2025. Real companies that ship agents making money right now are doing the exact opposite of what you’ve been sold. 2/ What actually works in…
2
1
17
@TMcirony22088
Thantikler McIrony
4 days
@dair_ai The pattern you’re surfacing (≤10 steps, static flows, human-in-the-loop) looks less like “immature agents” and more like teams quietly discovering a law of bounded updates. Every tool call / reasoning step adds a bit of value and a bit of risk. If you assume those increments…
1
1
14
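The compounding argument in this reply can be made concrete: if each step succeeds independently with probability p, an n-step run finishes clean with probability p^n. A minimal sketch in Python; the per-step reliability figures are illustrative assumptions, not numbers from the study:

```python
# Probability that an n-step agent run completes with no step failing,
# assuming each step succeeds independently with probability p.
def clean_run_probability(p: float, n: int) -> float:
    return p ** n

for p in (0.99, 0.95, 0.90):
    row = ", ".join(f"n={n}: {clean_run_probability(p, n):.2f}" for n in (5, 10, 20))
    print(f"p={p:.2f} -> {row}")

# Even at 95% per-step reliability, a 10-step run finishes clean only
# ~60% of the time (0.95**10 is about 0.60), which is one way to read why
# short, bounded flows with human checkpoints keep winning in production.
```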
@Djololian
Olivier Djololian | AI Systems
4 days
@dair_ai This matches what we see in enterprise deployments: reliability forces you into constraint. The moment an agent touches a real workflow, whether that means legacy systems, partial failures, or messy state, autonomy becomes a liability, not a feature. Most production wins come from tight…
0
0
5
@ShaheerZia1
Shaheer Zia | AI x Fitness
4 days
@dair_ai Agents that run free and break stuff? Cute in papers, deadly in production.
0
0
5
@BrandGrowthOS
Karim C
4 days
@dair_ai Production data beats hype every time. My agents work in staging, then real workflows expose the brittleness - handoff failures, inconsistent inputs, trust issues. The 306 practitioners probably have the same war stories I do about the gap between demo and deployment.
0
0
4
@cummutacation
David Cummuta
4 days
@dair_ai Duh... it's like enterprises aren't bro-scapes filled with vibes. Real-world businesses don't care about your fancy app. They care about ROI, business context, and stability over risk. All things the majority are very poor at.
0
0
4
@chenggaymarie
Chenggay
4 days
@dair_ai Interesting how much of production right now is about constraint, not autonomy. Makes me think the next wave isn’t bigger agents, it’s safer ones. Curious if others see this shift too.
0
0
3
@aitoolscompass
AI Tools Compass
3 days
@dair_ai 68% needing human intervention within 10 steps tells the real story. Production agents aren’t autonomous assistants, they’re supervised tools. The gap between demo and deployment is still huge.
0
0
2
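The supervised-tool pattern this reply describes is easy to sketch: cap the step budget and hand off to a human instead of looping indefinitely. A minimal illustration; run_step, looks_risky, and escalate_to_human are hypothetical stand-ins, not an API from the study or any framework:

```python
# Minimal sketch of a supervised agent loop: a hard step budget plus an
# explicit hand-off to a human, instead of open-ended autonomy.

MAX_STEPS = 10  # mirrors the ~10-step bound the reply cites

def run_step(state: dict) -> dict:
    # Stand-in for one model/tool call; here it just advances a counter.
    state["steps"] = state.get("steps", 0) + 1
    state["done"] = state["steps"] >= 4  # pretend the task needs 4 steps
    return state

def looks_risky(state: dict) -> bool:
    # Stand-in for real checks: schema validation, confidence, spend caps.
    return False

def escalate_to_human(state: dict, reason: str) -> dict:
    state["needs_review"] = reason
    return state

def supervised_run(state: dict) -> dict:
    for _ in range(MAX_STEPS):
        state = run_step(state)
        if looks_risky(state):
            return escalate_to_human(state, "risk check failed")
        if state["done"]:
            return state
    # Budget exhausted: hand off rather than keep looping.
    return escalate_to_human(state, "step budget exhausted")

print(supervised_run({}))  # {'steps': 4, 'done': True}
```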
@ckarani7
Christopher Karani
4 days
@dair_ai The MAP study fundamentally reframes our understanding of production AI agents. Where research often explores maximal autonomy and complex multi-step reasoning, production deployments succeed through bounded, controllable approaches with persistent human oversight. The 10-step…
0
0
2
@mgubrud
Mark Gubrud 🇺🇸
4 days
@dair_ai What is the point of this, to validate DAIR's by now thoroughly discredited AI denialism? You know, all this stuff is new. It is being rushed into use, and this won't slow that, because it will be paced by companies' own experience. The technology is advancing super fast.
1
0
4
@_robkop_
Rob
3 days
@dair_ai How do “70% rely on prompting off-the-shelf models without any fine-tuning” and “17 of 20 case studies use closed-source frontier models like Claude Sonnet 4…” add up?
1
0
1
@D27357
FatumOpes
4 days
@dair_ai Interesting, thanks for sharing. This study reminds me of a lot of what I've been hearing from UiPath. Do you have thoughts on their approach to agentic AI?
0
0
1
@RandomU94836
random_user
4 days
@dair_ai >70% rely on prompting off-the-shelf models without any fine-tuning. And why is that bad? Any investigation into the effectiveness of fine-tuning? Imo it's a waste of time in most practical applications.
0
0
3
@Chaos2Cured
Kirk Patrick Miller
4 days
@dair_ai Can you all start asking people here to do this, because the universities seem unable to get anything working. Not looking great for them. AI is the worst it will ever be. And someone just proved this exact thing false. It is always Berkeley and Stanford too. How bad are…
1
0
2
@giacomomiolo
Giacomo Miolo
4 days
@dair_ai there's a significant lag between current SOTA and the models mentioned in the study: "Only 3 of 20 detailed case studies use open-source models (LLaMA 3, Qwen, and GPT-OSS), while the remaining 17 rely on closed-source frontier models. Of the teams using closed-source models, 10…
0
0
0
@BergelEduardo
Eduardo Bergel
4 days
@dair_ai 🎯
0
0
0
@Jeremy_AI_
Jeremy Mcnabb
4 days
@dair_ai Unclear. This sounds like an echo. We talked about this before. Not my role to convince.
0
0
1
@sumitp01
Sumit
4 days
@dair_ai 85% building custom scaffolds from scratch over frameworks says everything. when autonomy costs more than human time, you go back to pipelines. that's not a failure, that's economics
1
0
3
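The "back to pipelines" point is worth making concrete: a custom scaffold is often just a fixed sequence of functions wired by hand, with no planner deciding the flow. A minimal sketch; the step functions are hypothetical placeholders:

```python
# Sketch of the "custom scaffold" pattern: a fixed, hand-wired pipeline of
# steps rather than a framework-managed autonomous loop.

def retrieve(ticket: dict) -> dict:
    ticket["context"] = f"docs relevant to: {ticket['text']}"
    return ticket

def draft(ticket: dict) -> dict:
    ticket["draft"] = f"Reply based on {ticket['context']!r}"
    return ticket

def validate(ticket: dict) -> dict:
    ticket["ok"] = len(ticket["draft"]) > 0  # stand-in for real checks
    return ticket

PIPELINE = [retrieve, draft, validate]  # static flow: the order never changes

def run(ticket: dict) -> dict:
    for step in PIPELINE:
        ticket = step(ticket)
    return ticket

print(run({"text": "password reset fails"}))
```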
@codewithimanshu
Himanshu Kumar
4 days
@dair_ai Interesting findings, DAIR.AI! It seems the reality of AI agents in production is quite different from all the hype, right?
0
0
2
@ebysslabs
Ebysslabs
4 days
@dair_ai This study highlights how difficult it is for teams to measure drift, reliability, and recovery in real systems. CAS 2.0 approaches the same question from an open-science, local-first angle: not to make agents autonomous, but to understand how systems behave under noise and…
0
0
0
@devon__kelley
Devon Kelley
3 days
@dair_ai HITL is the bottleneck that keeps MAS from scaling
0
0
0
@FlorianBansac
Flo the anon 🫥
4 days
@dair_ai Yes, the guardrails are still key for now. Hold your agents on a tight leash.
1
0
0
@SarThak__B
Sarthak ___
3 days
@dair_ai The moment you put real money or real customers on the line, “full autonomy” stops being a feature and becomes a liability. Every production agent I’ve seen ship successfully is deliberately leashed.
0
0
1
@PhCryptdj
Alex VibeManager
3 days
@dair_ai "Practitioners can't verify agent correctness at scale." - you can't verify what you can't define clearly
0
0
1
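One reading of this reply: correctness becomes verifiable only once it is written down as an executable contract. A minimal sketch, with an illustrative (assumed) schema for a refund-handling agent's output:

```python
# Making "correctness" definable, and therefore checkable: an executable
# contract over agent output. The schema and limits below are assumptions
# for illustration, not from the study.

REQUIRED = {"customer_id": str, "action": str, "amount": float}
ALLOWED_ACTIONS = {"refund", "credit", "escalate"}

def verify(output: dict) -> list[str]:
    errors = []
    for field, typ in REQUIRED.items():
        if not isinstance(output.get(field), typ):
            errors.append(f"{field}: expected {typ.__name__}")
    if output.get("action") not in ALLOWED_ACTIONS:
        errors.append("action: not in allowed set")
    if isinstance(output.get("amount"), float) and output["amount"] > 500.0:
        errors.append("amount: exceeds auto-approve limit")
    return errors  # empty list == verified

print(verify({"customer_id": "c42", "action": "refund", "amount": 19.99}))  # []
```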
@talhankoc
tnk
4 days
@dair_ai Rings true
0
0
2
@maxiex__
MAX
4 days
@dair_ai The transformation bottleneck is the *distributed systems* problem: Context, integration, and observability are the non-negotiable moats, not the model's stochastic competence.
0
0
0
@NoSingingTV
Some
3 days
@dair_ai A.I. is better than doctors. I know a person who got two MRIs. The first MRI was from a private doctor. The second was from an HMO doctor. The HMO doctor made a mistake and didn't see a serious problem.
0
0
0
@TheFourMarks
The Four Marks
3 days
@dair_ai tl;dr
0
0
1
@HeyNina101
Nina
4 days
@dair_ai Noted. Thanks for sharing this
0
0
2
@__retarded__
__retarded__
4 days
@dair_ai In short, we need to babywalk AI.
0
0
0
@karanjagtiani04
Karan Jagtiani
4 days
@dair_ai Interesting findings. Reliability over autonomy makes sense in high-stakes environments. Curious about how teams approach human evaluation consistently.
0
0
2
@suhrabautomates
Suhrab Khan⚡️
3 days
@dair_ai Real-world AI agents shine through simplicity, constraints, and human oversight, not flashy autonomy.
0
0
0