We collaborated with @a16z to publish the **State of AI** - an empirical report on how LLMs have been used on OpenRouter. After analyzing more than 100 trillion tokens across hundreds of models and 3+ million users (excluding 3rd party) from the last year, we have a lot of
@AnjneyMidha @MaikaThoughts @xanderatallah @cclark One finding: we observe a Cinderella "Glass Slipper" effect for new models. Early users of a new LLM either churn quickly or become part of a foundational cohort, with much higher retention than others. They are early adopters who can "lead" the rest of the market (more details 👇)
Our dataset: anonymized request-level metadata from OpenRouter, including classifications. We used this to study behavior at scale without reading any prompts or completions directly.
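A minimal sketch of what working with this kind of data can look like, assuming a hypothetical schema: aggregate request-level metadata into usage stats without ever touching prompt or completion text. Every column name (week, model, category, prompt_tokens, completion_tokens, used_tools) is illustrative, not the report's actual pipeline.

```python
# Minimal sketch, not the actual pipeline: aggregate anonymized
# request-level metadata (no prompt/completion text is ever read).
import pandas as pd

# Hypothetical schema: one row per request.
requests = pd.DataFrame(
    {
        "week": ["2025-W01", "2025-W01", "2025-W02", "2025-W02"],
        "model": ["model-a", "model-b", "model-a", "model-a"],
        "category": ["programming", "roleplay", "programming", "programming"],
        "prompt_tokens": [1800, 900, 5200, 4100],
        "completion_tokens": [220, 310, 450, 380],
        "used_tools": [True, False, True, True],
    }
)
requests["total_tokens"] = requests["prompt_tokens"] + requests["completion_tokens"]

# Per week and category: token volume, request "shape", and tool-call rate.
summary = (
    requests.groupby(["week", "category"])
    .agg(
        total_tokens=("total_tokens", "sum"),
        avg_prompt_len=("prompt_tokens", "mean"),
        avg_completion_len=("completion_tokens", "mean"),
        tool_call_rate=("used_tools", "mean"),
    )
    .reset_index()
)
print(summary)
```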
Open vs Closed Weights: By late 2025, open-weight models (abbreviated as OSS below) reached ~⅓ of usage, sustained beyond launch spikes, but have plateaued in Q4.
Chinese models: grew from ~1% to around 30% in some weeks. Release velocity + quality make the market lively.
If you want a single picture of the modern stack:
- Closed models = high-value workloads
- Open models = high-volume workloads
In practice, a lot of teams use both.
OSS isn't "just for tinkering" - it is extremely popular in two areas:
🧙‍♂️ Roleplay / creative dialogue: >50% of OSS usage
🧑‍💻 Programming assistance: ~15-20%
Zooming in on Chinese OSS, roleplay is still big, but programming + tech are becoming the majority. These models are evolving out of the “creative corner.”
Now the significant platform shift: agentic inference 🤖
We tracked it via:
- reasoning model adoption
- tool calling
- prompt/completion “shape” (sequence lengths)
Reasoning models go from “negligible” to more than 50% of tokens in 2025. Full paradigm shift.
Tool use is climbing too: the share of requests that actually invoke tools rises steadily through the year.
Context is exploding.
- Average prompt length grew 4× (from ~1.5k to 6k+ tokens).
- Completions grew 3× (from ~150 to ~400 tokens).
Translation: the median request is less “write me an essay” and more “here’s a pile of code/docs/logs - now, extract signal.”
The top usage category: Programming. It grew from ~11% of token volume in early 2025 to over 50% in recent weeks.
Languages: English dominates with more than 80% of tokens, but the tail is real - Chinese, Russian, Spanish, etc.
Economics: price matters, but less than you think. On our cost vs usage map, the trendline is nearly flat: reducing cost by 10% only correlates with ~0.5-0.7% more usage.
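To make the "nearly flat trendline" concrete, here's a hedged sketch of how that relationship can be read as a price elasticity: fit log(usage) against log(price) and interpret the slope. The data points below are invented for illustration; only the ~10% price cut → ~0.5-0.7% usage relationship comes from the report.

```python
# Hedged sketch: estimate price elasticity from a (made-up) cost-vs-usage scatter.
import numpy as np

price_per_mtok = np.array([0.10, 0.30, 0.73, 2.00, 8.00])     # $ per 1M tokens (hypothetical)
weekly_tokens = np.array([9.0e9, 8.5e9, 8.0e9, 7.6e9, 7.0e9])  # usage (hypothetical)

# Fit log(usage) = a + b * log(price); the slope b is the elasticity.
slope, intercept = np.polyfit(np.log(price_per_mtok), np.log(weekly_tokens), 1)

# An elasticity near -0.06 means a 10% price cut is associated with
# only ~0.6% more usage, i.e. the trendline is nearly flat.
print(f"estimated elasticity: {slope:.3f}")
print(f"usage change for a 10% price cut: {(0.9 ** slope - 1) * 100:.2f}%")
```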
There’s a useful dividing line: the median price of $0.73 per 1M tokens splits the market into four quadrants (premium workloads, volume drivers, specialists, utilities). Also, caching makes effective prices lower than list rates.
Finally, retention. We observe a Cinderella "Glass Slipper" effect for new models. A model's first week of users forms a foundational cohort: users whose workloads achieve a deep and persistent workload-model fit.
Once established, this workload–model fit creates both economic and cognitive inertia that resists substitution, even as newer models emerge. The Glass Slipper phenomenon reframes retention not as an outcome but as a lens for understanding capability breakthroughs. Foundational
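A minimal, hypothetical sketch of the cohort comparison behind the Glass Slipper claim: split a model's users into a launch-week ("foundational") cohort and later adopters, then compare how many are still active weeks later. The user data and column names are illustrative, not the report's actual methodology.

```python
# Hypothetical cohort-retention sketch for a newly launched model.
import pandas as pd

# One row per (user, model, week) with any usage that week (made-up data).
activity = pd.DataFrame(
    {
        "user_id": [1, 1, 1, 2, 2, 2, 3, 3, 4],
        "model": ["new-model"] * 9,
        "week": [0, 1, 4, 0, 1, 4, 2, 3, 2],
    }
)

launch_week = 0
first_seen = activity.groupby("user_id")["week"].min()
foundational = set(first_seen[first_seen == launch_week].index)  # launch-week adopters
late_adopters = set(first_seen[first_seen > launch_week].index)

def retention(users, at_week):
    """Share of a cohort still active at or after `at_week`."""
    if not users:
        return float("nan")
    rows = activity[activity["user_id"].isin(users)]
    retained = rows[rows["week"] >= at_week]["user_id"].nunique()
    return retained / len(users)

print("foundational cohort retention @ week 4:", retention(foundational, 4))
print("later cohorts retention      @ week 4:", retention(late_adopters, 4))
```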