Scale AI

@scale_AI

Followers
71K
Following
2K
Media
538
Statuses
2K

making AI work

Joined July 2016
@scale_AI
Scale AI
12 hours
Big news: Scale is growing 🌍 We’re expanding our global footprint with new offices in New York City, London, Washington, D.C., and St. Louis. This growth reflects our investment in our people and our mission to build reliable AI systems for the world’s most important
2
6
30
@vbingliu
Bing Liu
6 days
Can AI actually automate jobs? @Scale_AI and @ai_risks are launching the Remote Labor Index (RLI), the first benchmark and public leaderboard that test how well AI agents can complete real, paid freelance work in domains like software engineering, design, architecture, data
21
75
456
@scale_AI
Scale AI
6 days
Learn more about RLI: https://t.co/EWuA7Rmauh
0
0
0
@scale_AI
Scale AI
6 days
We’re launching the Remote Labor Index (RLI) with @ai_risks, the first benchmark evaluating whether AI agents can independently complete full, paid freelance tasks. The results provide a needed reality check: automation is advancing, but still has a long way to go. RLI offers a
@DanHendrycks
Dan Hendrycks
7 days
Can AI automate jobs? We created the Remote Labor Index to test AI’s ability to automate hundreds of long, real-world, economically valuable projects from remote work platforms. While AIs are smart, they are not yet that useful: the current automation rate is less than 3%.
4
10
39
@scale_AI
Scale AI
12 days
0
0
3
@scale_AI
Scale AI
12 days
Our research team dives into MCP Atlas, one of our newest benchmarks – exploring how it evaluates models and what we’ve learned from the results.
2
5
31
@scale_AI
Scale AI
13 days
There’s no magic wand for making AI work. Scale CEO @jdroege joined @richardquest on @cnni to share what it really takes:
1
2
17
@scale_AI
Scale AI
14 days
We launched SWE-Bench Pro last month to incredible feedback, and we’ve now updated the leaderboard with the latest models and no cost caps. SoTA models now break 40% pass rate. Congrats to @Anthropic for sweeping the top spots! 🥇Claude 4.5 Sonnet 🥈Claude 4 Sonnet 🥉Claude 4.5
@vbingliu
Bing Liu
2 months
🚀 Introducing SWE-Bench Pro — a new benchmark to evaluate LLM coding agents on real, enterprise-grade software engineering tasks. This is the next step beyond SWE-Bench: harder, contamination-resistant, and closer to real-world repos.
39
54
558
@scale_AI
Scale AI
20 days
📣 Releasing our newest benchmark, VisualToolBench (VTB), the first benchmark designed to evaluate how well multimodal large language models (MLLMs) can dynamically interact with and reason about visual information. VTB goes beyond thinking about images, it’s about thinking with
2
5
22
@vbingliu
Bing Liu
26 days
🔄RLHF → RLVR → Rubrics → OnlineRubrics 👤 Human feedback = noisy & coarse 🧮 Verifiable rewards = too narrow 📋 Static rubrics = rigid, easy to hack, miss emergent behaviors 💡We introduce OnlineRubrics: elicited rubrics that evolve as models train. https://t.co/YI6pJ7jfJ1
5
42
266
@jdroege
Jason Droege
26 days
Sat down with @lennysan to talk about where AI is headed and how we’re making it work for model builders, enterprises and governments. Also went down memory lane about my time at Uber Eats. 🙂
@lennysan
Lenny Rachitsky
26 days
In his first in-depth interview since taking over as @scale_AI CEO, @jdroege shares: 🔸 What actually happened with Meta’s $14 billion investment 🔸 Where frontier labs are heading next 🔸 Why most enterprise data is useless for AI models 🔸 What it takes to keep making AI model
3
5
36
@scale_AI
Scale AI
27 days
Full conversation: https://t.co/g91Mr1ywY6
0
0
4
@scale_AI
Scale AI
27 days
Welcome to Chain of Thought, exploring all things AI, research, and evaluations. This episode: how we think about different types of agents and where they’re headed next.
1
2
25
@scale_AI
Scale AI
1 month
“I think one of the misunderstandings is that AI is this magic wand or it can solve all problems, and that’s not true today. But there is a ton of value when you get it right.” Our CEO @jdroege shared his AI success framework with CNN's @claresduffy. https://t.co/pmBKjdivLt
cnn.com
The artificial intelligence industry has a big problem: 95% of companies that try AI aren’t making any money from it, according to a report from the Massachusetts Institute of Technology last month....
3
4
18
@vbingliu
Bing Liu
1 month
New @Scale_AI paper! The culprit behind reward hacking? We trace it to misspecification in high-reward tail. Our fix: rubric-based rewards to tell “excellent” responses apart from “great.” The result: Less hacking, stronger post-training! https://t.co/D6aJkZ8zZE
4
40
178
@scale_AI
Scale AI
1 month
Sorry for the wait
@luke_metro
Luke Metro
2 months
No one has launched a Scale AI for robotics this week. Recession incoming?
6
10
183