Scale AI (@scale_AI)
making AI work
Joined July 2016
71K Followers · 2K Following · 538 Media · 2K Statuses
Big news: Scale is growing 🌍 We’re expanding our global footprint with new offices in New York City, London, Washington, D.C., and St. Louis. This growth reflects our investment in our people and our mission to build reliable AI systems for the world’s most important…
We’re launching the Remote Labor Index (RLI) with @ai_risks, the first benchmark evaluating whether AI agents can independently complete full, paid freelance tasks. The results provide a needed reality check: automation is advancing, but still has a long way to go. RLI offers a…
Can AI automate jobs? We created the Remote Labor Index to test AI’s ability to automate hundreds of long, real-world, economically valuable projects from remote work platforms. While AIs are smart, they are not yet that useful: the current automation rate is less than 3%.
Our research team dives into MCP Atlas, one of our newest benchmarks – exploring how it evaluates models and what we’ve learned from the results.
There’s no magic wand for making AI work. Scale CEO @jdroege joined @richardquest on @cnni to share what it really takes:
We launched SWE-Bench Pro last month to incredible feedback, and we’ve now updated the leaderboard with the latest models and no cost caps. SoTA models now break a 40% pass rate. Congrats to @Anthropic for sweeping the top spots! 🥇Claude 4.5 Sonnet 🥈Claude 4 Sonnet 🥉Claude 4.5…
🚀 Introducing SWE-Bench Pro — a new benchmark to evaluate LLM coding agents on real, enterprise-grade software engineering tasks. This is the next step beyond SWE-Bench: harder, contamination-resistant, and closer to real-world repos.
Learn more about our methodology and see how models stack up:
scale.com
Explore the SEAL leaderboard with expert-driven LLM benchmarks and updated AI model leaderboards, ranking top models across coding, reasoning and more.
📣 Releasing our newest benchmark, VisualToolBench (VTB), the first benchmark designed to evaluate how well multimodal large language models (MLLMs) can dynamically interact with and reason about visual information. VTB goes beyond thinking about images; it’s about thinking with…
🔄 RLHF → RLVR → Rubrics → OnlineRubrics
👤 Human feedback = noisy & coarse
🧮 Verifiable rewards = too narrow
📋 Static rubrics = rigid, easy to hack, miss emergent behaviors
💡 We introduce OnlineRubrics: elicited rubrics that evolve as models train. https://t.co/YI6pJ7jfJ1
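A minimal sketch of the high-level idea in the OnlineRubrics post, assuming a rubric is a weighted list of criteria that gets re-elicited from fresh rollouts as training proceeds; the `elicit_rubric`, `judge`, and `update_policy` hooks below are hypothetical placeholders, not the paper's actual API or implementation.

```python
# Hypothetical sketch: rubric-based rewards whose criteria evolve during training
# (inspired by the OnlineRubrics announcement; not the paper's code).

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    description: str   # e.g. "cites a concrete source for each factual claim"
    weight: float      # relative importance of this criterion


def score_with_rubric(response: str,
                      rubric: List[Criterion],
                      judge: Callable[[str, str], float]) -> float:
    """Weighted rubric score; judge() returns a 0-1 grade per criterion."""
    if not rubric:
        return 0.0
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * judge(response, c.description) for c in rubric) / total_weight


def training_loop(prompts, policy, judge, elicit_rubric, update_policy, steps=1000):
    """A static rubric is fixed once up front; here the rubric is re-elicited
    from the current rollouts each step, so newly emerging behaviors can be
    captured as new criteria instead of being missed."""
    rubric: List[Criterion] = []
    for _ in range(steps):
        batch = [policy(p) for p in prompts]
        # Re-elicit criteria from current behavior (the "online" part).
        rubric = elicit_rubric(prompts, batch, previous=rubric)
        rewards = [score_with_rubric(r, rubric, judge) for r in batch]
        update_policy(policy, prompts, batch, rewards)
```

The contrast with static rubrics is that the criteria list can grow over training to cover emergent behaviors the initial rubric would have missed.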
Sat down with @lennysan to talk about where AI is headed and how we’re making it work for model builders, enterprises and governments. Also went down memory lane about my time at Uber Eats. 🙂
In his first in-depth interview since taking over as @scale_AI CEO, @jdroege shares:
🔸 What actually happened with Meta’s $14 billion investment
🔸 Where frontier labs are heading next
🔸 Why most enterprise data is useless for AI models
🔸 What it takes to keep making AI model…
Welcome to Chain of Thought, exploring all things AI, research, and evaluations. This episode: how we think about different types of agents and where they’re headed next.
“I think one of the misunderstandings is that AI is this magic wand or it can solve all problems, and that’s not true today. But there is a ton of value when you get it right.” Our CEO @jdroege shared his AI success framework with CNN's @claresduffy. https://t.co/pmBKjdivLt
cnn.com
The artificial intelligence industry has a big problem: 95% of companies that try AI aren’t making any money from it, according to a report from the Massachusetts Institute of Technology last month....
New @Scale_AI paper! The culprit behind reward hacking? We trace it to misspecification in the high-reward tail. Our fix: rubric-based rewards to tell “excellent” responses apart from “great.” The result: less hacking, stronger post-training! https://t.co/D6aJkZ8zZE
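A hedged illustration of the claim in this post, not the paper's method: if a scalar reward model is misspecified in its high-reward tail, gating the top slice of reward on explicit rubric checks is one way to separate “excellent” responses from merely “great” ones. All names and thresholds below are hypothetical.

```python
# Hypothetical sketch: a scalar reward model that saturates near the top cannot
# distinguish "excellent" from "great", which leaves the high-reward tail open
# to hacking. Rubric checks gate the final slice of reward in that tail.

from typing import Callable, List


def combined_reward(response: str,
                    scalar_rm: Callable[[str], float],        # base reward model, returns 0-1
                    tail_checks: List[Callable[[str], bool]], # rubric criteria for top responses
                    tail_threshold: float = 0.9) -> float:
    base = scalar_rm(response)
    if base < tail_threshold:
        # Below the tail, the scalar signal is assumed to be well specified.
        return base
    # In the high-reward tail, require explicit rubric criteria to be met
    # before granting the remaining reward.
    if not tail_checks:
        return base
    passed = sum(check(response) for check in tail_checks)
    frac = passed / len(tail_checks)
    return tail_threshold + (1.0 - tail_threshold) * frac
```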