Scale AI @scale_AI X Profile

Scale AI

@scale_AI

Followers

72K

Following

2K

Media

541

Statuses

2K

making AI work

https://t.co/ECE0dHmBSO

Joined July 2016

Scale AI

@scale_AI

42 minutes

Agentex is open-source and available to try now: https://t.co/kiM8J9Vpmo Learn more in our blog:

scale.com

Open sourcing Agentex

0

Scale AI

@scale_AI

42 minutes

Today, we’re open-sourcing Agentex, the agentic infrastructure layer in the Scale GenAI Platform. Built for developers everywhere, Agentex gives the community transparency and control to help shape what the future of agent infrastructure looks like.

1

8

Scale AI

@scale_AI

1 day

Today, we honor the courage, dedication, and sacrifice of all who have served. Thank you to our veterans. 🇺🇸

3

0

20

Scale AI

@scale_AI

2 days

Scale 🤝 TIME Today, @TIME rolled out a Scale-powered, site-wide AI reading and discovery experience. The AI agent is just the latest activation in our ongoing partnership to help enhance access to journalism worldwide. Learn more via @axios: https://t.co/fXOzpYO2Rl

2

0

15

Scale AI

@scale_AI

8 days

Explore open roles 🔗

scale.com

Scale AI is the data platform for AI, providing high-quality training data for machine learning applications.

0

5

Scale AI

@scale_AI

8 days

Big news: Scale is growing 🌍 We’re expanding our global footprint with new offices in New York City, London, Washington, D.C., and St. Louis. This growth reflects our investment in our people and our mission to build reliable AI systems for the world’s most important

4

7

43

Bing Liu

@vbingliu

14 days

Can AI actually automate jobs? @Scale_AI and @ai_risks are launching the Remote Labor Index (RLI), the first benchmark and public leaderboard that test how well AI agents can complete real, paid freelance work in domains like software engineering, design, architecture, data

22

76

463

Scale AI

@scale_AI

14 days

Learn more about RLI: https://t.co/EWuA7Rmauh

0

1

Scale AI

@scale_AI

14 days

We’re launching the Remote Labor Index (RLI) with @ai_risks, the first benchmark evaluating whether AI agents can independently complete full, paid freelance tasks. The results provide a needed reality check: automation is advancing, but still has a long way to go. RLI offers a

Dan Hendrycks

@hendrycks

14 days

Can AI automate jobs? We created the Remote Labor Index to test AI’s ability to automate hundreds of long, real-world, economically valuable projects from remote work platforms. While AIs are smart, they are not yet that useful: the current automation rate is less than 3%.

4

10

41

Scale AI

@scale_AI

20 days

Tune in: https://t.co/Cwy2PmlSZZ

0

4

Scale AI

@scale_AI

20 days

Our research team dives into MCP Atlas, one of our newest benchmarks – exploring how it evaluates models and what we’ve learned from the results.

2

5

32

Scale AI

@scale_AI

21 days

There’s no magic wand for making AI work. Scale CEO @jdroege joined @richardquest on @cnni to share what it really takes:

1

2

18

Scale AI

@scale_AI

22 days

See the full results: https://t.co/eNWarc0q0X

scale.com

Explore the SEAL leaderboard with expert-driven LLM benchmarks and updated AI model leaderboards, ranking top models across coding, reasoning and more.

0

1

11

Scale AI

@scale_AI

22 days

We launched SWE-Bench Pro last month to incredible feedback, and we’ve now updated the leaderboard with the latest models and no cost caps. SoTA models now break 40% pass rate. Congrats to @Anthropic for sweeping the top spots! 🥇Claude 4.5 Sonnet 🥈Claude 4 Sonnet 🥉Claude 4.5

Bing Liu

@vbingliu

2 months

🚀 Introducing SWE-Bench Pro — a new benchmark to evaluate LLM coding agents on real, enterprise-grade software engineering tasks. This is the next step beyond SWE-Bench: harder, contamination-resistant, and closer to real-world repos.

39

55

565

Scale AI

@scale_AI

28 days

Learn more about our methodology and see how models stack up:

scale.com

Explore the SEAL leaderboard with expert-driven LLM benchmarks and updated AI model leaderboards, ranking top models across coding, reasoning and more.

0

1

3

Scale AI

@scale_AI

28 days

📣 Releasing our newest benchmark, VisualToolBench (VTB), the first benchmark designed to evaluate how well multimodal large language models (MLLMs) can dynamically interact with and reason about visual information. VTB goes beyond thinking about images, it’s about thinking with

2

5

22

Bing Liu

@vbingliu

1 month

🔄RLHF → RLVR → Rubrics → OnlineRubrics
👤 Human feedback = noisy & coarse
🧮 Verifiable rewards = too narrow
📋 Static rubrics = rigid, easy to hack, miss emergent behaviors
💡We introduce OnlineRubrics: elicited rubrics that evolve as models train. https://t.co/YI6pJ7jfJ1

5

43

267

Jason Droege

@jdroege

1 month

Sat down with @lennysan to talk about where AI is headed and how we’re making it work for model builders, enterprises and governments. Also went down memory lane about my time at Uber Eats. 🙂

Lenny Rachitsky

@lennysan

1 month

In his first in-depth interview since taking over as @scale_AI CEO, @jdroege shares:
🔸 What actually happened with Meta’s $14 billion investment
🔸 Where frontier labs are heading next
🔸 Why most enterprise data is useless for AI models
🔸 What it takes to keep making AI model

3

5

36

Scale AI

@scale_AI

1 month

Full conversation: https://t.co/g91Mr1ywY6

0

5

Scale AI

@scale_AI

1 month

Welcome to Chain of Thought, exploring all things AI, research, and evaluations. This episode: how we think about different types of agents and where they’re headed next.

1

2

26