Scale AI

@scale_AI

Followers: 72K
Following: 2K
Media: 541
Statuses: 2K

making AI work

Joined July 2016
@scale_AI
Scale AI
42 minutes
Agentex is open-source and available to try now: https://t.co/kiM8J9Vpmo Learn more in our blog:
scale.com
Open sourcing Agentex
0
0
0
@scale_AI
Scale AI
42 minutes
Today, we’re open-sourcing Agentex, the agentic infrastructure layer in the Scale GenAI Platform. Built for developers everywhere, Agentex gives the community transparency and control to help shape what the future of agent infrastructure looks like.
1
1
8
@scale_AI
Scale AI
1 day
Today, we honor the courage, dedication, and sacrifice of all who have served. Thank you to our veterans. 🇺🇸
3
0
20
@scale_AI
Scale AI
2 days
Scale 🤝 TIME: Today, @TIME rolled out a Scale-powered, site-wide AI reading and discovery experience. The AI agent is just the latest activation in our ongoing partnership to help enhance access to journalism worldwide. Learn more via @axios: https://t.co/fXOzpYO2Rl
2
0
15
@scale_AI
Scale AI
8 days
Big news: Scale is growing 🌍 We’re expanding our global footprint with new offices in New York City, London, Washington, D.C., and St. Louis. This growth reflects our investment in our people and our mission to build reliable AI systems for the world’s most important
4
7
43
@vbingliu
Bing Liu
14 days
Can AI actually automate jobs? @Scale_AI and @ai_risks are launching the Remote Labor Index (RLI), the first benchmark and public leaderboard that test how well AI agents can complete real, paid freelance work in domains like software engineering, design, architecture, data
22
76
463
@scale_AI
Scale AI
14 days
Learn more about RLI: https://t.co/EWuA7Rmauh
0
0
1
@scale_AI
Scale AI
14 days
We’re launching the Remote Labor Index (RLI) with @ai_risks, the first benchmark evaluating whether AI agents can independently complete full, paid freelance tasks. The results provide a needed reality check: automation is advancing, but still has a long way to go. RLI offers a
@hendrycks
Dan Hendrycks
14 days
Can AI automate jobs? We created the Remote Labor Index to test AI’s ability to automate hundreds of long, real-world, economically valuable projects from remote work platforms. While AIs are smart, they are not yet that useful: the current automation rate is less than 3%.
4
10
41
@scale_AI
Scale AI
20 days
0
0
4
@scale_AI
Scale AI
20 days
Our research team dives into MCP Atlas, one of our newest benchmarks – exploring how it evaluates models and what we’ve learned from the results.
2
5
32
@scale_AI
Scale AI
21 days
There’s no magic wand for making AI work. Scale CEO @jdroege joined @richardquest on @cnni to share what it really takes:
1
2
18
@scale_AI
Scale AI
22 days
We launched SWE-Bench Pro last month to incredible feedback, and we’ve now updated the leaderboard with the latest models and no cost caps. SoTA models now break 40% pass rate. Congrats to @Anthropic for sweeping the top spots! 🥇Claude 4.5 Sonnet 🥈Claude 4 Sonnet 🥉Claude 4.5
@vbingliu
Bing Liu
2 months
🚀 Introducing SWE-Bench Pro — a new benchmark to evaluate LLM coding agents on real, enterprise-grade software engineering tasks. This is the next step beyond SWE-Bench: harder, contamination-resistant, and closer to real-world repos.
39
55
565
@scale_AI
Scale AI
28 days
📣 Releasing our newest benchmark, VisualToolBench (VTB), the first benchmark designed to evaluate how well multimodal large language models (MLLMs) can dynamically interact with and reason about visual information. VTB goes beyond thinking about images, it’s about thinking with
2
5
22
@vbingliu
Bing Liu
1 month
🔄RLHF → RLVR → Rubrics → OnlineRubrics 👤 Human feedback = noisy & coarse 🧮 Verifiable rewards = too narrow 📋 Static rubrics = rigid, easy to hack, miss emergent behaviors 💡We introduce OnlineRubrics: elicited rubrics that evolve as models train. https://t.co/YI6pJ7jfJ1
5
43
267
@jdroege
Jason Droege
1 month
Sat down with @lennysan to talk about where AI is headed and how we’re making it work for model builders, enterprises and governments. Also went down memory lane about my time at Uber Eats. 🙂
@lennysan
Lenny Rachitsky
1 month
In his first in-depth interview since taking over as @scale_AI CEO, @jdroege shares: 🔸 What actually happened with Meta’s $14 billion investment 🔸 Where frontier labs are heading next 🔸 Why most enterprise data is useless for AI models 🔸 What it takes to keep making AI model
3
5
36
@scale_AI
Scale AI
1 month
Full conversation: https://t.co/g91Mr1ywY6
0
0
5
@scale_AI
Scale AI
1 month
Welcome to Chain of Thought, exploring all things AI, research, and evaluations. This episode: how we think about different types of agents and where they’re headed next.
1
2
26