Scale AI

@scale_AI

Followers
72K
Following
2K
Media
563
Statuses
2K

making AI work

Joined July 2016
@scale_AI
Scale AI
3 days
We recently introduced MCP-Atlas, a benchmark for evaluating how well LLMs handle tool use via the Model Context Protocol. Even top models failed nearly half of realistic multi-tool tasks. Today, we’re open-sourcing the benchmark so you can measure performance yourself.
1
5
21
@scale_AI
Scale AI
4 days
Speech isn’t just text read out loud. 💬 Real conversations are dynamic, full of interruptions, and context-rich — and benchmarks should match. Introducing Audio MultiChallenge (Audio MC), the first benchmark built to test how well native Speech-to-Speech models handle real
2
2
22
@scale_AI
Scale AI
4 days
Major drop today by @GoogleAI! ⚡️ Gemini 3 Flash scored 🥈 on MCP-Atlas and is tracking strong on Humanity’s Last Exam.
@OfficialLoganK
Logan Kilpatrick
5 days
Introducing Gemini 3 Flash, our frontier intelligence model, available at scale for everyone. It excels at coding, tool calling, and is stronger than 2.5 Pro across most metrics!! ⚡️ Available in the API at $0.50 in / 1M tokens and $3.00 out / 1M tokens across.
2
2
23
@gdb
Greg Brockman
5 days
GPT-5 Pro for very hard problems:
@scale_AI
Scale AI
6 days
GPT-5 Pro by @OpenAI is the Best Reasoning Model of 2025. 🏆 Calculated across SEAL’s reasoning leaderboards, GPT-5 Pro was the best at answering complicated questions, explaining its thinking, and solving multi-step problems.
24
19
425
@alexeheath
Alex Heath
5 days
talked to Scale's head of research about creating the Oscars for AI
sources.news
Scale's head of research: “Evaluation is falling behind the development of model capabilities."
1
5
21
@scale_AI
Scale AI
5 days
GPT-5 Chat by @OpenAI and Claude Sonnet 4.5 by @AnthropicAI are the People’s Favorite Models of 2025. 🏆 Determined by performance on SEAL Showdown, where real users pick the better response in head-to-head comparisons, GPT-5 Chat and Sonnet 4.5 were the big winners.
0
2
18
@scale_AI
Scale AI
5 days
Claude Opus 4.5 by @AnthropicAI is the Best Agentic Model of 2025. 🏆 Across leaderboards that test models on ambiguous tasks — like multi-step projects and debugging — Opus 4.5 was the top performer.
1
5
24
@scale_AI
Scale AI
5 days
Gemini 3 by @GoogleAI is the Best Multimodal Model of 2025. 🏆 When evaluating which models are best at understanding images alongside text, Gemini 3 took the top spot.
0
2
16
@scale_AI
Scale AI
6 days
Claude Sonnet 4.5 by @AnthropicAI is the Best Safety Model of 2025. 🏆 Measuring across all safety evaluations, Sonnet 4.5 excelled at staying consistent, following safety guidelines, and avoiding unsafe outputs, even when under pressure.
1
2
21
@scale_AI
Scale AI
6 days
GPT-5 Pro by @OpenAI is the Best Reasoning Model of 2025. 🏆 Calculated across SEAL’s reasoning leaderboards, GPT-5 Pro was the best at answering complicated questions, explaining its thinking, and solving multi-step problems.
3
10
93
@scale_AI
Scale AI
6 days
Gemini 3 by @GoogleAI is the Best Composite Performance Model of 2025. 🏆 The model was the top performer across all of the SEAL Leaderboards in 2025.
1
3
26
@scale_AI
Scale AI
6 days
Introducing Scale’s Model of the Year Awards. 🏆 These awards, based entirely on SEAL Leaderboard performance, celebrate the best models across six major categories.
2
4
24
@scale_AI
Scale AI
6 days
Hundreds of models stand before us, but we only have six photos in our hands. Tune in tomorrow, December 16th to see who will be crowned Scale’s Next Top AI Models of 2025.
1
10
29