Scale AI @scale_AI tweet

Scale AI

@scale_AI

1 month

📣 Releasing our newest benchmark, VisualToolBench (VTB), the first benchmark designed to evaluate how well multimodal large language models (MLLMs) can dynamically interact with and reason about visual information. VTB goes beyond thinking about images, it’s about thinking with https://t.co/892A3PgdrK

Replies

Scale AI

@scale_AI

1 month

Learn more about our methodology and see how models stack up:

DisaffectedMalcontent

@DMalconten2662

1 month

@scale_AI Hi - could you please include an explanation somewhere on the website regarding why Grok (xAI) hasn't been included in the scoreboards thus far? It's evident from other benchmark sites that it's a frontier model, alongside GPT, Claude & Gemini.