@scale_AI
Scale AI
1 month
📣 Releasing our newest benchmark, VisualToolBench (VTB), the first benchmark designed to evaluate how well multimodal large language models (MLLMs) can dynamically interact with and reason about visual information. VTB goes beyond thinking about images; it’s about thinking with

Replies

@scale_AI
Scale AI
1 month
Learn more about our methodology and see how models stack up:
@DMalconten2662
DisaffectedMalcontent
1 month
@scale_AI Hi - could you please include an explanation somewhere on the website regarding why Grok (xAI) hasn't been included in the scoreboards thus far? It's evident from other benchmark sites that it's a frontier model, alongside GPT, Claude & Gemini.