📣 Releasing our newest benchmark, VisualToolBench (VTB), the first benchmark designed to evaluate how well multimodal large language models (MLLMs) can dynamically interact with and reason about visual information. VTB goes beyond thinking about images; it’s about thinking with
Replies
@scale_AI Hi - could you please include an explanation somewhere on the website regarding why Grok (xAI) hasn't been included in the scoreboards thus far? It's evident from other benchmark sites that it's a frontier model, alongside GPT, Claude & Gemini.