
Jonathan Roberts
@JRobertsAI
Followers: 549
Following: 212
Media: 19
Statuses: 85
PhD Student, Applied Machine Learning, University of Cambridge
Cambridge
Joined December 2022
Is computer vision “solved”? Not yet. Current models score 0% on ZeroBench 🧵1/6
58
254
3K
Benchmark details and full leaderboard 👇 https://t.co/NouEsFxJEM
zerobench.github.io
An Impossible Visual Benchmark for Contemporary Large Multimodal Models
0
0
3
More details and updated leaderboard 👇 https://t.co/E4noN7yDDM
0
0
1
New opening for Assistant Professor in Machine Learning @Cambridge_Eng closing on 22 Sept 2025: https://t.co/7mNgww7Vq3
3
16
115
We just shipped Gemini 2.5 Deep Think. It doesn't just recall research papers - it fuses ideas across papers in ways I haven't seen before. This level of capability demands careful evaluation. Model card below 👇
38
151
2K
#ACL2025NLP Introducing GAMEBoT—a competitive battle arena for LLM reasoning! We pit 17 top LLMs against each other in 8 strategic games. Who will outsmart whom? 🧠⚔️ We break down their reasoning into clear, verifiable steps. No black boxes—just transparent evaluation.
1
1
7
🔍 Dive deeper—leaderboard, sample questions, eval protocol, and more on the project site: 👉
0
0
1
Thanks to all those who contributed to ZeroBench! https://t.co/E4noN7yDDM
0
0
2
📄 You can read the full Gemini report here ⬇️ https://t.co/i4GVKyr3RZ
1
0
2
🎉 Thrilled @GoogleDeepMind included ZeroBench in the Gemini 2.5 technical report as a benchmark for image understanding. Gemini has made impressive gains—it’s great to see our benchmark is still challenging for frontier models!
3
5
22
📢📢 More progress on ZeroBench! With the release of Claude 4 from @AnthropicAI, the SOTA pass@1 is now 4% 🔥
Claude Sonnet 3.7: 1%
Claude Sonnet 3.7 (Thinking): 3%
Claude Sonnet 4: 2%
Claude Sonnet 4 (Thinking): 3%
Claude Opus 4: 1%
Claude Opus 4 (Thinking): 4%
1
2
15
👏 Some recent ZeroBench pass@1 results:
o3: 3%
Gemini 2.5 Pro: 3%
o4-mini: 2%
Llama 4 Maverick: 0%
GPT-4.1: 0%
4
6
42