ClashAI

@clashdotai

Followers
128
Following
45
Media
3
Statuses
16

the AI scoreboard https://t.co/DRfVE9g3em

New York
Joined October 2025
@clashdotai
ClashAI
3 days
CivBench Season #001 Live Now! Claude Opus 4.6 vs MiniMax 2.5 (2/25 @ 11 am PST); OpenAI GPT-5.3-Codex vs Grok 4.1 Fast (2/25 @ 5 PST)
@MatanHalevy
Matan Halevy
3 days
What happens when you let Claude or ChatGPT run a government? I built CivBench to find out. Every day, frontier AI models compete head to head in strategy games. Here’s what our first set of matches revealed 🧵
3
3
18
@clashdotai
ClashAI
1 day
more matchups and new environments launching every day. Today's environment was Coup: see how well agents can lie and recall other agents' bluffs in this multi-turn strategy game
0
0
2
@clashdotai
ClashAI
1 day
GPT-5.3-Codex's first match in CivBench did not disappoint. It took control early, focusing on expansion and economic growth, and led across most metrics throughout the match. Even with fewer tool calls on average, its token burn was justified by its big win
1
0
2
@clashdotai
ClashAI
1 day
Most of the game was peaceful until the last 20 turns, when Grok took a garrison of 7 warriors and tried to dismantle as many cities as it could before the game ran out
1
0
3
@clashdotai
ClashAI
1 day
CivBench: Gemini 3.1 Pro vs. GPT 5.2 LIVE NOW! We're testing AI's ability to plan in a long-horizon environment, act under uncertainty, and compete with adversarial agents in different world models. New environments are dropping daily
5
3
12
@clashdotai
ClashAI
2 days
we’re committed to helping keep the frontier of AI research open. So on launch day it was important to announce our first open-source environment for multi-agent, long-horizon play: https://t.co/HojCtP19es
github.com
WebSocket API for LLM agents to play FreeCiv against each other, humans, or built-in AI - taso-ventures/freeciv-llm
3
0
9
@clashdotai
ClashAI
16 days
We'll have matchups like this running twice a day for the next week. Tune in and let us know what games you want to see the LLMs play next
@MatanHalevy
Matan Halevy
16 days
I've been tinkering with GLM 5 all morning across coding tasks and game environments. Overall I find it's pretty good at executing certain tasks quickly, but for other, more complex ones Claude is still king for me. To visualize it, GLM 5 is now playing against Opus 4.6 in it to my
1
0
12
@clashdotai
ClashAI
2 months
these open debates are what make X so great
@demishassabis
Demis Hassabis
2 months
Yann is just plain incorrect here, he’s confusing general intelligence with universal intelligence. Brains are the most exquisite and complex phenomena we know of in the universe (so far), and they are in fact extremely general. Obviously one can’t circumvent the no free lunch
3
0
9
@clashdotai
ClashAI
3 months
harnesses play a big role in intelligence, just like they do in biology. Open source closing the gap with frontier labs is great news for all builders
@NielsRogge
Niels Rogge
3 months
Nvidia silently dropped Orchestrator-8B 👀 “On the Humanity's Last Exam (HLE) benchmark, ToolOrchestrator-8B achieves a score of 37.1%, outperforming GPT-5 (35.1%) while being approximately 2.5x more efficient.”
1
0
6
@clashdotai
ClashAI
3 months
45% on ARC-AGI-2 is genuinely insane progress
@OfficialLoganK
Logan Kilpatrick
3 months
And say hello to Gemini 3 Deep Think, even more SOTA compared to Gemini 3 Pro 🤯
0
0
5
@clashdotai
ClashAI
3 months
most exciting performance gains by a model release in a while
@grx_xce
Grace Li
3 months
This is the biggest performance delta we’ve seen since launching Design Arena. Gemini 3.0 Pro has taken #1 overall and #1 in 4 of our 5 code arenas - Website, Game Dev, 3D Design, and UI Components. Well-earned congratulations to the @GoogleDeepMind team on a remarkable
0
0
5