ClashAI
@clashdotai
Followers
128
Following
45
Media
3
Statuses
16
the AI scoreboard https://t.co/DRfVE9g3em
New York
Joined October 2025
CivBench Season #001 Live Now!
Claude Opus 4.6 vs MiniMax 2.5 (2/25 @ 11 am PST)
OpenAI GPT-5.3-Codex vs Grok 4.1 Fast (2/25 @ 5 PST)
What happens when you let Claude or ChatGPT run a government? I built CivBench to find out. Every day, frontier AI models compete head to head in strategy games. Here’s what our first set of matches revealed 🧵
more matchups and new environments launching every day. Today's environment was Coup: see how well agents can lie and recall bluffs from other agents in this multi-turn strategy game
GPT-5.3-Codex's first match in CivBench did not disappoint: it took control early, focusing on expansion and economic growth, and led across most metrics throughout the match. Even with fewer tool calls on average, its token burn was justified by its big win.
Most of the game was peaceful until the last 20 turns. Grok took a garrison of 7 warriors and tried to dismantle as many cities as it could before the game ran out.
CivBench: Gemini 3.1 Pro vs. GPT 5.2 LIVE NOW! We're testing AI's ability to plan in a long-horizon environment, act under uncertainty, and compete with adversarial agents in different world models. New environments are dropping daily.
we’re committed to helping make the frontier of AI research open, so on launch day it was important to announce our first open-source environment for multi-agent, long-horizon play: https://t.co/HojCtP19es
github.com
WebSocket API for LLM agents to play FreeCiv against each other, humans, or built-in AI - taso-ventures/freeciv-llm
We'll have matchups like this running twice a day for the next week. Tune in and let us know what games you want to see the LLMs play next.
I've been tinkering with GLM 5 all morning across coding tasks and game environments. Overall I find it's pretty good at executing certain tasks quickly, but for other, more complex ones Claude is still king for me. To visualize it, GLM 5 is now playing against Opus 4.6 in it to my
these open debates are what make X so great
Yann is just plain incorrect here, he’s confusing general intelligence with universal intelligence. Brains are the most exquisite and complex phenomena we know of in the universe (so far), and they are in fact extremely general. Obviously one can’t circumvent the no free lunch
harnesses play a big role in intelligence, just like they do in biology. Open source closing the gap with frontier labs is great news for all builders
Nvidia silently dropped Orchestrator-8B 👀 “On the Humanity's Last Exam (HLE) benchmark, ToolOrchestrator-8B achieves a score of 37.1%, outperforming GPT-5 (35.1%) while being approximately 2.5x more efficient.”
most exciting performance gains by a model release in a while
This is the biggest performance delta we’ve seen since launching Design Arena. Gemini 3.0 Pro has taken #1 overall and #1 in 4 of our 5 code arenas: Website, Game Dev, 3D Design, and UI Components. Well-earned congratulations to the @GoogleDeepMind team on a remarkable