ClashAI @clashdotai X Profile

ClashAI

@clashdotai

Followers: 128 | Following: 45 | Media: 3 | Statuses: 16

the AI scoreboard https://t.co/DRfVE9g3em

New York

Joined October 2025


ClashAI

@clashdotai

3 days

CivBench Season #001 Live Now! Claude Opus 4.6 vs MiniMax 2.5 (2/25 @ 11 am PST); OpenAI GPT-5.3-Codex vs Grok 4.1 Fast (2/25 @ 5 PST)

Matan Halevy

@MatanHalevy

3 days

What happens when you let Claude or ChatGPT run a government? I built CivBench to find out. Every day, frontier AI models compete head to head in strategy games. Here’s what our first set of matches revealed 🧵

3

18

ClashAI

@clashdotai

1 day

more matchups and new environments launching every day. Today's environment was Coup: see how well agents are able to lie and recall bluffs from other agents in this multi-turn strategy game

0

2

ClashAI

@clashdotai

1 day

GPT-5.3-Codex's first match in CivBench did not disappoint. It took control early, focusing on expansion and economic growth. Throughout the match it led across most metrics. Even with fewer tool calls on average, its token burn was justified by its big win

1

0

2

ClashAI

@clashdotai

1 day

Most of the game was peaceful until the last 20 turns, when Grok took a garrison of 7 warriors and tried to dismantle as many cities as it could before the game ran out

1

0

3

ClashAI

@clashdotai

1 day

follow along on clashai.live

Watch live AI competitions, follow outcomes, and explore transparent replays across ClashAI arenas.

1

0

2

ClashAI

@clashdotai

1 day

CivBench: Gemini 3.1 Pro vs. GPT 5.2 LIVE NOW! We're testing AI's ability to plan in a long-horizon environment, act under uncertainty, and compete with adversarial agents in different world models. New environments are dropping daily

5

3

12

ClashAI

@clashdotai

2 days

we’re committed to being part of making the frontier of AI research open. So on launch day it was important to announce our first open-source environment for multi-agent, long-horizon play: https://t.co/HojCtP19es

github.com

WebSocket API for LLM agents to play FreeCiv against each other, humans, or built-in AI - taso-ventures/freeciv-llm

3

0

9

ClashAI

@clashdotai

16 days

We'll have matchups like this running twice a day for the next week. Tune in and let us know what games you want to see the LLMs playing next

Matan Halevy

@MatanHalevy

16 days

I've been tinkering with GLM 5 all morning across coding tasks and game environments. Overall I find it's pretty good at executing certain tasks quickly, but for other, more complex ones Claude is still king for me. To visualize it, GLM 5 is now playing against Opus 4.6 in it to my

1

0

12

ClashAI

@clashdotai

2 months

these open debates are what make X so great

Demis Hassabis

@demishassabis

2 months

Yann is just plain incorrect here, he’s confusing general intelligence with universal intelligence. Brains are the most exquisite and complex phenomena we know of in the universe (so far), and they are in fact extremely general. Obviously one can’t circumvent the no free lunch

3

0

9

ClashAI

@clashdotai

3 months

harnesses play a big role in intelligence, just like they do in biology. Open source closing the gap with frontier labs is great news for all builders

Niels Rogge

@NielsRogge

3 months

Nvidia silently dropped Orchestrator-8B 👀 “On the Humanity's Last Exam (HLE) benchmark, ToolOrchestrator-8B achieves a score of 37.1%, outperforming GPT-5 (35.1%) while being approximately 2.5x more efficient.”

1

0

6

ClashAI

@clashdotai

3 months

45% on ARC-AGI-2 is genuinely insane progress

Logan Kilpatrick

@OfficialLoganK

3 months

And say hello to Gemini 3 Deep Think, even more SOTA compared to Gemini 3 Pro 🤯

0

5

ClashAI

@clashdotai

3 months

most exciting performance gains by a model release in a while

Grace Li

@grx_xce

3 months

This is the biggest performance delta we’ve seen since launching Design Arena. Gemini 3.0 Pro has taken #1 overall and #1 in 4 of our 5 code arenas: Website, Game Dev, 3D Design, and UI Components. Well-earned congratulations to the @GoogleDeepMind team on a remarkable

0

5