
Peter Gostev
@petergostev
Followers
1K
Following
186
Media
45
Statuses
133
London 🇬🇧 Head of AI https://t.co/bkfw1nxdmJ
Joined June 2025
Impact of Kimi K2 and Qwen 3 Coder on the LLM market, based on the @OpenRouterAI data in the 'programming' category. What we see is quite interesting:
- Sonnet 4 models keep growing as if nothing happened.
- Gemini 2.5 Pro is losing share very quickly, from 15% to 9% in a
This is interesting: in my tests, Summit and Lobster (never got Zenith) were way better than Qwen3-Coder every single time. Expect that whatever @OpenAI model version makes it to the leaderboard will be miles above everything else. Nectarine and Starfish are around Kimi K2 level.
🚨Breaking News: @Alibaba_Qwen's Qwen3-Coder is now tied for #1 in WebDev Arena. @Kimi_Moonshot's Kimi-K2 also enters the top, ranks at #7. WebDev Arena evaluates real-world web app-building tasks, backed by 150K+ community votes. Top models so far:
- Gemini 2.5 Pro
-
As we get ready for GPT-5, it's useful to look back at how often labs featured in the Top 5 of @lmarena_ai over the last 1.5 years. The competition is primarily between OpenAI and Google. Average appearances overall and specifically in 2025:
- OpenAI: Overall: 2.2; in 2025: 1.7
The killer feature of @OpenAI's latest releases (Agent Mode, o3-pro, Codex) is async. Being able to set a task and walk away without worrying about keeping the tab open is exactly how it should be.
It is worth your while spending a bit of time on the WebDev Arena if you want to get a glimpse of GPT-5.
webdev.lmarena.ai
WebDev Arena: AI Battle to build the best website
Interesting that 9 months is exactly how long babies develop before being born.
Should you start your training run early, so you can train for longer, or wait for the next generation of chips and algorithms? Our latest estimate suggests that it’s not effective to train for more than ~9 months. On current trends, frontier labs will hit that limit by 2027. 🧵
@lmarena_ai @FeatureCrewPod @btibor91 @testingcatalog @chetaslua @kimmonismus @apples_jimmy Prompt link:
@lmarena_ai Prompt from @FeatureCrewPod. Try it on: CC: @btibor91 @testingcatalog @chetaslua @kimmonismus @apples_jimmy.
New models on the @lmarena_ai WebDev arena:
- Lobster
- Nectarine
- Starfish (not in this video)
In the video they are compared to the 'Anonymous Chatbot' (aka o3-Alpha) from 17th July. Observations:
- Lobster is closest to the o3-Alpha, but nowhere near as good.
- Nectarine was
This chart shows the best image generation model at any given time, based on the @ArtificialAnlys arena and the model release dates. A few points stand out:
- Massive gains from DALL-E 2 up to Midjourney 6.
- Arguably a slowdown in progress for diffusion models since then -
Unlike with LLMs, the image generation market is a bit less competitive: there are many players, but they are not constantly breaking new ground, and some vendors haven't released a competitive model in many months. In this data from the @ArtificialAnlys image arena, we can see
RT @ArtificialAnlys: 🎵 Announcing Artificial Analysis Music Arena! Vote for songs generated by leading music models across genres from pop…