Peter Gostev Profile
Peter Gostev

@petergostev

Followers
1K
Following
186
Media
45
Statuses
133

London 🇬🇧 Head of AI https://t.co/bkfw1nxdmJ

Joined June 2025
@petergostev
Peter Gostev
6 minutes
Impact of Kimi K2 and Qwen 3 Coder on the LLM market, based on the @OpenRouterAI data in the 'programming' category. What we see is quite interesting:
- Sonnet 4 models keep growing as if nothing happened.
- Gemini 2.5 Pro is losing share very quickly, from 15% to 9% in a
0
0
1
@petergostev
Peter Gostev
7 hours
This is interesting: in my tests, Summit and Lobster (never got Zenith) were way better than Qwen3-Coder every single time. Expect that whatever @OpenAI model version makes it to the leaderboard will be miles above everything else. Nectarine and Starfish are around Kimi K2 level.
@lmarena_ai
lmarena.ai
7 hours
🚨Breaking News: @Alibaba_Qwen's Qwen3-Coder is now tied for #1 in WebDev Arena. @Kimi_Moonshot's Kimi-K2 also enters the top, ranks at #7. WebDev Arena evaluates real-world web app-building tasks, backed by 150K+ community votes. Top models so far:
- Gemini 2.5 Pro
-
1
1
7
@petergostev
Peter Gostev
1 day
Why hasn't Microsoft trained an actually working PowerPoint and Excel agent? They have full software access, data, environments, compute - and importantly, unlike the SF tech companies, they realise how important PowerPoint and Excel actually are.
5
1
11
@petergostev
Peter Gostev
1 day
As we get ready for GPT-5, it's useful to look back at how often labs featured in the Top 5 of @lmarena_ai over the last 1.5 years. The competition is primarily between OpenAI and Google. Average appearances overall and specifically in 2025:
- OpenAI: Overall: 2.2; in 2025: 1.7
2
16
150
@petergostev
Peter Gostev
2 days
The killer feature of @OpenAI's latest releases (Agent Mode, o3-pro, Codex) is async. Being able to set a task and walk away without worrying about keeping the tab open is exactly how it should be.
1
2
20
@petergostev
Peter Gostev
2 days
Could GPT-5 be claimed to be 'AGI'? The funny thing about the OpenAI 'AGI clause' with Microsoft is that OpenAI needs to show it has developed 'systems' that have the 'capability' to generate $100bn in profits, not to actually generate the $100bn. I am not saying they will do.
1
0
8
@petergostev
Peter Gostev
2 days
"Generate an SVG of Mona Lisa"
4
1
11
@petergostev
Peter Gostev
2 days
Something I'm slightly worried about is that GPT-5 would have a router and these would be the models that it would decide to route to. For example Zenith (based on others' examples) seems to be amazing at SVGs, while Summit/Lobster not so much.
@petergostev
Peter Gostev
3 days
o3-alpha
Summit
Lobster
Nectarine
Starfish
3
0
12
@petergostev
Peter Gostev
2 days
Feels like Qwen Coder was heavily trained on Claude Sonnet.
1
1
4
@petergostev
Peter Gostev
3 days
It is worth your while spending a bit of time on the Web Dev Arena if you want to get a glimpse of GPT-5.
webdev.lmarena.ai
WebDev Arena: AI Battle to build the best website
0
0
4
@petergostev
Peter Gostev
3 days
Anthropic is on track to make $4bn revenue this year, projecting to reach $12bn (or even up to $35bn) in revenue by 2027. This largely depends on its continued success in coding, with, as far as I can tell, every single coding assistant defaulting to Claude 4 Sonnet. This could
3
0
12
@petergostev
Peter Gostev
3 days
o3-alpha
Summit
Lobster
Nectarine
Starfish
8
3
121
@petergostev
Peter Gostev
3 days
Interesting that 9 months is exactly how long babies develop before being born.
@EpochAIResearch
Epoch AI
3 days
Should you start your training run early, so you can train for longer, or wait for the next generation of chips and algorithms? Our latest estimate suggests that it’s not effective to train for more than ~9 months. On current trends, frontier labs will hit that limit by 2027. 🧵
0
0
5
@petergostev
Peter Gostev
4 days
New models on the @lmarena_ai WebDev arena:
- Lobster
- Nectarine
- Starfish (not in this video)
In the video compared to the 'Anonymous Chatbot' (aka o3-Alpha) from 17th July. Observations:
- Lobster is closest to the o3-Alpha, but nowhere near as good.
- Nectarine was
6
16
149
@petergostev
Peter Gostev
4 days
This chart shows the best image generation model at any given time, based on the @ArtificialAnlys arena and the model release dates. A few points stand out:
- Massive gains from Dall-e 2 up to Midjourney 6.
- Arguably a slowdown in progress for diffusion models since then -
1
10
83
@petergostev
Peter Gostev
5 days
Unlike with LLMs, the image generation market is a bit less competitive - there are many players, but they are not constantly breaking new ground, and some vendors haven't released a competitive model in many months. In this data from @ArtificialAnlys image arena, we can see
1
0
5
@petergostev
Peter Gostev
5 days
I'm old enough to remember when, a year ago, 'more than $100m' was considered an 'extreme cost' to train a model.
2
0
4
@petergostev
Peter Gostev
5 days
RT @ArtificialAnlys: 🎵 Announcing Artificial Analysis Music Arena! Vote for songs generated by leading music models across genres from pop…
0
21
0