Charly Wargnier

@DataChaz

Followers: 144K
Following: 91K
Media: 6K
Statuses: 26K

Ex @Streamlit @Snowflake Maestro 🪄 • X about AI agents, LLMs, web apps, Python & SEO • My ❤️ is open source • DM for collabs 📩

London 🇬🇧 ⇆ 🇫🇷 Pyrenees
Joined January 2009
@DataChaz
Charly Wargnier
34 minutes
Gemini 3 just launched, and @Browserbase's already run full computer-use evaluations to see how well it handles a real browser. Clicking, searching, filling forms: they tested it with real browsing tasks 🤘 Here’s how Gemini 3 stacks up against Claude, GPT-5, and others 🧵↓
2
18
27
@DataChaz
Charly Wargnier
34 minutes
If you found this useful, a like or RT goes a long way! 🦾 Follow me → @datachaz for insights on LLMs, AI agents, and data science!
@DataChaz
Charly Wargnier
34 minutes
5/ That's a wrap! You can check the full list of results here: →
Tweet card summary image
stagehand.dev
Compare accuracy, costs, and speed for Computer Use Models on Web Voyager and Online Mind2Web benchmarks.
1
0
0
@DataChaz
Charly Wargnier
34 minutes
4/ So... Gemini takes 1st place across all 3 fronts: → accuracy, cost per task, and speed. Claude Sonnet 4 comes 2nd with solid results, and Claude 4.5 follows close behind. A clean sweep for Gemini!! 🏆
1
0
0
@DataChaz
Charly Wargnier
34 minutes
3/ SPEED (lower is better) Gemini doesn’t just win on accuracy and cost. It’s also the fastest model to complete real browser tasks. Browserbase’s benchmarks show an average of ~223s per task, well ahead of Claude 4, Claude 4.5, and GPT-5 ↓
1
0
0
@DataChaz
Charly Wargnier
34 minutes
2/ COST Gemini is also the most cost-efficient model in Browserbase’s Stagehand evals. Around $0.18 per task, far below Claude 4, Claude 4.5, and the OpenAI model 💰💰💰
1
0
0
@DataChaz
Charly Wargnier
34 minutes
1/ ACCURACY In Browserbase’s @Stagehanddev tests, Gemini tops the accuracy charts at ~66%. It outperforms Claude 4, Claude 4.5, and the OpenAI model evaluated.
1
0
1
@DataChaz
Charly Wargnier
34 minutes
But first, I just wanted to say these benchmarks are on another level: → ~4,000 browser hours (!!) → 200+ runs → All parallelized in Browserbase! I tend to be skeptical of leaderboards, but this one is grounded in data. More on their methodology: → https://t.co/7DG2F8kN5H
1
0
1
@DataChaz
Charly Wargnier
5 hours
If you found this useful, a like or RT goes a long way! Follow me → @datachaz for daily insights on LLMs, AI agents, and data science
@DataChaz
Charly Wargnier
5 hours
A must-bookmark for vibe-coders. @YCombinator’s guide to making the most of vibe coding:
0
0
3
@DataChaz
Charly Wargnier
5 hours
Based on @benln’s excellent video here: ↳ https://t.co/GPhDKNs3fb
1
0
4
@DataChaz
Charly Wargnier
8 hours
♻️ If this sparked an idea, hit repost so others can catch it too! Follow me → @datachaz for daily drops on LLMs, agents, and data workflows! 🦾
@DataChaz
Charly Wargnier
8 hours
This one’s a gem. This free 80-page prompt engineering guide is surprisingly deep, covering: → CoT → Eval methods → RAG → Agents → Prompt hacking → Multimodal prompts ... and more! Link to the guide in 🧵 ↓
0
0
2
@DataChaz
Charly Wargnier
8 hours
Get the PDF here: ↳ https://t.co/7ePtzqZTjQ
1
1
6
@DataChaz
Charly Wargnier
23 hours
Please @OpenAI let's not go back to this
@DataChaz
Charly Wargnier
1 day
ChatGPT identifying itself as GPT-5.2 Thinking model
13
2
21
@DataChaz
Charly Wargnier
23 hours
I'm cool with that, as long as we don’t end up in another round of OpenAI-style naming madness 😅
0
0
73
@DataChaz
Charly Wargnier
1 day
My Italian friends say that we gotta start adding "al dente" in our prompts dang 🤌
@DataChaz
Charly Wargnier
1 day
Insane to think these 2 clips are only 2.5 years apart. https://t.co/AofkbmbGIW
2
0
7