Charly Wargnier
@DataChaz
Followers: 144K · Following: 91K · Media: 6K · Statuses: 26K
Ex @Streamlit @Snowflake Maestro 🪄 • X about AI agents, LLMs, web apps, Python & SEO • My ❤️ is open source • DM for collabs 📩
London 🇬🇧 ⇆ 🇫🇷 Pyrenees
Joined January 2009
Gemini 3 just launched, and @Browserbase's already run full computer-use evaluations to see how well it handles a real browser. Clicking, searching, filling forms: they tested it with real browsing tasks 🤘 Here’s how Gemini 3 stacks up against Claude, GPT-5, and others 🧵↓
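For context on what "real browsing tasks" means here: Stagehand drives an actual browser with natural-language steps. Below is a minimal sketch of such a task, assuming the Stagehand TypeScript API as documented at stagehand.dev; the model id, URL, and instructions are illustrative and not taken from Browserbase's eval suite.

```typescript
// Minimal sketch of a Stagehand-style browser task (model id and URL are illustrative).
import { Stagehand } from "@browserbasehq/stagehand";

async function runTask() {
  // env: "BROWSERBASE" runs the browser in Browserbase's cloud; "LOCAL" uses a local browser.
  const stagehand = new Stagehand({
    env: "BROWSERBASE",
    modelName: "google/gemini-2.0-flash", // placeholder model id, not from the thread
  });
  await stagehand.init();

  const page = stagehand.page;
  await page.goto("https://www.google.com");

  // Natural-language actions: the model decides which element to click or type into.
  await page.act("search for 'best hiking trails in the Pyrenees'");
  await page.act("click the first search result");

  // Pull structured information back out of the resulting page.
  const summary = await page.extract("summarize the page title in one sentence");
  console.log(summary);

  await stagehand.close();
}

runTask().catch(console.error);
```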
If you found this useful, a like or RT goes a long way! 🦾 Follow me → @datachaz for insights on LLMs, AI agents, and data science!
5/ That's a wrap! You can check the full list of results here: →
stagehand.dev
Compare accuracy, costs, and speed for Computer Use Models on Web Voyager and Online Mind2Web benchmarks.
4/ So... Gemini takes 1st place across all 3 fronts: → accuracy, cost per task, and speed. Claude Sonnet 4 comes 2nd with solid results, and Claude 4.5 follows close behind. A clean sweep for Gemini!! 🏆
3/ SPEED (lower is better) Gemini doesn’t just win on accuracy and cost. It’s also the fastest model to complete real browser tasks. Browserbase’s benchmarks show an average of ~223s per task, well ahead of Claude 4, Claude 4.5, and GPT-5 ↓
2/ COST Gemini is also the most cost-efficient model in Browserbase’s Stagehand evals. Around $0.18 per task, far below Claude 4, Claude 4.5, and the OpenAI model 💰💰💰
1/ ACCURACY In Browserbase’s @Stagehanddev tests, Gemini tops the accuracy charts at ~66%. It outperforms Claude 4, Claude 4.5, and the OpenAI model evaluated.
But first, I just wanted to say these benchmarks are on another level: → ~4,000 browser hours (!!) → 200+ runs → All parallelized in Browserbase! I tend to be skeptical of leaderboards, but this one is grounded in data. More on their methodology: → https://t.co/7DG2F8kN5H
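To give a feel for what parallelizing 200+ runs looks like, here's a hypothetical sketch of fanning tasks out across concurrent Browserbase sessions with Stagehand. The runSingleTask/runBenchmark helpers, start URL, and concurrency value are made up for illustration; this is not Browserbase's actual eval harness or methodology.

```typescript
// Hypothetical sketch: run many browser tasks concurrently, each in its own cloud session.
import { Stagehand } from "@browserbasehq/stagehand";

type TaskResult = { task: string; success: boolean; seconds: number };

async function runSingleTask(task: string): Promise<TaskResult> {
  const stagehand = new Stagehand({ env: "BROWSERBASE" }); // one Browserbase session per task
  await stagehand.init();
  const start = Date.now();
  try {
    await stagehand.page.goto("https://example.com"); // illustrative start page
    await stagehand.page.act(task);
    return { task, success: true, seconds: (Date.now() - start) / 1000 };
  } catch {
    return { task, success: false, seconds: (Date.now() - start) / 1000 };
  } finally {
    await stagehand.close();
  }
}

async function runBenchmark(tasks: string[], concurrency = 10) {
  const results: TaskResult[] = [];
  // Simple batching: launch `concurrency` sessions at a time and wait for each batch.
  for (let i = 0; i < tasks.length; i += concurrency) {
    const batch = tasks.slice(i, i + concurrency).map(runSingleTask);
    results.push(...(await Promise.all(batch)));
  }
  const accuracy = results.filter((r) => r.success).length / results.length;
  const avgSeconds = results.reduce((sum, r) => sum + r.seconds, 0) / results.length;
  console.log({ accuracy, avgSeconds });
}

// Example usage: runBenchmark(["search for X and open the first result", ...], 25);
```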
If you found this useful, a like or RT goes a long way! Follow me → @datachaz for daily insights on LLMs, AI agents, and data science
♻️ If this sparked an idea, hit repost so others can catch it too! Follow me → @datachaz for daily drops on LLMs, agents, and data workflows! 🦾
This one’s a gem: a free 80-page prompt engineering guide that's surprisingly deep, covering: → CoT → Eval methods → RAG → Agents → Prompt hacking → Multimodal prompts ... and more! Link to the guide in 🧵 ↓
I'm cool with that, as long as we don’t end up in another round of OpenAI-style naming madness 😅
My Italian friends say that we gotta start adding "al dente" in our prompts dang 🤌