Alex Shaw @alexgshaw X Profile

Alex Shaw

@alexgshaw

Followers: 175

Following: 2K

Media: 12

Statuses: 275

Researching & investing @ Laude. Co-creator of Terminal Bench. Formerly Google. BYU alum.

Joined October 2021


Alex Shaw

@alexgshaw

12 days

Excited to team up with @andykonwinski on Laude Institute, his next endeavor to normalize bringing research breakthroughs into real users' hands. His vision and research-to-product strategy have fundamentally shaped how we built terminal-bench and (hopefully!) will continue to.

Andy Konwinski

@andykonwinski

12 days

Today, I’m launching a deeply personal project. I’m betting $100M that we can help computer scientists create more upside impact for humanity. Built for and by researchers, including @JeffDean & @jpineau1 on the board, @LaudeInstitute catalyzes research with real-world impact.

0

1

17

Alex Shaw

@alexgshaw

17 hours

RT @dbreunig: Now's a good moment to plug @alexgshaw and @Mike_A_Merrill's terminal-bench:

0

1

0

Alex Shaw

@alexgshaw

11 days

Btw, the leaderboard got a fresh coat of paint, check it out!

0

3

Alex Shaw

@alexgshaw

11 days

Congrats to the Warp team for setting a new SOTA on Terminal-Bench! I’ve been using Warp since 2022, so it’s exciting to see them use the benchmark!

Warp

@warpdotdev

11 days

Introducing Warp 2.0: the Agentic Development Environment. 1️⃣ Top overall coding agent: #1 on Terminal-Bench, 71% on SWE-bench Verified. 2️⃣ Agent multi-threading: build features, debug, and ship all at once. 3️⃣ The first all-in-one platform for agentic development. 🧵 Learn more

3

1

17

Alex Shaw

@alexgshaw

12 days

RT @lschmidt3: I'm a big fan of the approach to research funding @andykonwinski and the Laude team are taking! Working with them on termina…

0

5

0

Alex Shaw

@alexgshaw

19 days

RT @Mike_A_Merrill: this is why we made terminal bench - just give the ai a bash shell, it'll be fine.

0

4

0

Alex Shaw

@alexgshaw

26 days

(about terminal bench!)

0

Alex Shaw

@alexgshaw

26 days

I’ll be speaking (briefly) at DAIS with @andykonwinski and @Mike_A_Merrill! Please tune in :)

Andy Konwinski

@andykonwinski

26 days

I <3 meetups, and tonight’s at #DataAISummit is next level - 2k ppl, multi-track, with keynotes. #meetupXXL. I’ll be talking (right after @matei_zaharia) about K Prize, Terminal-Bench, and the noble quest for hard, relevant benchmarks. See you in room 208 at 6pm.

1

0

3

Alex Shaw

@alexgshaw

1 month

RT @ryanmart3n: Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on…

0

190

0

Alex Shaw

@alexgshaw

1 month

As always, we're looking for more contributors, so please join our Discord or let us know if there is an eval you would like us to integrate!

0

2

Alex Shaw

@alexgshaw

1 month

The terminal-bench CLI makes it possible for agent developers to integrate their agent and quickly run across a suite of integrated evals, confidently compare against the results of others, and reproduce their and others' results.

1

0

3

Alex Shaw

@alexgshaw

1 month

We also realized that many existing benchmarks fit into the terminal-bench framework due to its flexibility (almost anything with an instruction, docker env, and test script is compatible).

1

0

2
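The three ingredients named above (an instruction, a Docker environment, and a test script) can be pictured as a small record. The following is only an illustrative sketch: `BenchmarkTask`, `is_compatible`, and the field names are hypothetical and are not taken from the terminal-bench codebase or its actual task schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkTask:
    """Hypothetical container; NOT the real terminal-bench task schema."""

    instruction: str   # natural-language task handed to the agent
    docker_image: str  # environment the agent works inside
    test_script: str   # script whose result decides pass/fail


REQUIRED_FIELDS = ("instruction", "docker_image", "test_script")


def is_compatible(candidate: dict) -> bool:
    """Return True if an item supplies all three ingredients as non-empty strings."""
    return all(
        isinstance(candidate.get(field), str) and bool(candidate[field].strip())
        for field in REQUIRED_FIELDS
    )


# Made-up example item from some other benchmark.
item = {
    "instruction": "Create a file named hello.txt containing 'hi'.",
    "docker_image": "ubuntu:24.04",
    "test_script": "test -f hello.txt && grep -q hi hello.txt",
}

if is_compatible(item):
    task = BenchmarkTask(**item)
```

Under these assumptions, adapting an existing benchmark amounts to mapping each of its items onto these three fields.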

Alex Shaw

@alexgshaw

1 month

We just released the terminal-bench CLI. Right after we shipped our initial batch of 80 tasks in terminal-bench-core-v0, our team began building v1. We needed a tool to distribute the different versions of terminal-bench while enabling comparison and reproducibility. 🧵⬇️

3

0

16

Alex Shaw

@alexgshaw

1 month

RT @LaudeVentures: Congrats to our co-founder and GP @psonsini on making the @Forbes 2025 Midas List! A well-earned recognition for a legen…

0

1

0

Alex Shaw

@alexgshaw

1 month

Cloud providers love the emerging vibe-coding market

0

3

Alex Shaw

@alexgshaw

1 month

This is one of the main reasons we built Terminal-Bench (and why Anthropic cites it in their Claude 4 headline!). The terminal is an underrated tool and improving the ability of agents to use it effectively translates to agents becoming really good at using a computer.

Guillermo Rauch

@rauchg

1 month

It’s 2025 and some of the most impactful products in the world are CLIs. Coding agents love running CLIs. ChatGPT solves problems by writing scripts in virtual computers that invoke CLIs. CLIs ftw!

0

4

16

Alex Shaw

@alexgshaw

1 month

RT @ChrisRytting: Refreshing and delightful to see a new line on the latest model cards: Agentic terminal use via our brand new Terminal-be…

0

1

0

Alex Shaw

@alexgshaw

1 month

RT @Mike_A_Merrill: Thrilled to see Terminal-Bench on the Claude 4 model card. We're just getting started! Come join our community to help…

0

3

0

Alex Shaw

@alexgshaw

1 month

If you haven’t already, check out our terminal bench announcement 💻

0

Alex Shaw

@alexgshaw

1 month

Exciting to see Anthropic including Terminal Bench on their model card and scoring a new best on the benchmark! Congrats to the team on two great new models. Can’t wait to try them out!

Anthropic

@AnthropicAI

1 month

Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.

2

0

10