Alex Shaw Profile
Alex Shaw

@alexgshaw

Followers
175
Following
2K
Media
12
Statuses
275

Researching & investing @ Laude. Co-creator of Terminal Bench. Formerly Google. BYU alum.

Joined October 2021
@alexgshaw
Alex Shaw
12 days
Excited to team up with @andykonwinski on Laude Institute, his next endeavor to normalize bringing research breakthroughs into real users' hands. His vision and research-to-product strategy have fundamentally shaped how we built terminal-bench and (hopefully!) will continue to.
@andykonwinski
Andy Konwinski
12 days
Today, I’m launching a deeply personal project. I’m betting $100M that we can help computer scientists create more upside impact for humanity. Built for and by researchers, including @JeffDean & @jpineau1 on the board, @LaudeInstitute catalyzes research with real-world impact.
0
1
17
@alexgshaw
Alex Shaw
17 hours
RT @dbreunig: Now's a good moment to plug @alexgshaw and @Mike_A_Merrill's terminal-bench:
0
1
0
@alexgshaw
Alex Shaw
11 days
Btw, the leaderboard got a fresh coat of paint, check it out!
0
0
3
@alexgshaw
Alex Shaw
11 days
Congrats to the Warp team for setting a new SOTA on Terminal-Bench! I’ve been using Warp since 2022, so it’s exciting to see them use the benchmark!
@warpdotdev
Warp
11 days
Introducing Warp 2.0: the Agentic Development Environment. 1️⃣ Top overall coding agent: #1 on Terminal-Bench, 71% on SWE-bench Verified. 2️⃣ Agent multi-threading: build features, debug, and ship all at once. 3️⃣ The first all-in-one platform for agentic development. 🧵 Learn more
3
1
17
@alexgshaw
Alex Shaw
12 days
RT @lschmidt3: I'm a big fan of the approach to research funding @andykonwinski and the Laude team are taking! Working with them on termina…
0
5
0
@alexgshaw
Alex Shaw
19 days
RT @Mike_A_Merrill: this is why we made terminal bench - just give the ai a bash shell, it'll be fine.
0
4
0
@alexgshaw
Alex Shaw
26 days
(about terminal bench!)
0
0
0
@alexgshaw
Alex Shaw
26 days
I’ll be speaking (briefly) at DAIS with @andykonwinski and @Mike_A_Merrill! Please tune in :)
@andykonwinski
Andy Konwinski
26 days
I <3 meetups, and tonight’s at #DataAISummit is next level - 2k ppl, multi-track, with keynotes. #meetupXXL. I’ll be talking (right after @matei_zaharia) about K Prize, Terminal-Bench, and the noble quest for hard, relevant benchmarks. See you in room 208 at 6pm.
1
0
3
@alexgshaw
Alex Shaw
1 month
RT @ryanmart3n: Announcing OpenThinker3-7B, the new SOTA open-data 7B reasoning model: improving over DeepSeek-R1-Distill-Qwen-7B by 33% on…
0
190
0
@alexgshaw
Alex Shaw
1 month
As always, we're looking for more contributors, so please join our Discord or let us know if there is an eval you would like us to integrate!
0
0
2
@alexgshaw
Alex Shaw
1 month
The terminal-bench CLI makes it possible for agent developers to integrate their agent, quickly run it across a suite of integrated evals, confidently compare against the results of others, and reproduce both their own and others' results.
1
0
3
@alexgshaw
Alex Shaw
1 month
We also realized that many existing benchmarks fit into the terminal-bench framework due to its flexibility (almost anything with an instruction, docker env, and test script is compatible).
1
0
2
@alexgshaw
Alex Shaw
1 month
We just released the terminal-bench CLI. Right after we shipped our initial batch of 80 tasks in terminal-bench-core-v0, our team began building v1. We needed a tool to distribute the different versions of terminal-bench while enabling comparison and reproducibility. 🧵⬇️
3
0
16
@alexgshaw
Alex Shaw
1 month
RT @LaudeVentures: Congrats to our co-founder and GP @psonsini on making the @Forbes 2025 Midas List! A well-earned recognition for a legen…
0
1
0
@alexgshaw
Alex Shaw
1 month
Cloud providers love the emerging vibe-coding market
0
0
3
@alexgshaw
Alex Shaw
1 month
This is one of the main reasons we built Terminal-Bench (and why Anthropic cites it in their Claude 4 headline!). The terminal is an underrated tool and improving the ability of agents to use it effectively translates to agents becoming really good at using a computer.
@rauchg
Guillermo Rauch
1 month
It’s 2025 and some of the most impactful products in the world are CLIs. Coding agents love running CLIs. ChatGPT solves problems by writing scripts in virtual computers that invoke CLIs. CLIs ftw!
0
4
16
@alexgshaw
Alex Shaw
1 month
RT @ChrisRytting: Refreshing and delightful to see a new line on the latest model cards: Agentic terminal use via our brand new Terminal-be…
0
1
0
@alexgshaw
Alex Shaw
1 month
RT @Mike_A_Merrill: Thrilled to see Terminal-Bench on the Claude 4 model card. We're just getting started! Come join our community to help…
0
3
0
@alexgshaw
Alex Shaw
1 month
If you haven’t already, check out our terminal bench announcement 💻
0
0
0
@alexgshaw
Alex Shaw
1 month
Exciting to see Anthropic including Terminal Bench on their model card and scoring a new best on the benchmark! Congrats to the team on two great new models. Can’t wait to try them out!
@AnthropicAI
Anthropic
1 month
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
2
0
10