Benchmark
@benchmark
Followers: 90K
Following: 6
Media: 171
Statuses: 11K
Benchmark focuses on early-stage venture investing in consumer, marketplaces, open-source, AI, infrastructure, and enterprise software.
Joined June 2009
We’ve doubled the size of APEX, our benchmark for measuring whether frontier models can perform economically valuable work across four jobs: investment banking associate, management consultant, big law associate, and primary care physician (MD). The new APEX leaderboard shows:
Sierra uses 15+ frontier and open-source models for low-latency tool calling and decision-making, precision classification, long-context reasoning, and empathy/tone. We call this a constellation of models, and it’s a key ingredient in the state-of-the-art performance of agents
sierra.ai
Agents built on Sierra are assembled from 15+ purpose-built models working in concert, so they can handle complex tasks with speed, precision, and on-brand execution.
Exa is now my default search engine! For the longest time, Exa wasn’t general enough or fast enough for daily driving. But we’ve dramatically improved our index, algorithm, and latency. What happens with simple searches? We detect that and return you useful information as fast
Smarter than your default search engine, faster than your default chat app. Try the new https://t.co/cQ6UlWHnKY
Exa and Benchmark are hosting a special event at NeurIPS Thursday evening. DM me your spicy takes on semantic retrieval if you want to come aboard ⛴️
This astonishing $100M milestone points to the real story: the human story. My third journey with @btaylor from the beginning has given me a front-row seat to his arc of growth. He and @claybavor embody decades of personal evolution, intersecting perfectly with the most explosive
Sierra just hit $100M in ARR, just seven quarters since we launched in February 2024. @claybavor and I are very grateful to our customers and proud of the Sierra team, who has redefined the meaning of intensity and craftsmanship. I have never had this much fun in my career.
It started with @tonyzzhao and @chichengcc in their apartment in April 2024. We moved out of our hacker house in Mountain View in February of 2025 with 10 people. Today, we have over 30 team members. Velocity of progress to a real, useful robot is everything to us. The progress
We don’t just ship software. We help our customers succeed by being a partner during the most significant period of change the legal industry has ever seen. Tune in to the conversation I had with @chetanp at @SlushHQ. Full video: https://t.co/Bom850CAjN
Today, we are releasing the best open-weight LLM by a US company: Cogito v2.1 671B. On most industry benchmarks and our internal evals, the model performs competitively with frontier closed and open models, while being ahead of any US open model (such as the best versions of
After 18 months in stealth, dozens of prototypes, millions of real-home demonstrations, and one final all-nighter, we’re thrilled for you to say hello to Memo
Tomorrow, I’ll be on the Main Stage at @SlushHQ with Benchmark’s @chetanp to talk about one of my favourite parts of @WeAreLegora's story. Looking forward to this one! More details here: https://t.co/nR9F6uxorT
Today we're launching Manus Browser Operator. Any browser can now become an AI browser. One extension. No download. No new setup. Your browser already works. Your logins. Your sessions. Your habits. Now with the full power of Manus.
En route to the Capital of Capital!
🚀 Fireworks Reinforcement Fine-Tuning (RFT) launched! After many months of iteration with real-world use cases, we are excited to launch Fireworks RFT public preview. It’s a managed RL service that turns open frontier models (e.g. DeepSeek V3, Kimi K2) into custom agents for
💻 Sandboxes for DeepAgents: We're excited to launch Sandboxes for DeepAgents, a new set of integrations that allow you to safely execute arbitrary DeepAgent code and bash commands in remote sandboxes. Supports @RunloopAI, @daytonaio, and @modal. Your DeepAgent runs locally (or
Introducing Cerebras for Nations, our global initiative to advance and scale sovereign AI. How it works: 1️⃣ We will build world-class AI supercomputers with our WSE-3 chips and CS-3 systems 2️⃣ Co-develop state-of-the-art models and deploy with the world’s fastest inference 3️⃣
The Wall Street Journal is starting to see what we’ve seen at @cerebras for years. The “reticle limit” - the boundary that defines how large a chip can be - has become the ceiling for progress. It has kept chips the size of postage stamps for more than 20 years. Every new
Excited to share how we're working with @NotionHQ to transform knowledge bases into execution engines. Here are some ways people are transforming their workflows👇🧵 https://t.co/JEUVhtt1bD
manus.im
Discover how real users are leveraging the Manus-Notion MCP integration to transform their workflows. Learn how the Model Context Protocol enables bidirectional data flow, turning Notion from a...
Like meeting a long-lost sibling for the first time. Thanks for the great conversation, Harry!
I am so bored of hearing podcasts with guests that have done the podcast tour. Benchmark added Ev Randle as their latest GP. This is his first public appearance as a Benchmark GP.
- Why Margins Matter Less in AI
- Why Mega Funds Will Not Produce Good Returns
- OpenAI vs