Theta

@trytheta

Followers: 758
Following: 64
Media: 2
Statuses: 16

Specialized AI for Every Job

Joined May 2025
@trytheta
Theta
8 months
Introducing CUB: Humanity's Last Exam for Computer and Browser Use Agents
32
41
248
@trytheta
Theta
8 months
For a deeper dive, check out our blog with @SeanZCai:
1
1
15
@trytheta
Theta
8 months
Why Now? (4/4) AI-first browsers are poised to disrupt the massive web browser market, with highly anticipated releases like Comet from @perplexity_ai on the way. It's yet to be seen how Google integrates Project Mariner and other AI tools within Chrome.
1
1
16
@trytheta
Theta
8 months
Why Now? (3/4) Open source frameworks like @browser_use and @Stagehanddev have become some of the most popular repos on GitHub, with tens of thousands of stars.
1
0
13
@trytheta
Theta
8 months
Why Now? (2/4) Computer/browser use has become one of the most important frontiers for model capabilities, with @OpenAI, @AnthropicAI, and @GoogleDeepMind dedicating teams to Operator, Claude Computer Use, and Project Mariner, respectively.
1
0
12
@trytheta
Theta
8 months
Why Now? (1/4) We're seeing new companies launch in the space every week, for both consumer and enterprise use cases. @ManusAI_HQ is one of the most popular generalist consumer agents, and @AthenaIntell is already being used by companies like Anheuser-Busch.
2
1
12
@trytheta
Theta
8 months
Browser agents use computers the same way humans do, unlocking powerful use cases for personal assistants, browsers, and enterprise workflows. After talking to 20+ founders in the space, we're excited to put out the definitive market map for browser agents.
28
87
588
@garrytan
Garry Tan
8 months
The AI labs need better evals and one of my favorite current YC batch companies just released one with a *lot* of headroom
@trytheta
Theta
8 months
Introducing CUB: Humanity's Last Exam for Computer and Browser Use Agents
20
27
374
@trytheta
Theta
8 months
The Theta team started CUB as an internal evalset, but it quickly grew into a full-fledged benchmark over the past month. We're excited to test even more models and frameworks. For more on the benchmark, including examples and a full paper, check out our blog:
1
0
19
@trytheta
Theta
8 months
Computer/browser use agents still have a long way to go for more complex, end-to-end workflows. Actual task completion is far below our reported numbers: we gave credit for partially correct solutions and reaching key checkpoints. In total, there were fewer than 10 instances
1
0
18
@trytheta
Theta
8 months
We worked with domain experts (accountants, investment bankers, doctors, etc.) to create representative tasks of real-world workflows and software tools. We've heard from so many companies in the CUA/browser agent space who are already tackling these workflows, but existing
1
0
18
@trytheta
Theta
8 months
@browser_use took a big hit at 3.78% because it struggled with spreadsheets, but we're confident it would do much better with some improvement in that area. Despite @GoogleAI Gemini 2.5 Pro's strong multimodal performance on other benchmarks, it completely failed at computer use
2
1
22
@trytheta
Theta
8 months
Among the agents we tested, @ManusAI_HQ came out on top at 9.23%, followed by @OpenAI Operator at 7.28% and @AnthropicAI Claude 3.7 Computer Use at 6.01%. We found that Manus' proactive planning and orchestration helped it come out on top.
1
1
24
@_gurvir_
Gurvir Singh
8 months
we've been misled to believe that manual prompt hacking is the solution to teaching LLMs how to approach complex problems. why write a "magic prompt" to pattern match for every type of problem you might care about, when LLMs have already shown extraordinary ability to self-review
@karpathy
Andrej Karpathy
8 months
We're missing (at least one) major paradigm for LLM learning. Not sure what to call it, possibly it has a name - system prompt learning? Pretraining is for knowledge. Finetuning (SL/RL) is for habitual behavior. Both of these involve a change in parameters but a lot of human
3
5
28
@ycombinator
Y Combinator
8 months
Theta (@trytheta) allows AI agents to learn from their mistakes in real-time. Their memory layer has already improved the accuracy of OpenAI Operator by 43% with 7x fewer steps taken. https://t.co/9uI9vbSYLs Congrats on the launch, @RayanGarg, @tsha444, and @_gurvir_!
21
44
382