Casey Chu
@caseychu9
Followers: 4K
Following: 2K
Media: 22
Statuses: 263
Researcher at @openai
San Francisco, CA
Joined August 2017
We launched ChatGPT Agent today! When tested on a variety of REAL work tasks (expert tasks that might take >10h), we found that its output was human-quality almost 50% of the time. Agent puts o3's intelligence into practice - try your work tasks and let us know how it goes!
ChatGPT can now do work for you using its own computer. Introducing ChatGPT agent—a unified agentic system combining Operator’s action-taking remote browser, deep research’s web synthesis, and ChatGPT’s conversational strengths.
14
9
141
Last year I posted a video about a ~200,000 year old Denisovan genome. Today, we got a pre-print… and some beautiful figures. It’s seeming more and more likely that our divergences have been underestimated due to the hybridization events that occurred after.
12
36
301
It’s time for vibe-lifeing
13
7
63
To summarize this week:
- we released general purpose computer using agent
- got beaten by a single human in atcoder heuristics competition
- solved 5/6 new IMO problems with natural language proofs
All of those are based on the same single reinforcement learning system
43
117
1K
watching chatgpt agent use a computer to do complex tasks has been a real "feel the agi" moment for me; something about seeing the computer think, plan, and execute hits different.
1K
812
13K
Huge thanks to @gracejkim9 Elizabeth Proehl @michelelwang @marwan_aljubeh @rachelds__ @tejalpatwardhan for putting this eval together, letting us measure Agent's capabilities in realistic work settings 💪
1
0
12
@OpenAI We also found that, when allowed 16 tries per problem, ChatGPT agent’s score grew from 27% to 49% on the tier 1-3 set. This suggests that better prompting or scaffolding might result in better performance from current models.
1
0
33
Great post from @xikun_zhang_, who did a great job making sure collaboration with Agent feels good!
Just launched ChatGPT Agent (sorry GPT-5 waiters, it is coming!), the most capable AI agent model to date! It has been such an honor to be part of a crazy sprint to get this amazing model trained and shipped together with an absolutely gem team (@isafulf , @caseychu9 ,
0
0
9
Join us in making the next generation of agents both capable and safe! We think that agents will be a big part of how we interact with AI in the future, making it critical that we think carefully about how we build them.
We're hiring for a new team @OpenAI: Agent Robustness and Control. Our goal is to make sure our agents are safe and secure during training and deployment. Want to work on some of the hardest problems in AI today? Apply via link in reply or DM me!
0
1
14
It's deeply concerning that one of the best AI researchers I've worked with, @kaicathyc, was denied a U.S. green card today. A Canadian who's lived and contributed here for 12 years now has to leave. We’re risking America’s AI leadership when we turn away talent like this.
404
754
9K
LLMs have complex joint beliefs about all sorts of quantities. And my postdoc @jamesrequeima visualized them! In this thread we show LLM predictive distributions conditioned on data and free-form text. LLMs pick up on all kinds of subtle and unusual structure: 🧵
30
200
2K
We launched a research preview of Operator today! It's a model built on top of GPT-4o that can control a browser — it is very early and will make mistakes, but it's a taste of things to come https://t.co/eYQbIqI1Lw
22
20
205
Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks. It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task
202
2K
9K
An intuition for relative memory access times (scaled 10^10):
Reg: 2 sec - Take from shelf
Cache: 6½ min - Get from garage
DDR Main: 20 min - Go to store
DDR CXL: 1hr
Far Mem: 8hr
SSD: 6 days - Order online
Spinning Disk (3ms): 1yr!
Via @dylan522p & @SemiAnalysis_
0
2
24
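The scaling in the memory access post above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the human-scale durations are real latencies multiplied by the stated factor of 10^10, and it recovers the implied latencies by dividing back. Only the 3 ms spinning-disk figure is given directly in the post; the other nanosecond values are derived here, not quoted from it, and the names `scaled_seconds`, `implied_latency_ns`, and `scaled_days` are hypothetical.

```python
SCALE = 10**10  # scaling factor stated in the post

# Human-scale durations from the post, converted to seconds.
scaled_seconds = {
    "Reg": 2,
    "Cache": 6.5 * 60,
    "DDR Main": 20 * 60,
    "DDR CXL": 60 * 60,
    "Far Mem": 8 * 60 * 60,
    "SSD": 6 * 24 * 60 * 60,
}


def implied_latency_ns(seconds):
    """Divide a human-scale duration by SCALE and return nanoseconds."""
    return seconds / SCALE * 1e9


def scaled_days(real_seconds):
    """Multiply a real latency by SCALE and return the result in days."""
    return real_seconds * SCALE / 86400


# Implied real latencies (an inference from the scaling, not quoted data).
implied_ns = {name: implied_latency_ns(s) for name, s in scaled_seconds.items()}

# The one real latency the post states outright: 3 ms for a spinning disk.
disk_days = scaled_days(3e-3)

if __name__ == "__main__":
    for name, ns in implied_ns.items():
        print(f"{name}: {ns:,.1f} ns")
    print(f"Spinning Disk (3 ms): {disk_days:.0f} days at human scale")
```

Under these assumptions the register access works out to about 0.2 ns, DDR main memory to about 120 ns, and the SSD to about 52 µs, while 3 ms scales to roughly 347 days, consistent with the post's "1yr".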
I had the joy and the honor of being invited to give the @harveymudd commencement address this year. In the vector space of all advice, I explore a 5-dimension subspace orthogonal to the “follow your dreams” vector. YouTube Link: https://t.co/Aw2ZlgR2ql
49
111
1K
GPT-4o would not have happened without the vision, talent, conviction, and determination of @prafdhar over a long period of time. that (along with the work of many others) led to what i hope will turn out to be a revolution in how we use computers.
GPT-4o (o for “omni”) is the first model to come out of the omni team, OpenAI’s first natively fully multimodal model. This launch was a huge org-wide effort, but I’d like to give a shout out to a few of my awesome team members who made this magical model even possible!
305
544
8K