
Jason Wei (@_jasonwei)
84K followers · 8K following · 128 media · 1K statuses
Super excited to finally share what I have been working on at OpenAI! o1 is a model that thinks before giving the final answer. In my own words, here are the biggest updates to the field of AI (see the blog post for more details):
1. Don’t do chain of thought purely via…
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
94 replies · 353 reposts · 3K likes
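As context for “chain of thought purely via…” above: before o1, chain of thought was typically elicited just by prompting an ordinary chat model. A minimal sketch of that baseline, assuming the OpenAI Python SDK (the model name and prompt here are illustrative, not from the tweet):

```python
# Minimal sketch: chain of thought elicited purely via prompting,
# the pre-o1 baseline. Model name and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # assumed: any ordinary chat model works for the contrast
    messages=[{
        "role": "user",
        "content": (
            "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
            "than the ball. How much does the ball cost? "
            "Let's think step by step."  # the prompt-only CoT trigger
        ),
    }],
)
print(response.choices[0].message.content)
```

o1’s point is that the reasoning is trained in rather than coaxed out by a trigger phrase like this.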
Yesterday I gave a lecture at @Stanford's CS25 class on Transformers! The lecture was on how “emergent abilities” are unlocked by scaling up language models. Emergence is one of the most exciting phenomena in large LMs… Slides:
28 replies · 309 reposts · 2K likes
o3 is very performant. More importantly, progress from o1 to o3 was only three months, which shows how fast progress will be in the new paradigm of RL on chain of thought to scale inference compute. Way faster than the pretraining paradigm of a new model every 1-2 years.
We announced @OpenAI o1 just 3 months ago. Today, we announced o3. We have every reason to believe this trajectory will continue.
58 replies · 209 reposts · 2K likes
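One concrete way to picture “scaling inference compute”: spend more tokens at test time and aggregate. The sketch below uses a self-consistency-style majority vote, an assumption for illustration, not a claim about how o1/o3 work internally:

```python
# Sketch of trading inference compute for accuracy: sample N independent
# chains of thought and majority-vote the final answers (self-consistency).
# Illustrative only; NOT a description of o1/o3 internals.
from collections import Counter

def solve_with_more_compute(sample_chain, n_samples: int = 16) -> str:
    """sample_chain is an assumed helper that runs one chain of thought
    and returns the final answer as a string."""
    answers = [sample_chain() for _ in range(n_samples)]
    best_answer, _count = Counter(answers).most_common(1)[0]
    return best_answer

# Raising n_samples spends more inference compute, which is the knob this
# paradigm scales, instead of waiting 1-2 years for a bigger pretrained model.
```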
o1-mini is the most surprising research result i've seen in the past year. obviously i cannot spill the secret, but a small model getting >60% on the AIME math competition is so good that it's hard to believe. congrats @ren_hongyu @shengjia_zhao for the great work!
33 replies · 90 reposts · 2K likes
Today I am pleased to announce the new board of directors for my relationship. The new board of directors will be:
1. My mom
2. My girlfriend’s sister
3. @hwchung27, who I pair program with frequently
4. Bret Taylor (we’ve only met once, but every board should have Bret Taylor)
41 replies · 24 reposts · 994 likes
Pair programming isn’t standard at most companies and basically non-existent in academia, but I’ve been doing it with @hwchung27 for almost a year now. While it naively seems slower than coding individually, I’ve realized that there are many benefits: (1) In AI, what you work on can…
36 replies · 111 reposts · 960 likes
As a kid I loved whiteboard lectures way more than slides, so for Stanford’s CS25 class I gave a whiteboard lecture! My goal was to simply and clearly explain why language models work so well, purely via intuitions. YouTube video: (w/ @hwchung27)
7 replies · 98 reposts · 654 likes
Very excited for o1 to come out of preview! Main takeaways:
- o1 thinks harder and is more performant on tough problems
- o1 thinks faster on easy problems
- o1 reasons over images and text
- o1 can think *even* harder when using o1-pro mode
o1 was indeed a saga, with many…
OpenAI o1 is now out of preview in ChatGPT. What’s changed since the preview? A faster, more powerful reasoning model that’s better at coding, math & writing. o1 now also supports image uploads, allowing it to apply reasoning to visuals for more detailed & useful responses.
34 replies · 57 reposts · 634 likes
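A hedged sketch of what using these capabilities looks like through the OpenAI Python SDK; the model string, the `reasoning_effort` parameter, and the image URL are assumptions layered on the tweet, not taken from it:

```python
# Sketch: asking an o-series model to reason over an image plus text.
# "o1", reasoning_effort, and the URL are assumptions for illustration.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1",               # assumed model identifier
    reasoning_effort="high",  # assumed knob: roughly "think even harder"
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is wrong with this circuit diagram?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/circuit.png"}},
        ],
    }],
)
print(response.choices[0].message.content)
```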
I gave an invited lecture at New York University for @hhexiy's class! I covered three ideas driving the LLM revolution: scaling, emergence, and reasoning. I tried to frame them in a way that reveals why large LMs are special in the history of AI. Slides:
9 replies · 152 reposts · 567 likes
Had a bit of a fanboy moment today meeting @bryan_johnson, who has been super inspirational to me in prioritizing my health. I asked him about the best way to balance career and spending time on health. His advice is that while many people give up sleep to work more, sleeping…
16 replies · 33 reposts · 543 likes
"So do you speak Chinese?". Normal person: "I can understand but I don't speak well". @YiTayML: "my encoder is OK but my decoder is broken" 🤦.
6 replies · 20 reposts · 513 likes
IMO GPT-4 is a bigger leap than GPT-3 was.
- GPT-3 advanced AI from task-specific models to a single prompted model that is task-general.
- GPT-4 is human-level on many hard tasks, and will signal a *societal* revolution where AI reaches every industry, starting with technology 🧵
Announcing GPT-4, a large multimodal model, with our best-ever results on capabilities and alignment:
9 replies · 78 reposts · 457 likes
Meta's open-source OPT-175B is comparable to GPT-3 175B. This is a massive step forward for bringing big LM research to academia. Expected nothing less from @LukeZettlemoyer and crew :)
9 replies · 75 reposts · 437 likes
Andrej’s tweet is the right way to think about it right now, but I totally believe that in one or two years we will start relying on AI for very challenging decisions like diagnosing disease under limited information. Key thing to note here is that big decisions can be viewed as a…
People have too inflated sense of what it means to "ask an AI" about something. The AI are language models trained basically by imitation on data from human labelers. Instead of the mysticism of "asking an AI", think of it more as "asking the average data labeler" on the…
25 replies · 50 reposts · 426 likes
Throughout my time at OpenAI I have found Hongyu Ren to be absolutely ruthless. No mercy for evaluation benchmarks whatsoever: o3-mini is 83% on AIME, 2000+ Codeforces Elo. Every o*-mini model is so performant and fast. Congrats @ren_hongyu @shengjia_zhao and crew!
o3-mini is here! Together with @shengjia_zhao, @_kevinlu, @max_a_schwarzer, @ericmitchellai, @brian_zq, @sandersted and many others, we trained this efficient reasoning model, maximally compressing the intelligence from big brothers o1 / o3. The model is very good in hard…
7 replies · 21 reposts · 392 likes
Three facts about the new UL2 model:
1. Checkpoint is public.
2. Beats GPT-3 on SuperGLUE for zero-shot.
3. Smallest model I know of that can do chain-of-thought reasoning on arbitrary tasks.
📜 Open source for the win!
Introducing UL2, a novel language pre-training paradigm that improves performance of language models across datasets and setups by using a mixture of training objectives, each with different configurations. Read more and grab model checkpoints at
7 replies · 66 reposts · 378 likes
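The “mixture of training objectives” in UL2 is a mixture of denoisers: the same text is corrupted under differently configured span-corruption objectives. A rough sketch of the idea, with made-up rates and span lengths rather than the paper’s exact settings:

```python
# Rough sketch of a UL2-style mixture of denoisers: sample one of several
# span-corruption configs per example. Rates are made up for illustration;
# a single masked span stands in for the real multi-span setup.
import random

DENOISER_CONFIGS = [
    {"name": "R", "corruption_rate": 0.15},  # regular: short, light corruption
    {"name": "X", "corruption_rate": 0.50},  # extreme: long/aggressive spans
    {"name": "S", "corruption_rate": 0.25},  # sequential: predict a suffix
]

def corrupt(tokens: list[str], cfg: dict) -> tuple[list[str], list[str]]:
    """Return (corrupted_input, targets) under one denoiser config."""
    if cfg["name"] == "S":  # S-denoising: prefix stays, model predicts suffix
        split = int(len(tokens) * (1 - cfg["corruption_rate"]))
        return tokens[:split] + ["<X>"], tokens[split:]
    n_mask = max(1, int(len(tokens) * cfg["corruption_rate"]))
    start = random.randrange(0, len(tokens) - n_mask + 1)
    corrupted = tokens[:start] + ["<X>"] + tokens[start + n_mask:]
    return corrupted, tokens[start:start + n_mask]

tokens = "the quick brown fox jumps over the lazy dog".split()
cfg = random.choice(DENOISER_CONFIGS)  # the "mixture": one objective per example
inp, tgt = corrupt(tokens, cfg)
print(cfg["name"], inp, "->", tgt)
```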
Very clever paper to predict downstream performance of pre-trained models.
- Take advantage of the fact that larger models being more sample-efficient and performant is equivalent to finetuning a smaller model on targeted data.
- In principle this is hugely valuable because you…
Can we predict emergent capabilities in GPT-N+1 🌌 using only GPT-N model checkpoints, which have random performance on the task? We propose a method for doing exactly this in our paper “Predicting Emergent Capabilities by Finetuning” 🧵
8 replies · 48 reposts · 369 likes
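As I read the idea, the recipe is: finetune each smaller checkpoint on the target task, fit how finetuned performance moves with pretraining scale, and extrapolate past the emergence point. A hedged sketch under those assumptions (the sigmoid fit and all numbers below are made up for illustration, not the paper’s method or results):

```python
# Sketch: extrapolate a larger model's task performance from finetuned
# smaller checkpoints. Fit form and data points are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(log_compute, mid, scale, ceiling):
    return ceiling / (1.0 + np.exp(-(log_compute - mid) / scale))

# Made-up measurements: task accuracy of *finetuned* small checkpoints.
log_compute = np.array([19.0, 20.0, 21.0, 22.0])  # log10 pretraining FLOPs
accuracy    = np.array([0.05, 0.12, 0.31, 0.55])  # accuracy after finetuning

params, _ = curve_fit(sigmoid, log_compute, accuracy,
                      p0=[21.0, 1.0, 1.0], maxfev=10_000)

# Extrapolate to a not-yet-trained "GPT-N+1" scale.
print("predicted accuracy at 1e24 FLOPs:", sigmoid(24.0, *params))
```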
22-minute video of OpenAI researchers talking about their experiences working on strawberry. @woj_zaremba is very fun to work with. @MillionInt is a legend.
2 replies · 26 reposts · 360 likes