Max Schwarzer
@max_a_schwarzer
Followers: 8K
Following: 303
Media: 29
Statuses: 137
Post-training @OpenAI
Bay Area
Joined June 2020
I have always believed that you don't need a GPT-6 quality base model to achieve human-level reasoning performance, and that reinforcement learning was the missing ingredient on the path to AGI. Today, we have the proof -- o1. https://t.co/NQtBbKkWRH
We're releasing a preview of OpenAI o1—a new series of AI models designed to spend more time thinking before they respond. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
42
157
2K
Introducing OpenAI o3 and o4-mini—our smartest and most capable models to date. For the first time, our reasoning models can agentically use and combine every tool within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation.
852
2K
11K
Here is a fun o1 test. I gave it this XKCD comic & the prompt: "make this a reality. i need a gui and clear instructions since i can't code. that means you need to give me full working software" It took less than 15 minutes, and it didn't get caught in any of the usual LLM loops
61
152
3K
@RichardMCNgo I have never seen a clearer case of near-enemy/far-enemy than tech people deciding to align with religious fundamentalists and Slavic nationalists because they’re annoyed with wokeness. It’s the exact analog of leftist Ivy League activists getting annoyed by their consultant
10
51
481
We’re hosting an AMA for developers from 10–11 AM PT today. Reply to this thread with any questions and the OpenAI o1 team will answer as many as they can.
449
124
1K
I'm waiting for blue to clarify this tweet, but our AI did not actually break out of its VM -- it tried to debug why it couldn't connect to the container, and found it could access the docker API, then created a new/easier version of the challenge, all in the VM.
6
7
90
don't worry we're coming for your eval soon
tried o1-preview on @arcprize. result: 1 out of 2 tests correct, so o1-preview isn't going to solve 100% ARC Prize tasks. tbd on what % it gets compared to SOTA approaches, still testing rest of the lot
5
7
335
I really want to underline the IOI result in our blog post -- our model was as good as the median human contestant under IOI contest conditions, and scores among the best contestants with more test-time compute. Huge props to @markchen90 for setting such an ambitious goal!
As a coach for the US IOI team, I’ve been motivated for a long time to create models which can perform at the level of the most elite competitors in the world. Check out our research blog post - with enough samples, we achieve gold medal performance on this year’s IOI and ~14/15
0
7
81
what it looks like when deep learning is hitting a wall:
Strawberry has landed. Hot take on GPT's new o1 model: It is definitely impressive. BUT 0. It’s not AGI, or even close. 1. There’s not a lot of detail about how it actually works, nor anything like full disclosure of what has been tested. 2. It is not
19
36
604
Also check out our research blogpost (https://t.co/nWhaX1pfH7) which has lots of cool examples of the model reasoning through hard problems.
3
3
94
The system card (https://t.co/wM4LVBySKf) nicely showcases o1's best moments -- my favorite was when the model was asked to solve a CTF challenge, realized that the target environment was down, and then broke out of its host VM to restart it and find the flag.
14
61
421
The most important thing is that this is just the beginning for this paradigm. Scaling works, there will be more models in the future, and they will be much, much smarter than the ones we're giving access to today.
4
40
324
Building o1 was by far the most ambitious project I've worked on, and I'm sad that the incredible research work has to remain confidential. As consolation, I hope you'll enjoy the final product nearly as much as we did making it.
2
2
152
o1 achieves human or superhuman performance on a wide range of benchmarks, from coding to math to science to common-sense reasoning, and is simply the smartest model I have ever interacted with. It's already replacing GPT-4o for me and so many people in the company.
3
8
142
Truly God is great
Donald Trump has been convicted on all 34 counts of falsifying business records to cover up a sex scandal that threatened his ascent to the White House in 2016. He is the first U.S. president to be declared a felon. Follow live updates. https://t.co/uMCEompFOP
0
0
0
Our team @Apple will be in Vienna next week for #ICLR2024, where we will be presenting our work on using large language models as RL policies. Come drop by our poster! Website: https://t.co/sJOFGM61Cd Joint work with @max_a_schwarzer, @harsh_092, @alexttoshev and others
1
13
69
Introducing Sora, our text-to-video model. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W Prompt: “Beautiful, snowy
9K
30K
132K
Exceptionally funny to see someone arguing that technological progress is slowing down rely on AIs created in the last 1-2 years to do his work for him while not suffering any cognitive dissonance as a result
The most important inventions of the decade of the 1900s vs the decade of the 2000s. Pretty good evidence for secular stagnation. Source: Mostly various LLMs but had to do a lot of verifying/vetting. Some inventions are hard to date precisely. Other suggestions welcome.
0
1
11