
Noam Brown
@polynoamial
83K Followers · 6K Following · 125 Media · 1K Statuses
Researching reasoning @OpenAI | Co-created Libratus/Pluribus superhuman poker AIs, CICERO Diplomacy AI, and OpenAI o3 / o1 / 🍓 reasoning models
San Francisco, CA
Joined January 2017
Today, I’m excited to share with you all the fruit of our effort at @OpenAI to create AI models capable of truly general reasoning: OpenAI's new o1 model series! (aka 🍓) Let me explain 🧵 1/
RT @chaidiscovery: We’re excited to introduce Chai-2, a major breakthrough in molecular design. Chai-2 enables zero-shot antibody discover….
You don’t need a PhD to be a great AI researcher. Even @OpenAI’s Chief Research Officer doesn’t have a PhD.
It's both surprising and worrisome that broad misalignment emerges simply from training models on insecure code. Great to see @OpenAI publishing research investigating how this happens and how to mitigate it!
We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more. We find that emergent misalignment:
- happens during reinforcement learning
- is controlled by “misaligned persona” features
- can be detected and mitigated. 🧵
I'm fortunate to be able to devote my career to researching AI and building reasoning models like o3 for the world to use. If you want to join us in pushing forward the intelligence frontier, we're hiring at @OpenAI.
For now, you can use poker to vibe check the models because you can quickly see how many major blunders they make. But, like Rock-Paper-Scissors, I do think they will get better with time.
o3-mini is the first LLM released that consistently gets this tic-tac-toe question correct. The summarized CoT is pretty unhinged but you can see on the right that by the end it figures it out.
To @NateSilver538's main point: I agree o3 sucks at poker. Unfortunately poker isn't a great eval for LLMs because the variance is huge. Good humans need to play ~100,000 hands against each other to say with confidence who is better. That's way too expensive for reasoning models.
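The ~100,000-hand figure above can be sanity-checked with a standard sample-size calculation. A minimal sketch, using my own ballpark assumptions (a per-hand standard deviation of ~8 big blinds and a true skill edge of 5 bb per 100 hands; neither number is from the tweet):

```python
import math

# Assumed ballpark figures for heads-up poker (illustrative, not from the tweet):
STD_PER_HAND_BB = 8.0    # per-hand standard deviation, in big blinds
EDGE_BB_PER_HAND = 0.05  # true skill edge: 5 bb per 100 hands
Z_95 = 1.96              # two-sided 95% confidence z-score

def hands_to_detect_edge(std=STD_PER_HAND_BB, edge=EDGE_BB_PER_HAND, z=Z_95):
    """Hands needed before the edge exceeds the 95% sampling error.

    Solves z * std / sqrt(n) <= edge for n.
    """
    return math.ceil((z * std / edge) ** 2)

print(hands_to_detect_edge())  # on these assumptions, roughly 100,000 hands
```

Under these assumptions the answer comes out near 100,000 hands, which is why a few dozen hands against a model tells you little, and why poker is an expensive eval for reasoning models.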
For the record, poker solvers like @GTOWizard absolutely do use machine learning. They're based on ReBeL, which while not directly useful for LLMs remains my favorite paper I've ever written.
There’s an old joke in AI: as soon as machines outperform humans at something, it stops being considered AI. Glad to see poker solvers have reached that point.
ChatGPT totally sucks at poker. It knows it sucks if you ask it. Today's newsletter is a deeper dive into why, with some speculation about what this means for AI capabilities.
RT @michpokrass: gpt-4.1 landing in chatgpt today!! we were initially planning on keeping this model api only but you all wanted it in chat….
People often ask me: will reasoning models ever move beyond easily verifiable tasks? I tell them we already have empirical proof that they can, and we released a product around it: @OpenAI Deep Research.
"Find questions that are so hard that even if the models improve 3x they'll still get zero.".
I have a post where I talk about how to build good LM benchmarks. I've had to edit the part where I talk about how I think you should try to make your benchmark hard, multiple times now, since LM abilities are accelerating so rapidly.