Jeremy Berman
@jerber888
Followers: 6K · Following: 2K · Media: 10 · Statuses: 214
@humansand, prev post-training @reflection_ai, @ndea and co-founded https://t.co/aY50hNeJUD. yc w19.
SF / NYC
Joined August 2017
I finally reached human-level performance (85%) on ARC-AGI v1 for under $10k and within 12 hours. I use the same multi-agent collaboration with evolutionary test-time compute, now powered by GPT-5 pro with lower parallelism.
I'm back at the top of ARC-AGI with my new program. I use @grok 4 and multi-agent collaboration with evolutionary test-time compute
72
146
2K
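As a rough illustration only: the tweets above describe the approach as "multi-agent collaboration with evolutionary test-time compute," and a minimal version of that loop might look like the Python sketch below. Every name in it (`Candidate`, `run_program`, `propose_fn`, the population and generation counts) is assumed for illustration; the open-source solver linked further down is the actual implementation.

```python
# Hypothetical sketch of evolutionary test-time compute: keep a population of
# candidate solver programs, score them on a task's training pairs, and ask
# LLM "agents" (stubbed here as propose_fn) to mutate the strongest candidates.
from dataclasses import dataclass
from typing import Callable, List, Tuple

Grid = List[List[int]]

@dataclass
class Candidate:
    code: str           # Python source defining transform(grid) -> grid
    score: float = 0.0  # fraction of training pairs solved exactly

def run_program(code: str, grid: Grid) -> Grid:
    """Execute a candidate program in a fresh namespace (no sandboxing in this sketch)."""
    env: dict = {}
    exec(code, env)                      # assumes the code defines transform(grid)
    return env["transform"](grid)

def score_candidate(cand: Candidate, train_pairs: List[Tuple[Grid, Grid]]) -> float:
    hits = 0
    for inp, out in train_pairs:
        try:
            if run_program(cand.code, inp) == out:
                hits += 1
        except Exception:
            pass                         # a crashing program simply misses that pair
    return hits / len(train_pairs)

def evolve(train_pairs: List[Tuple[Grid, Grid]],
           propose_fn: Callable[[List[Candidate]], List[str]],
           generations: int = 5,
           population: int = 8) -> Candidate:
    """propose_fn stands in for the LLM agents: given the current survivors
    (with scores), it returns new or mutated program strings."""
    pool = [Candidate(code) for code in propose_fn([])][:population]
    best = pool[0]
    for _ in range(generations):
        for cand in pool:
            cand.score = score_candidate(cand, train_pairs)
        pool.sort(key=lambda c: c.score, reverse=True)
        best = pool[0]
        if best.score == 1.0:            # solves every training pair; stop early
            break
        survivors = pool[: population // 2]
        children = [Candidate(code) for code in propose_fn(survivors)]
        pool = survivors + children[: population - len(survivors)]
    return best
```

Presumably the real system fans the propose step out across several parallel model calls and feeds the scored survivors back into the prompts; that is where the "multi-agent" and "evolutionary" parts come in. The repo and Kaggle notebook linked below are the authoritative version.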
The rate of reduction in price per unit of intelligence has been the thing I've most consistently underestimated over the past couple of years. 300x in a year is nuts!
GPT-5.1 (Thinking High) is about 300 times cheaper per task than o3-preview (Low) while scoring only a few points lower on ARC-AGI-1. One year later, intelligence has gotten 300 times cheaper. This is why I can’t stand people who say “wahh the models too expensive” it will become…
700
558
6K
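To make the scale of that concrete, here is a toy extrapolation; the starting cost is a placeholder rather than a real benchmark price, and this is not a forecast from the thread:

```python
# Toy extrapolation of "~300x cheaper per year"; the $100 starting cost is a placeholder.
cost_today = 100.0
annual_factor = 300          # cost divides by ~300 each year, per the claim above
for years in (1, 2, 3):
    print(f"after {years} year(s): ${cost_today / annual_factor ** years:.6f}")
# after 1 year(s): $0.333333
# after 2 year(s): $0.001111
# after 3 year(s): $0.000004
```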
You can run this code yourself; it’s open source: https://t.co/QnGjsN98ji. And the Kaggle notebook:
kaggle.com
Explore and run machine learning code with Kaggle Notebooks | Using data from ARC Prize 2025
2
2
76
GPT-5 pro is the best reasoning model today. It thinks coherently for hours and hours. My agent coordination logic is likely an intermediate step before the models learn to do this type of long-horizon coordination on their own, end to end. It’s a hard problem but I wouldn’t bet…
2
3
85
I ran it on a random sample of 100 tasks from the 2024 eval set. It got 88/100 and averaged $27 per task. The score still needs to be verified on the hidden set by the @arcprize, but in my past submissions, the hidden set subtracts a few percent from the score and adds a few…
4
1
64
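Quick arithmetic on that sample, using only the figures from the tweet above (the variable names are mine):

```python
# Sanity-check the numbers reported for the 100-task sample.
tasks_sampled = 100
tasks_solved = 88
avg_cost_per_task = 27.00                       # dollars per task, as reported
score = tasks_solved / tasks_sampled            # 0.88 -> 88%
total_cost = tasks_sampled * avg_cost_per_task  # $2,700 for this sample
print(f"sample score: {score:.0%}, sample cost: ${total_cost:,.0f}")
```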
Never a dull moment talking language models with Alex. Really excited to get to do it more often
🎉 Next week, I am excited to join @reflection_ai as a Member of Technical Staff to help build the open intelligence ecosystem of the Western world. It's the most exciting opportunity to help software builders in our time, and will shape many years of AI Engineering in the…
0
0
7
Teaching language models to have taste isn't just about making them better at writing or making jokes. Taste is everything. It's what leads to scientific discovery. In an infinite sea of things to reason about, taste is what guides you to reason about the right things — to…
3
0
14
This was fun and Jack’s speech was great too
Recently had the opportunity to talk to CS grad students and faculty at Mizzou (University of Missouri) about our approach to ARC. Jeremy Berman spoke first about his public leaderboard SoTA approach, which was great. @jerber888 @arcprize
https://t.co/z5IbN8D5ou
0
0
9
we use our continuous human brains to build symbolic computer systems to build continuous computer brains to build symbolic computer systems
1
0
19
To discover new knowledge, language models must learn to be creative. This can’t be SFT’d. It’s a muscle that must be grown from on-policy reinforcement.
0
0
4
Creativity is having uncommon taste and the will to use it. Discovery comes from being creative and good at reasoning.
2
0
8
Solving hallucinations is a bigger deal than people think. Not just because solving it is useful, but because it demonstrates that we can train a model to overcome the instincts of pre-training
5
1
29
A model using off-policy trained weights to respond is like a human under a spell
0
0
1
Taste has been elusive for language models but I think we'll crack it
0
0
5
Scientific discovery comes from the ability to reason and taste for what to reason about
2
0
8
To unlock the secrets of the universe, we just have to make interesting, true, and unlikely tokens more likely
1
0
20
Special relativity only became in-distribution for Einstein once he had a large context priming it. With each new token, special relativity inched closer to being in-distribution
0
0
3