Dan Mac

@daniel_mac8

Followers: 15K
Following: 44K
Media: 3K
Statuses: 26K

AI Engineer @sourcegraph | Writing Token Stream | Goodness, Truth and AI | Building at https://t.co/uHL1smAV1l

United States
Joined October 2013
@daniel_mac8
Dan Mac
12 hours
GPT-5.2's GDPval result is *THE* story here. Hard to overstate. Released Sep. 25th, GDPval measures AI's ability to complete real-world knowledge work. Opus 4.1 was on top at 47.6% on release. Today, GPT-5.2 Pro and Thinking are at 74.1% and 70.9%. That means GPT-5.2 can do
@polynoamial
Noam Brown
14 hours
IMO GDPval is the most important result from our @OpenAI GPT-5.2 launch. We outperform in-domain experts and are SOTA among *all* models on GDPval, which measures performance on self-contained tasks like making spreadsheets and PowerPoint presentations. Really impressive outputs!
3
3
117
@daniel_mac8
Dan Mac
4 hours
not confident the idea of "thing X is simply impossible" is long for this world...
@Tim_Dettmers
Tim Dettmers
2 days
My new blog post discusses the physical reality of computation and why this means we will not see AGI or any meaningful superintelligence:
0
1
1
@daniel_mac8
Dan Mac
5 hours
GPT-5.2 Thinking writes a haiku where the second letters spell "Buddha" and last letters spell "Dharma" when combined.
0
0
2
@daniel_mac8
Dan Mac
17 hours
GDPval is the most important LLM evaluation. It tests models on *real-world* knowledge work tasks. Artificial Analysis recreated it with their AI Agent harness "Stirrup", and now it can be run on every new model. Opus 4.5 is tops. No surprise there. Llama 4 Maverick is by far
@ArtificialAnlys
Artificial Analysis
2 days
Announcing GDPval-AA — our leaderboard and evaluation harness for comparing models on OpenAI’s GDPval dataset of real-world knowledge work tasks Earlier today, we announced our agentic harness called Stirrup, which we built to run GDPval tasks on any language model. We’re
1
3
68
@daniel_mac8
Dan Mac
8 hours
cool wall bro. mind if we fly right over it?
2
0
18
@deredleritt3r
prinz
9 hours
Shane Legg, Google: "I do not think there are any fundamental obstacles [to visual reasoning and continual learning], and we have ideas about how to build systems that can do these things." Also, 50% chance of achieving "mini-AGI" by 2028 - i.e., an AI that can carry out all
@Hangsiin
NomoreID
14 hours
Shane Legg (@ShaneLegg), the cofounder of Google DeepMind, discussed the definition of AGI, its timeline, and its socioeconomic impact. 1. -Today’s AI has made a lot of progress compared with five years ago, but it still has many weaknesses and is uneven. -Among its weaknesses,
2
9
170
@daniel_mac8
Dan Mac
10 hours
GPT-5.2 represents a breakthrough in long-context reasoning from OpenAI. Gemini 3 was the previous leader on the 4-needle version of this benchmark at 84.7%. GPT-5.2 is pinned near 100% up to 128k tokens. Surely this unlocks new capabilities?
9
4
92
@daniel_mac8
Dan Mac
14 hours
GPT-5.2 Pro (High) is even with Gemini 3 Deep Think on ARC-AGI-2 at ~54% and 1/2 the cost at $15 vs. $30 per task. Also, on ARC-AGI-1, GPT-5.2 Pro (X-High) scores higher than o3-preview did last year. 90.5% vs. 88%. The shocking part? It's *390x* more cost efficient (!).
@arcprize
ARC Prize
14 hours
We also verified that GPT-5.2 Pro (High) is SOTA for ARC-AGI-2, scoring 54.2% for $15.72/task. (Due to API timeouts, we were unable to reliably verify GPT-5.2 Pro X-High on ARC-AGI-2.) All verified GPT-5.2 family scores: https://t.co/x8U1ItOjGR
3
8
99
@daniel_mac8
Dan Mac
14 hours
Go to this URL to check it out:
platform.openai.com
0
0
0
@daniel_mac8
Dan Mac
14 hours
GPT-5.2 is available NOW on the API. You can also use it via the OpenAI API Platform playground. Also listed is GPT-5.2 Pro, so it looks like Pro subs are also getting a new model today. Very much looking forward to GPT-5.2 Pro.
3
2
33
@daniel_mac8
Dan Mac
15 hours
Do you smell that? It's coming from the kitchen... Is that...garlic? 🧄 Soon™
3
1
41
@daniel_mac8
Dan Mac
18 hours
GPT-5.2 day has arrived. Happy Garlic 🧄 day to all who celebrate.
What to look for:
> Does 5.2 top Opus 4.5 on SWE-Bench (80.9%)?
> Does 5.2 top Gemini 3 DT on ARC-AGI-2 (45.1%)?
> Does 5.2 include reasoning techniques from IMO?
Pivotal day for OpenAI.
@daniel_mac8
Dan Mac
1 day
GPT-5.2 release is existential for OpenAI. It needs to be clearly better than Gemini 3 Pro and Opus 4.5. Otherwise OpenAI might be cooked for real this time.
12
4
125
@daniel_mac8
Dan Mac
2 days
Gavin Baker, Managing Director of Atreides Management, explains how reasoning saved AI progress on @InvestLikeBest. We have yet to see a model trained on Nvidia Blackwell. The massive gains of the past 18 months are due to test-time compute (TTC). There is no wall.
4
2
19
@daniel_mac8
Dan Mac
1 day
GPT-5.2 release is existential for OpenAI. It needs to be clearly better than Gemini 3 Pro and Opus 4.5. Otherwise OpenAI might be cooked for real this time.
25
11
277
@daniel_mac8
Dan Mac
2 days
Ours is a weird moment, for me precipitated by using Opus 4.5 + Claude Code. It is so blatantly obvious how far the models + harness have come in the past three months, let alone a year. iykyk, and you are leagues ahead of those who don't. Which is why all must know.
@mckaywrigley
Mckay Wrigley
2 days
The more I code with Opus 4.5, the more I think we’re 6-12mo away from solving software. The model is pretty much there. I’ll build like 3 versions of an app in a few hours just to explore options that each would’ve taken me 1-2 weeks <1 year ago. It’s getting weird.
25
27
643
@daniel_mac8
Dan Mac
1 day
This is not a “Twitter Account”. This is a real man’s actual thoughts.
3
0
15
@inductionheads
Super Dario
1 day
The reason Opus 4.5 + Claude Code feels qualitatively different is that they’ve trained it to use memory as a tool. In this case, that's the context window (short-term memory) and the file system (long-term memory). New architectures help, but the key trick to continuous learning is agency.
15
18
461