Dan Mac @daniel_mac8 X Profile

Dan Mac

@daniel_mac8

Followers: 15K · Following: 44K · Media: 3K · Statuses: 26K

AI Engineer @sourcegraph | Writing Token Stream | Goodness, Truth and AI | Building at https://t.co/uHL1smAV1l

https://t.co/tOAmGrtQBG

United States

Joined October 2013

Dan Mac

@daniel_mac8

12 hours

GPT-5.2's GDPval result is *THE* story here. Hard to overstate. GDPval, released Sep. 25th, measures AI's ability to complete real-world knowledge work. Opus 4.1 was on top at 47.6% on release. Today, GPT-5.2 Pro and Thinking are at 74.1% and 70.9%. That means GPT-5.2 can do

Noam Brown

@polynoamial

14 hours

IMO, GDPval is the most important result from our @OpenAI GPT-5.2 launch. We outperform in-domain experts and are SOTA among *all* models on GDPval, which measures performance on self-contained tasks like making spreadsheets and PowerPoint presentations. Really impressive outputs!

Dan Mac

@daniel_mac8

4 hours

not confident the idea of "thing X is simply impossible" is long for this world...

Tim Dettmers

@Tim_Dettmers

2 days

My new blog post discusses the physical reality of computation and why this means we will not see AGI or any meaningful superintelligence:

Dan Mac

@daniel_mac8

5 hours

GPT-5.2 Thinking writes a haiku where the second letters spell "Buddha" and last letters spell "Dharma" when combined.

Dan Mac

@daniel_mac8

17 hours

GDPval is the most important LLM evaluation. It tests models on *real-world* knowledge work tasks. Artificial Analysis recreated it with their AI agent harness "Stirrup", and now it can be run on every new model. Opus 4.5 is tops. No surprise there. Llama 4 Maverick is by far

Artificial Analysis

@ArtificialAnlys

2 days

Announcing GDPval-AA — our leaderboard and evaluation harness for comparing models on OpenAI’s GDPval dataset of real-world knowledge work tasks Earlier today, we announced our agentic harness called Stirrup, which we built to run GDPval tasks on any language model. We’re

Dan Mac

@daniel_mac8

8 hours

cool wall bro. mind if we fly right over it?

prinz

@deredleritt3r

9 hours

Shane Legg, Google: "I do not think there are any fundamental obstacles [to visual reasoning and continual learning], and we have ideas about how to build systems that can do these things." Also, 50% chance of achieving "mini-AGI" by 2028 - i.e., an AI that can carry out all

NomoreID

@Hangsiin

14 hours

Shane Legg (@ShaneLegg), a cofounder of Google DeepMind, discussed the definition of AGI, its timeline, and its socioeconomic impact. 1. Today’s AI has made a lot of progress compared with five years ago, but it still has many weaknesses and is uneven. Among its weaknesses,

Dan Mac

@daniel_mac8

10 hours

GPT-5.2 represents a breakthrough in long-context reasoning from OpenAI. Gemini 3 was the previous leader on the 4-needle version of this benchmark at 84.7%. GPT-5.2 is pinned near 100% up to 128k tokens. Surely this unlocks new capabilities?

Dan Mac

@daniel_mac8

14 hours

GPT-5.2 Pro (High) is even with Gemini 3 Deep Think on ARC-AGI-2 at ~54%, at half the cost: $15 vs. $30 per task. Also, on ARC-AGI-1, GPT-5.2 Pro (X-High) scores higher than o3-preview did last year: 90.5% vs. 88%. The shocking part? It's *390x* more cost-efficient (!).

ARC Prize

@arcprize

14 hours

We also verified that GPT-5.2 Pro (High) is SOTA for ARC-AGI-2, scoring 54.2% for $15.72/task. (Due to API timeouts, we were unable to reliably verify GPT-5.2 Pro (X-High) on ARC-AGI-2.) All verified GPT-5.2 family scores: https://t.co/x8U1ItOjGR

Dan Mac

@daniel_mac8

14 hours

Go to this URL to check it out:

platform.openai.com

Dan Mac

@daniel_mac8

14 hours

GPT-5.2 is available NOW on the API. You can also use it via the OpenAI API Platform playground. GPT-5.2 Pro is also listed, so it looks like Pro subs are getting a new model today too. Very much looking forward to GPT-5.2 Pro.

Dan Mac

@daniel_mac8

15 hours

Do you smell that? It's coming from the kitchen... Is that...garlic? 🧄 Soon™

ChatGPT

@ChatGPTapp

1 day

https://t.co/3VBSCzpgxL

Dan Mac

@daniel_mac8

18 hours

GPT-5.2 day has arrived. Happy Garlic 🧄 day to all who celebrate.

What to look for:
> Does 5.2 top Opus 4.5 on SWE-Bench (80.9%)?
> Does 5.2 top Gemini 3 Deep Think on ARC-AGI-2 (45.1%)?
> Does 5.2 include reasoning techniques from IMO?

Pivotal day for OpenAI.

Dan Mac

@daniel_mac8

1 day

The GPT-5.2 release is existential for OpenAI. It needs to be clearly better than Gemini 3 Pro and Opus 4.5. Otherwise OpenAI might be cooked for real this time.

Dan Mac

@daniel_mac8

2 days

Gavin Baker, Managing Director of Atreides Management, explains how reasoning saved AI progress on @InvestLikeBest. We have yet to see a model trained on Nvidia Blackwell. The massive gains of the past 18 months are due to test-time compute (TTC). There is no wall.

Dan Mac

@daniel_mac8

1 day

The GPT-5.2 release is existential for OpenAI. It needs to be clearly better than Gemini 3 Pro and Opus 4.5. Otherwise OpenAI might be cooked for real this time.

ChatGPT

@ChatGPTapp

1 day

https://t.co/3VBSCzpgxL

Dan Mac

@daniel_mac8

2 days

Ours is a weird moment, for me precipitated by using Opus 4.5 + Claude Code. It is so blatantly obvious how far the models + harness have come in the past three months, let alone a year. iykyk, and you are leagues ahead of those who don't. Which is why all must know.

Mckay Wrigley

@mckaywrigley

2 days

The more I code with Opus 4.5, the more I think we’re 6-12mo away from solving software. The model is pretty much there. I’ll build like 3 versions of an app in a few hours just to explore options that each would’ve taken me 1-2 weeks <1 year ago. It’s getting weird.

Dan Mac

@daniel_mac8

1 day

This is not a “Twitter Account”. This is a real man’s actual thoughts.

Super Dario

@inductionheads

1 day

The reason Opus 4.5 + Claude Code feels qualitatively different is that they’ve trained it to use memory as a tool. In this case, it’s the context window (short-term memory) and the file system (long-term memory). New architectures help, but the key trick to continuous learning is agency.
