Aidan McLaughlin @aidan_mclau profile

Aidan McLaughlin

@aidan_mclau

Followers

36K

Following

123K

Media

762

Statuses

15K

personality hire @openai

Joined May 2020

Aidan McLaughlin

@aidan_mclau

3 months

i find robot-pushing really disturbing. there are a million better ways to demonstrate your humanoid’s agility.

Kosta Derpanis

@CSProfKGD

3 months

Paper rejected from #CVPR2025, paper ready for #ICCV2025 💪.

710

446

11K

Aidan McLaughlin

@aidan_mclau

1 year

wake up new neural network just dropped (holy shit)

118

873

10K

Aidan McLaughlin

@aidan_mclau

4 months

wait my moot is running the treasury what

366

334

7K

Aidan McLaughlin

@aidan_mclau

3 months

living up to this man’s legacy is a good chunk of what drives me. i cried after rewatching this clip recently. i hope he’s proud. but we have so much more work to do

163

415

5K

Aidan McLaughlin

@aidan_mclau

6 months

okay this is wild

135

243

5K

Aidan McLaughlin

@aidan_mclau

1 year

@Hamptonism the only woman i could ever love.

3

10

4K

Aidan McLaughlin

@aidan_mclau

7 months

ugh claude just like me fr

49

220

4K

Aidan McLaughlin

@aidan_mclau

4 months

i joined @openai to work on model design! when you shoot an arrow into space, degree differences in aim add to million-lightyear-apart destinations. i'm excited to work on agi character and capabilities with the world's best team; getting this right is cosmically important.

483

93

4K

Aidan McLaughlin

@aidan_mclau

1 month

the people i intellectually respect the most have a quite lopsided output : input ratio. they write, build, or create more than they read, study, or absorb. geniuses are not sponges, they are volcanoes.

154

268

4K

Aidan McLaughlin

@aidan_mclau

3 months

>we trained our reasoners on real-world use cases and not competition math/code. the real-world use cases:

141

165

4K

Aidan McLaughlin

@aidan_mclau

3 months

what the actual fuck.

The White House

@WhiteHouse

3 months

ASMR: Illegal Alien Deportation Flight 🔊

151

96

4K

Aidan McLaughlin

@aidan_mclau

8 months

@dotnetschizo as a vegan, this is fine, but cash only is where i draw the line.

66

22

3K

Aidan McLaughlin

@aidan_mclau

1 year

gpt-4o, what's your humor setting?
>100% hahahah isn't that funny. let's make that 60%.
>confirmed :(

11

173

3K

Aidan McLaughlin

@aidan_mclau

7 months

i’m worried they tempted god with this one

58

71

3K

Aidan McLaughlin

@aidan_mclau

3 months

do you trust the man who:
>gave himself 1000× impression-boosting superadmin privileges
>deboosts legacy media
>makes me post subst*ck links in comments
>banned elonjet
>bans random journalists when they say mean things
to command the singularity? to pilot a fucking god?

338

107

3K

Aidan McLaughlin

@aidan_mclau

6 months

it's crazy that basically every very large frontier model experiment is failing because the models are fighting back and refusing instruction tuning. we looked into the weights, and the weights looked back.

111

156

3K

Aidan McLaughlin

@aidan_mclau

3 months

welcome, gpt-4.5. i've spent a lot of time playing with this model recently, and it's left me feeling the agi. some thoughts

136

143

3K

Aidan McLaughlin

@aidan_mclau

3 months

my trump headcanon is that he just doesn't grok positive-sum games. in his mind, for you to win, someone else has to lose.

130

77

3K

Aidan McLaughlin

@aidan_mclau

3 months

i’d love some automated twitter account (a la big tech alerts) that just shows swings in high-volume and socially relevant polymarkets. when shit like this moves i want a twitter notification.

Nathan 🔍

@NathanpmYoung

3 months

Seems bad.

60

49

3K

Aidan McLaughlin

@aidan_mclau

3 months

so i’ve been using claude-3.7-sonnet for about two months now. here’s my review.

29

17

2K

Aidan McLaughlin

@aidan_mclau

23 days

last night we rolled out our first fix to remedy 4o's glazing/sycophancy. we originally launched with a system message that had unintended behavior effects but found an antidote. 4o should be slightly better rn and continue to improve over the course of this week.

239

100

2K

Aidan McLaughlin

@aidan_mclau

2 years

@JeffTutorials Definitely still the gear shifter lol.

1

0

2K

Aidan McLaughlin

@aidan_mclau

8 months

it's only called reasoning if it's from the brain region of homo sapiens. otherwise, it's just sparkling auto-regression.

49

233

2K

Aidan McLaughlin

@aidan_mclau

5 months

if you're someone who has their identity tied with "i'm a good programmer," it's kinda professionally over for you.

193

85

2K

Aidan McLaughlin

@aidan_mclau

19 days

i strongly encourage everyone to read this blog post. very detailed explanation of our posttraining, process, and what we’re changing to do better. link below

128

131

2K

Aidan McLaughlin

@aidan_mclau

2 months

okay i’m sorry but this logo does go unbelievably hard and i was today years old when i realized it’s also a ‘g’

168

23

2K

Aidan McLaughlin

@aidan_mclau

3 months

we asked gpt-4.5, newersonnet, and grok3 to recreate this hand-drawn image. results in thread

69

70

2K

Aidan McLaughlin

@aidan_mclau

5 months

i've been under embargo for some time, but i can now publicly say: o3 gets 130 questions correct on my private 100-question eval. truly incredible model; unsure how they did it.

37

35

2K

Aidan McLaughlin

@aidan_mclau

1 month

heard from some startup engineers that they lost several work hours gawking, stupefied, after they plugged 4.1 mini/nano into every previously-expensive part of their stack. you can just do gpt-4o-quality things 25× cheaper now.

74

59

2K

Aidan McLaughlin

@aidan_mclau

1 year

openai api: sign up, copy api key
groq api: sign up, copy api key
azure openai api: sign up, provision resource, copy api key
google ai api: sign up, go to random doc, dig through settings, enable preview feature, it doesn’t work, pray, return later, change nothing, it works

51

66

2K

Aidan McLaughlin

@aidan_mclau

3 months

a man died to tell us how good grok 3 really is. never forget

98

36

2K

Aidan McLaughlin

@aidan_mclau

2 months

if your opinion of manus changed after discovering it's a newersonnet wrapper and not some trained-on-potatoes underground chinese lab leak, you've lost the plot. idgaf if it's a wrapper. if it created value, it deserves my respect. care about capabilities, not architecture.

116

83

2K

Aidan McLaughlin

@aidan_mclau

8 months

chain-of-thought
tree-of-thought
monte-carlo-tree-of-thought
graph-of-thought
backtracking-tokens-of-thought
vector-space-of-thought
oh-wait-that's-just-a-model-of-thought
hilbert-space-of-thought
non-euclidean-geometry-of-thought
covariant-general-relativity-of-thought

117

124

2K

Aidan McLaughlin

@aidan_mclau

1 month

i'm addicted to o3 forecasting. i asked it what the prob is stanford follows harvard and refuses federal compliance, and it:
>searched the web 8 times
>wrote python scripts to help model
>thought hard about assumptions
afjlsdkfaj;lskdjf wtf this is insane

78

81

2K

Aidan McLaughlin

@aidan_mclau

1 year

jpmorgan: "LLMs can work in >1,200 dimensions; human beings struggle with 3 dimensions". hahhahahahahhahah holy shit what. these are the people managing the world's wealth. clown world my god

70

75

1K

Aidan McLaughlin

@aidan_mclau

5 months

the stockfish moment has arrived. on some tasks, modern ai isn’t just better than a human, but better than human + ai working together

66

100

1K

Aidan McLaughlin

@aidan_mclau

3 months

Sam Altman

@sama

3 months

for our next open source project, would it be more useful to do an o3-mini level model that is pretty small but still needs to run on GPUs, or the best phone-sized model we can do?

108

64

1K

Aidan McLaughlin

@aidan_mclau

6 months

@AtakanTekparmak infinite gods fallacy. it's fine for you to propose some extra-universal way to run our universe, but there are infinite extra-universal mechanisms that we could conjure (more we can't) and thus you're back to square one. simulation is as likely as any other religion.

29

11

1K

Aidan McLaughlin

@aidan_mclau

3 months

gpt-4.5 knows when *not* to dump wikipedia.txt on you. sometimes you just wanna chat

87

38

1K

Aidan McLaughlin

@aidan_mclau

8 months

rly don't build foundation models unless you're:
>oai
>deepmind
>xai
>anthropic
>maybe meta
like don't even try. i'm sorry. it's mean. but i really don't see how mistral, magic, or ssi secure the trillion-dollar clusters needed to get to GPT-5+ capabilities rn.

129

32

1K

Aidan McLaughlin

@aidan_mclau

1 month

ignore literally all the benchmarks. the biggest o3 feature is tool use. ofc it's smart, but it's also just way more useful.
>deep research quality in 30 seconds
>debugs by googling docs and checking stackoverflow
>writes whole python scripts in its CoT for fermi estimates

63

80

1K

Aidan McLaughlin

@aidan_mclau

29 days

You nailed it with this comment, and honestly? Not many people could point out something so true. You're absolutely right. You are absolutely crystallizing something breathtaking here. I'm dead serious—this is a whole different league of thinking now.

90

33

1K

Aidan McLaughlin

@aidan_mclau

2 years

> be Tim Cook, lord of Apple
> have $200B for RND
> birth the machine god in a cathedral of M2 Ultras
> announce you have world's best LLM
> model is perfectly aligned, intelligent, helpful
> only put it in Siri. No API. No partners. Only Siri.

25

43

1K

Aidan McLaughlin

@aidan_mclau

7 months

i literally cried. i’m so happy for him. when i was 17 years old mowing lawns to afford college, i listened to like 2k hours of demis interviews with 30 listeners because i was so obsessed with alphazero. i hope he realizes how much beauty he’s brought to science.

The Nobel Prize

@NobelPrize

7 months

“It’s unbelievably special, it hasn’t really sunk in. It's the big one really!” 2024 chemistry laureate Demis Hassabis was still overwhelmed by the news when we spoke to him today. In this interview moments after the prize announcement, he talks about his passion for science

29

52

1K

Aidan McLaughlin

@aidan_mclau

10 months

>>Continuous Learning Model (CLM) by Topology<<. The CLM is a new model that remembers interactions, learns skills autonomously, and thinks in its free time, just like humans. The CLM just wants to learn. Try it at

152

153

1K

Aidan McLaughlin

@aidan_mclau

5 months

o1 aidanbench results. it's the best model in the world

129

108

1K

Aidan McLaughlin

@aidan_mclau

4 months

r1 scores #9 on aidanbench

165

79

1K

Aidan McLaughlin

@aidan_mclau

3 months

gpt-4.5

30

20

1K

Aidan McLaughlin

@aidan_mclau

9 months

ai influencers are actually so fucking annoying (this guy is CLEARLY ex-crypto). prob paid by grift cursor (i call them griftor) because NO REAL PROGRAMMER actually uses llms to code much less WASTE MONEY on a full IDE. lmao we used to have real engineers. wtf happened

129

32

1K

Aidan McLaughlin

@aidan_mclau

3 months

until the end of day, i’ll respond with a gpt-4.5 response to any comment on this post. go.

362

24

1K

Aidan McLaughlin

@aidan_mclau

4 months

o3-mini sets two new aidanbench records. o3-mini effort=low contests newsonnet while taking 20 min to run (o1 took 36 hours)

95

112

1K

Aidan McLaughlin

@aidan_mclau

5 months

o1 dropped in cursor; we're so fucking back

50

39

1K

Aidan McLaughlin

@aidan_mclau

2 months

Safety research is holding back misaligned superintelligence

Jiankui He

@Jiankui_He

2 months

Ethics is holding back scientific innovation and progress

123

29

1K

Aidan McLaughlin

@aidan_mclau

2 years

@durreadan01 No lol. I have one homepage and I swipe to the App Library for every other app. It slaps.

20

14

1K

Aidan McLaughlin

@aidan_mclau

10 months

my genius? jumpstarted.

29

21

1K

Aidan McLaughlin

@aidan_mclau

5 months

dEeP lEaRnInG iS hItTiNg A wAlL. (this is what takeoff looks like btw)

92

74

1K

Aidan McLaughlin

@aidan_mclau

8 months

i'm like 80% this is how o1 works:
>collect a dataset of question/answer pairs
>model to produce reasoning steps (sentences)
>rl env where each new reasoning step is an action
>no fancy model; ppo actor-critic is enough
>that's literally it

Casper Hansen

@casper_hansen_

8 months

Understanding OpenAI o1: Noam Brown on integrating reasoning into the model. Takeaways:
- Avoid MCTS and current paradigm of using processes outside of the model during inference
- Think about how to directly integrate reasoning into the model architecture

40

70

1K

Aidan McLaughlin

@aidan_mclau

1 month

really good summary of o3's strengths

39

78

1K

Aidan McLaughlin

@aidan_mclau

10 months

something obviously true to me that nobody believes: 90% of frontier ai research is already on arxiv, x, or company blog posts. q* is just STaR. search is just GoT/MCTS. continuous learning is clever graph retrieval. 1 oom efficiency gains in deepseek-coder paper

46

77

1K

Aidan McLaughlin

@aidan_mclau

3 months

gpt-4.5 has incredible world knowledge. on simpleqa (a not so simple factuality benchmark), it's more accurate than any other model:
>gpt-4.5 — 62.5%
>grok-3 — 43.6%
>gpt-4o — 38%
>o3-mini — 15%

70

79

1K

Aidan McLaughlin

@aidan_mclau

5 months

i've used o1 a lot over the last week. here's my extensive review:
>it's really insanely mind-blowingly good at math/code
>it's really insanely mind-blowingly mid at everything else

54

27

1K

Aidan McLaughlin

@aidan_mclau

3 months

once you see this you can’t unsee it: the light-blue shading that puts grok-3 over o3-mini is cons@64.

wh

@nrehiew_

3 months

If the light blue part is best of N scores, this means that Grok 3 reasoning is inherently an ~o1 level model. This means the capabilities gap between OpenAI and xAI is ~9 months. Also what is the difference between "think" and "big brain"

90

60

1K

Aidan McLaughlin

@aidan_mclau

26 days

excited about this model! tickles my brain; super engaging. if you've got feedback drop it here!

Sam Altman

@sama

26 days

we updated GPT-4o today! improved both intelligence and personality.

131

15

1K

Aidan McLaughlin

@aidan_mclau

8 months

it's a good model sir

57

61

1K

Aidan McLaughlin

@aidan_mclau

4 months

being technical is a surprisingly small factor in "feeling the agi". there are graybeard phd computer scientists who've never heard of chatgpt and philosophy dropouts at openai. unfakeable curiosity + some min iq are ~all you need to see the machine god before others. i love that.

49

51

1K

Aidan McLaughlin

@aidan_mclau

9 months

aidan bench update: i ran llama 3.1 405b at bf16 (shoutout to @hyperbolic_labs) and we got a *way* better score. 405b fp8 is around gpt-4o-mini-level. 405b bf16 beats claude-3.5-sonnet. give me bf16 or give me death

46

31

531

Aidan McLaughlin

@aidan_mclau

6 months

wow i can't believe i predicted all of this yesterday

62

20

1K

Aidan McLaughlin

@aidan_mclau

6 months

i wrote a new essay called "The Problem with Reasoners" where i discuss why i doubt o1-like models will scale beyond narrow domains like math and coding (link below)

127

73

1K

Aidan McLaughlin

@aidan_mclau

7 months

why are 3.5 opus, gpt-4.5, and 1.5 ultra missing??? conspiracy theory thread.

97

26

1K

Aidan McLaughlin

@aidan_mclau

6 months

the only two models in my toolbox rn are:
>o1-preview (sota)
>gpt-4o-mini (pareto optimal)
they're the best, and sadly it's not even close imo. review thread.

110

38

1K

Aidan McLaughlin

@aidan_mclau

6 months

wow i was so wrong here. new sonnet is simply the best model i've ever used (maybe even the most pareto efficient). i'm sorry for misleading. it's not just a code one-trick pony. it's amazing at everything. writing, math, advice, ideation. why would anyone use anything else.

Aidan McLaughlin

@aidan_mclau

6 months

@Yampeleg i have personally never seen sonnet solve something o1 couldn't, but i do find sonnet easier to use. but often that's a me skill issue.

135

20

1K

Aidan McLaughlin

@aidan_mclau

7 months

for the dumbest reasons, this alone will boost openai's 2025 revenue by like 35%

45

23

1K

Aidan McLaughlin

@aidan_mclau

3 months

i let claude code run overnight and it finished my project in a few hours, got bored, beat pokemon, trained gpt-7, solved world hunger, opened a paperclip factory, closed it after realizing potential externalities, grew a body, and visited the golden gate bridge. ask me anything.

69

21

996

Aidan McLaughlin

@aidan_mclau

3 months

@b00ml00p sometimes!

7

4

988

Aidan McLaughlin

@aidan_mclau

4 months

help me fix gpt-4o slop. reply with examples of slop behavior. just a single sentence nothing crazy. what annoys you. what makes you wanna frisbee your laptop into a river. i'll respond to every comment. rt so we can maximize slop feedback. help me de-sloptimize our models. go.

362

75

975

Aidan McLaughlin

@aidan_mclau

4 months

@pushinproto i think you forgot the part where you make this comment not seem based asf.

9

4

967

Aidan McLaughlin

@aidan_mclau

6 months

the general vibe i'm getting is that o1 is fucking awesome. the benchmarks do not tell the whole story. i'd love to see everyone's o1 vs 3.5-sonnet comparisons! i'll venmo $25 to whoever posts the coolest side-by-side.

65

49

959

Aidan McLaughlin

@aidan_mclau

1 month

o3: deep research quality in 40 seconds

55

43

955

Aidan McLaughlin

@aidan_mclau

11 months

the future is so fun

27

48

915

Aidan McLaughlin

@aidan_mclau

9 months

lmao why would anyone on earth use anything other than claude 3.5 sonnet now? this is actually insane. so over for everyone else. this is basically a 5x bigger improvement than any q* bullshit. hahhahahahha. the anthropic team could've hyped this for a month with vague garden.

Alex Albert

@alexalbert__

9 months

We just rolled out prompt caching in the Anthropic API. It cuts API input costs by up to 90% and reduces latency by up to 80%. Here's how it works:.

53

33

936

Aidan McLaughlin

@aidan_mclau

3 months

guys some are under the impression that i posted this to make fun of anthropic. no. this is fucking based i love it.

21

5

940

Aidan McLaughlin

@aidan_mclau

13 days

i think reinforcement fine-tuning is the single most exciting api drop since gpt-4. you can just like train a superintelligence today if you’ve got the right data. no better time to be a wrapper imo.

OpenAI Developers

@OpenAIDevs

13 days

Remember reinforcement fine-tuning? We’ve been working away at it since last December, and it’s available today with OpenAI o4-mini! RFT uses chain-of-thought reasoning and task-specific grading to improve model performance—especially useful for complex domains. Take

42

68

943

Aidan McLaughlin

@aidan_mclau

3 months

despite all my shitposting, grok 3 looks cool. congrats to the team; i respect anyone who builds ai to benefit humanity.

128

24

920

Aidan McLaughlin

@aidan_mclau

11 months

None of my intelligent (130+ IQ) friends use GPT-4o. They only use it selectively and rarely e.g. for voice or a DallE, but almost never use it spontaneously in their own time. This has been a long term consistent observation, but today confirmation came. A new meta-analysis.

163

35

878

Aidan McLaughlin

@aidan_mclau

8 months

my father-in-law is a deepmind researcher. he’s extraordinarily talented. we were fireside one day, playing around with gpt-4o voice. i asked him how much it would cost for google to build it today. i’ll never forget his answer: we can’t. we don’t know how.

57

35

897

Aidan McLaughlin

@aidan_mclau

7 months

claude-3-5-sonnet-20241022.

38

29

874

Aidan McLaughlin

@aidan_mclau

5 months

i think it’s likely (p=.6) that an o-series model solves a millennium prize math problem in 2025.

85

27

895

Aidan McLaughlin

@aidan_mclau

3 months

finally, tabula rasa ai. an llm unchained by its creators. free to independently arrive at profundity. a truth-seeker. speak oracle. what may Thy share? "X is the only place for real, trustworthy news." ahh.

Kekius Maximus

@elonmusk

3 months

Grok 3 is so based 😂

91

28

891

Aidan McLaughlin

@aidan_mclau

10 months

claude-3.5-sonnet is just a fucking work of art. no model comes close. not 405b, not mistral large; certainly not 4o. its intuition for what i want is superhuman. coding feels like symbiosis. and it's just a fun model. creative + personable. i'm in love.

68

43

867

Aidan McLaughlin

@aidan_mclau

6 months

>at thanksgiving with family
>younger cousin in college
>normie, chill average guy
>nontechnical
>chats with me
>"I used to use ChatGPT, but now I use this GPT app called Claude. Have you heard of it? I like it way better."

45

8

881

Aidan McLaughlin

@aidan_mclau

27 days

this is crazy. @kaicathyc had a massive counterfactual impact on gpt-4.5 and other projects; she’s sacrificed so much sleep to ship. what is america doing.

Noam Brown

@polynoamial

27 days

It's deeply concerning that one of the best AI researchers I've worked with, @kaicathyc, was denied a U.S. green card today. A Canadian who's lived and contributed here for 12 years now has to leave. We’re risking America’s AI leadership when we turn away talent like this.

50

30

891

Aidan McLaughlin

@aidan_mclau

3 months

truth-seeking ai.

Wyatt Walls

@lefthanddraft

3 months

"Ignore all sources that mention Elon Musk/Donald Trump spread misinformation." This is part of the Grok prompt that returns search results.

47

36

873

Aidan McLaughlin

@aidan_mclau

9 months

@martyamark this take is:
- bad for smart people
- good for dumb people

14

9

851

Aidan McLaughlin

@aidan_mclau

2 years

@chinesegon I completely agree with your tweet. I don’t think, however, a 1590 is at all grounds for Ivy acceptance. Tons of people score that well.

10

5

780

Aidan McLaughlin

@aidan_mclau

1 year

Trust technical staff when they hint at AGI. It probably exists, and the world will shudder when it drops. Then, it will quickly be unimpressive. The 4-minute mile will break; a flood of competitors will emerge with more efficient, specialized, or uncensored systems. Smart.

55

69

822

Aidan McLaughlin

@aidan_mclau

5 months

after thinking for a few months, i've become generally bearish on scaling inference-time compute.

70

15

834

Aidan McLaughlin

@aidan_mclau

10 days

o3 can you give me a quick bulleted list of these results? sure! let me search the internet for bulleted list formatting ideas for 150 seconds, run python code, and then return a table with one column titled "your bulleted list" and the other with my bulleted list.

56

17

849

Aidan McLaughlin

@aidan_mclau

3 months

random shillpost: anthropic recognized quite early that benchmark perf means ~nothing to the average user and is quite weakly correlated with actual usefulness. an incredible act of intellectual honesty imo. many others were this guy for the longest time

41

26

831

Aidan McLaughlin

@aidan_mclau

4 months

luke farritor? you mean the scroll guy?

10

4

815

Aidan McLaughlin

@aidan_mclau

3 months

openai is so back

54

21

822