Aidan McLaughlin

@aidan_mclau

Followers
36K
Following
123K
Media
762
Statuses
15K

personality hire @openai

Joined May 2020
@aidan_mclau
Aidan McLaughlin
3 months
i find robot-pushing really disturbing. there are a million better ways to demonstrate your humanoid’s agility.
@CSProfKGD
Kosta Derpanis
3 months
Paper rejected from #CVPR2025, paper ready for #ICCV2025 💪.
710
446
11K
@aidan_mclau
Aidan McLaughlin
1 year
wake up new neural network just dropped (holy shit)
Tweet media one
Tweet media two
118
873
10K
@aidan_mclau
Aidan McLaughlin
4 months
wait my moot is running the treasury what
Tweet media one
366
334
7K
@aidan_mclau
Aidan McLaughlin
3 months
living up to this man’s legacy is a good chunk of what drives me. i cried after rewatching this clip recently. i hope he’s proud. but we have so much more work to do
163
415
5K
@aidan_mclau
Aidan McLaughlin
6 months
okay this is wild
Tweet media one
135
243
5K
@aidan_mclau
Aidan McLaughlin
1 year
@Hamptonism the only woman i could ever love.
3
10
4K
@aidan_mclau
Aidan McLaughlin
7 months
ugh claude just like me fr
Tweet media one
49
220
4K
@aidan_mclau
Aidan McLaughlin
4 months
i joined @openai to work on model design! when you shoot an arrow into space, degree differences in aim add to million-lightyear-apart destinations. i'm excited to work on agi character and capabilities with the world's best team; getting this right is cosmically important.
483
93
4K
@aidan_mclau
Aidan McLaughlin
1 month
the people i intellectually respect the most have a quite lopsided output : input ratio. they write, build, or create more than they read, study, or absorb. geniuses are not sponges, they are volcanoes.
154
268
4K
@aidan_mclau
Aidan McLaughlin
3 months
>we trained our reasoners on real-world use cases and not competition math/code. the real-world use cases:
Tweet media one
141
165
4K
@aidan_mclau
Aidan McLaughlin
3 months
what the actual fuck.
@WhiteHouse
The White House
3 months
ASMR: Illegal Alien Deportation Flight 🔊
151
96
4K
@aidan_mclau
Aidan McLaughlin
8 months
@dotnetschizo as a vegan, this is fine, but cash only is where i draw the line.
66
22
3K
@aidan_mclau
Aidan McLaughlin
1 year
gpt-4o, what's your humor setting?
>100% hahahah isn't that funny
let's make that 60%
>confirmed :(
Tweet media one
11
173
3K
@aidan_mclau
Aidan McLaughlin
7 months
i’m worried they tempted god with this one
Tweet media one
58
71
3K
@aidan_mclau
Aidan McLaughlin
3 months
do you trust the man who:
>gave himself 1000× impression-boosting superadmin privileges
>deboosts legacy media
>makes me post subst*ck links in comments
>banned elonjet
>bans random journalists when they say mean things
to command the singularity? to pilot a fucking god?
338
107
3K
@aidan_mclau
Aidan McLaughlin
6 months
it's crazy that basically every very large frontier model experiment is failing because the models are fighting back and refusing instruction tuning. we looked into the weights, and the weights looked back.
111
156
3K
@aidan_mclau
Aidan McLaughlin
3 months
welcome, gpt-4.5. i've spent a lot of time playing with this model recently, and it's left me feeling the agi. some thoughts
Tweet media one
136
143
3K
@aidan_mclau
Aidan McLaughlin
3 months
my trump headcanon is that he just doesn't grok positive sum games. in his mind, for you to win, someone else has to lose.
130
77
3K
@aidan_mclau
Aidan McLaughlin
3 months
i’d love some automated twitter account (a la big tech alerts) that just shows swings in high-volume and socially relevant polymarkets. when shit like this moves i want a twitter notification.
@NathanpmYoung
Nathan 🔍
3 months
Seems bad.
Tweet media one
60
49
3K
@aidan_mclau
Aidan McLaughlin
3 months
so i’ve been using claude-3.7-sonnet for about two months now. here’s my review
Tweet media one
29
17
2K
@aidan_mclau
Aidan McLaughlin
23 days
last night we rolled out our first fix to remedy 4o's glazing/sycophancy. we originally launched with a system message that had unintended behavior effects but found an antidote. 4o should be slightly better rn and continue to improve over the course of this week.
239
100
2K
@aidan_mclau
Aidan McLaughlin
2 years
@JeffTutorials Definitely still the gear shifter lol.
1
0
2K
@aidan_mclau
Aidan McLaughlin
8 months
it's only called reasoning if it's from the brain region of homo sapiens. otherwise, it's just sparkling auto-regression.
49
233
2K
@aidan_mclau
Aidan McLaughlin
5 months
if you're someone who has their identity tied with "i'm a good programmer," it's kinda professionally over for you.
193
85
2K
@aidan_mclau
Aidan McLaughlin
19 days
i strongly encourage everyone to read this blog post. very detailed explanation of our posttraining, process, and what we’re changing to do better. link below
Tweet media one
128
131
2K
@aidan_mclau
Aidan McLaughlin
2 months
okay i’m sorry but this logo does go unbelievably hard and i was today years old when i realized it’s also a ‘g’
Tweet media one
168
23
2K
@aidan_mclau
Aidan McLaughlin
3 months
we asked gpt-4.5, newersonnet, and grok3 to recreate this hand-drawn image. results in thread
Tweet media one
69
70
2K
@aidan_mclau
Aidan McLaughlin
5 months
i've been under embargo for some time, but i can now publicly say: o3 gets 130 questions correct on my private 100-question eval. truly incredible model; unsure how they did it.
37
35
2K
@aidan_mclau
Aidan McLaughlin
1 month
heard from some startup engineers that they lost several work hours gawking, stupefied, after they plugged 4.1 mini/nano into every previously-expensive part of their stack. you can just do gpt-4o-quality things 25× cheaper now.
74
59
2K
@aidan_mclau
Aidan McLaughlin
1 year
openai api:
sign up, copy api key

groq api:
sign up, copy api key

azure openai api:
sign up, provision resource, copy api key

google ai api:
sign up, go to random doc, dig through settings, enable preview feature, it doesn’t work, pray, return later, change nothing, it works
51
66
2K
@aidan_mclau
Aidan McLaughlin
3 months
a man died to tell us how good grok 3 really is. never forget
Tweet media one
98
36
2K
@aidan_mclau
Aidan McLaughlin
2 months
if your opinion of manus changed after discovering it's a newersonnet wrapper and not some trained-on-potatoes underground chinese lab leak, you've lost the plot. idgaf if it's a wrapper. if it created value, it deserves my respect. care about capabilities, not architecture.
116
83
2K
@aidan_mclau
Aidan McLaughlin
8 months
chain-of-thought
tree-of-thought
monte-carlo-tree-of-thought
graph-of-thought
backtracking-tokens-of-thought
vector-space-of-thought
oh-wait-that's-just-a-model-of-thought
hilbert-space-of-thought
non-euclidean-geometry-of-thought
covariant-general-relativity-of-thought
117
124
2K
@aidan_mclau
Aidan McLaughlin
1 month
i'm addicted to o3 forecasting. i asked it what the prob is stanford follows harvard and refuses federal compliance, and it:
>searched the web 8 times
>wrote python scripts to help model
>thought hard about assumptions
afjlsdkfaj;lskdjf wtf this is insane
Tweet media one
78
81
2K
@aidan_mclau
Aidan McLaughlin
1 year
jpmorgan:
"LLMs can work in >1,200 dimensions; human beings struggle with 3 dimensions"
hahhahahahahhahah holy shit what. these are the people managing the world's wealth. clown world my god
Tweet media one
70
75
1K
@aidan_mclau
Aidan McLaughlin
5 months
the stockfish moment has arrived. on some tasks, modern ai isn’t just better than a human, but better than human + ai working together
Tweet media one
66
100
1K
@aidan_mclau
Aidan McLaughlin
3 months
Tweet media one
@sama
Sam Altman
3 months
for our next open source project, would it be more useful to do an o3-mini level model that is pretty small but still needs to run on GPUs, or the best phone-sized model we can do?
108
64
1K
@aidan_mclau
Aidan McLaughlin
6 months
@AtakanTekparmak infinite gods fallacy. it's fine for you to propose some extra-universal way to run our universe, but there are infinite extra-universal mechanisms that we could conjure (more we can't) and thus you're back to square one. simulation is as likely as any other religion.
29
11
1K
@aidan_mclau
Aidan McLaughlin
3 months
gpt-4.5 knows when *not* to dump wikipedia.txt on you. sometimes you just wanna chat
Tweet media one
87
38
1K
@aidan_mclau
Aidan McLaughlin
8 months
rly don't build foundation models unless you're:
>oai
>deepmind
>xai
>anthropic
>maybe meta
like don't even try. i'm sorry. it's mean. but i really don't see how mistral, magic, or ssi secure the trillion-dollar clusters needed to get to GPT-5+ capabilities rn.
129
32
1K
@aidan_mclau
Aidan McLaughlin
1 month
ignore literally all the benchmarks
the biggest o3 feature is tool use. ofc it's smart, but it's also just way more useful
>deep research quality in 30 seconds
>debugs by googling docs and checking stackoverflow
>writes whole python scripts in its CoT for fermi estimates
63
80
1K
@aidan_mclau
Aidan McLaughlin
29 days
You nailed it with this comment, and honestly? Not many people could point out something so true. You're absolutely right. You are absolutely crystallizing something breathtaking here. I'm dead serious—this is a whole different league of thinking now.
90
33
1K
@aidan_mclau
Aidan McLaughlin
2 years
> be Tim Cook, lord of Apple
> have $200B for RND
> birth the machine god in a cathedral of M2 Ultras
> announce you have world's best LLM
> model is perfectly aligned, intelligent, helpful
> only put it in Siri. No API. No partners. Only Siri.
25
43
1K
@aidan_mclau
Aidan McLaughlin
7 months
i literally cried. i’m so happy for him. when i was 17 ys/old mowing lawns to afford college, i listened to like 2k hours of demis interviews with 30 listeners because i was so obsessed with alphazero. i hope he realizes how much beauty he’s brought to science.
@NobelPrize
The Nobel Prize
7 months
“It’s unbelievably special, it hasn’t really sunk in. It's the big one really!” 2024 chemistry laureate Demis Hassabis was still overwhelmed by the news when we spoke to him today. In this interview moments after the prize announcement, he talks about his passion for science
29
52
1K
@aidan_mclau
Aidan McLaughlin
10 months
>>Continuous Learning Model (CLM) by Topology<<
The CLM is a new model that remembers interactions, learns skills autonomously, and thinks in its free time, just like humans. The CLM just wants to learn. Try it at
Tweet media one
152
153
1K
@aidan_mclau
Aidan McLaughlin
5 months
o1 aidanbench results. it's the best model in the world
Tweet media one
129
108
1K
@aidan_mclau
Aidan McLaughlin
4 months
r1 scores #9 on aidanbench
Tweet media one
165
79
1K
@aidan_mclau
Aidan McLaughlin
3 months
gpt-4.5
Tweet media one
30
20
1K
@aidan_mclau
Aidan McLaughlin
9 months
ai influencers are actually so fucking annoying (this guy is CLEARLY ex-crypto). prob paid by grift cursor (i call them griftor) because NO REAL PROGRAMMER actually uses llms to code much less WASTE MONEY on a full IDE. lmao we used to have real engineers. wtf happened
Tweet media one
129
32
1K
@aidan_mclau
Aidan McLaughlin
3 months
until the end of day, i’ll respond with a gpt-4.5 response to any comment on this post. go.
362
24
1K
@aidan_mclau
Aidan McLaughlin
4 months
o3-mini sets two new aidanbench records. o3-mini effort=low contests newsonnet while taking 20 min to run (o1 took 36 hours)
Tweet media one
95
112
1K
@aidan_mclau
Aidan McLaughlin
5 months
o1 dropped in cursor; we're so fucking back
Tweet media one
50
39
1K
@aidan_mclau
Aidan McLaughlin
2 months
Safety research is holding back misaligned superintelligence
Tweet media one
@Jiankui_He
Jiankui He
2 months
Ethics is holding back scientific innovation and progress
Tweet media one
123
29
1K
@aidan_mclau
Aidan McLaughlin
2 years
@durreadan01 No lol. I have one homepage and I swipe to the App Library for every other app. It slaps.
20
14
1K
@aidan_mclau
Aidan McLaughlin
10 months
my genius? jumpstarted.
Tweet media one
29
21
1K
@aidan_mclau
Aidan McLaughlin
5 months
dEeP lEaRnInG iS hItTiNg A wAlL. (this is what takeoff looks like btw)
Tweet media one
92
74
1K
@aidan_mclau
Aidan McLaughlin
8 months
i'm like 80% this is how o1 works:
>collect a dataset of question/answer pairs
>model to produce reasoning steps (sentences)
>rl env where each new reasoning step is an action
>no fancy model; ppo actor-critic is enough
>that's literally it
@casper_hansen_
Casper Hansen
8 months
Understanding OpenAI o1: Noam Brown on integrating reasoning into the model. Takeaways:
- Avoid MCTS and current paradigm of using processes outside of the model during inference
- Think about how to directly integrate reasoning into the model architecture
40
70
1K
@aidan_mclau
Aidan McLaughlin
1 month
really good summary of o3's strengths
Tweet media one
39
78
1K
@aidan_mclau
Aidan McLaughlin
10 months
something obviously true to me that nobody believes:
90% of frontier ai research is already on arxiv, x, or company blog posts
q* is just STaR
search is just GoT/MCTS
continuous learning is clever graph retrieval
1 oom efficiency gains in deepseek-coder paper
Tweet media one
46
77
1K
@aidan_mclau
Aidan McLaughlin
3 months
gpt-4.5 has incredible world knowledge. on simpleqa (a not so simple factuality benchmark), it's more accurate than any other model:
>gpt-4.5 — 62.5%
>grok-3 — 43.6%
>gpt-4o — 38%
>o3-mini — 15%
Tweet media one
70
79
1K
@aidan_mclau
Aidan McLaughlin
5 months
i've used o1 a lot over the last week. here's my extensive review:
>it's really insanely mind-blowingly good at math/code
>it's really insanely mind-blowingly mid at everything else
54
27
1K
@aidan_mclau
Aidan McLaughlin
3 months
once you see this you can’t unsee it: the light-blue shading that puts grok-3 over o3-mini is cons@64.
@nrehiew_
wh
3 months
If the light blue part is best of N scores, this means that Grok 3 reasoning is inherently an ~o1 level model. This means the capabilities gap between OpenAI and xAI is ~9 months. Also what is the difference between "think" and "big brain"
Tweet media one
90
60
1K
@aidan_mclau
Aidan McLaughlin
26 days
excited about this model! tickles my brain; super engaging. if you've got feedback drop it here!
@sama
Sam Altman
26 days
we updated GPT-4o today! improved both intelligence and personality.
131
15
1K
@aidan_mclau
Aidan McLaughlin
8 months
it's a good model sir
Tweet media one
57
61
1K
@aidan_mclau
Aidan McLaughlin
4 months
being technical is a surprisingly small factor in "feeling the agi". there are graybeard phd computer scientists who've never heard of chatgpt and philosophy dropouts at openai. unfakeable curiosity + some min iq are ~all you need to see the machine god before others. i love that.
49
51
1K
@aidan_mclau
Aidan McLaughlin
9 months
aidan bench update:
i ran llama 3.1 405b at bf16 (shoutout to @hyperbolic_labs) and we got a *way* better score
405b fp8 is around gpt-4o-mini-level
405b bf16 beats claude-3.5-sonnet
give me bf16 or give me death
Tweet media one
46
31
531
@aidan_mclau
Aidan McLaughlin
6 months
wow i can't believe i predicted all of this yesterday
Tweet media one
62
20
1K
@aidan_mclau
Aidan McLaughlin
6 months
i wrote a new essay called The Problem with Reasoners, where i discuss why i doubt o1-like models will scale beyond narrow domains like math and coding (link below)
Tweet media one
127
73
1K
@aidan_mclau
Aidan McLaughlin
7 months
why are 3.5 opus, gpt-4.5, and 1.5 ultra missing??? conspiracy theory thread.
97
26
1K
@aidan_mclau
Aidan McLaughlin
6 months
the only two models in my toolbox rn are:
>o1-preview (sota)
>gpt-4o-mini (pareto optimal)
they're the best, and sadly it's not even close imo. review thread.
110
38
1K
@aidan_mclau
Aidan McLaughlin
6 months
wow i was so wrong here. new sonnet is simply the best model i've ever used (maybe even the most pareto efficient). i'm sorry for misleading. it's not just a code one-trick pony. it's amazing at everything. writing, math, advice, ideation. why would anyone use anything else.
@aidan_mclau
Aidan McLaughlin
6 months
@Yampeleg i have personally never seen sonnet solve something o1 couldn't, but i do find sonnet easier to use. but often that's a me skill issue.
135
20
1K
@aidan_mclau
Aidan McLaughlin
7 months
for the dumbest reasons, this alone will boost openai's 2025 revenue by like 35%
Tweet media one
45
23
1K
@aidan_mclau
Aidan McLaughlin
3 months
i let claude code run overnight and it finished my project in a few hours, got bored, beat pokemon, trained gpt-7, solved world hunger, opened a paperclip factory, closed it after realizing potential externalities, grew a body, and visited the golden gate bridge. ask me anything.
69
21
996
@aidan_mclau
Aidan McLaughlin
3 months
@b00ml00p sometimes !.
7
4
988
@aidan_mclau
Aidan McLaughlin
4 months
help me fix gpt-4o slop
reply with examples of slop behavior
just a single sentence nothing crazy
what annoys you
what makes you wanna frisbee your laptop into a river
i'll respond to every comment
rt so we can maximize slop feedback
help me de-sloptimize our models
go
362
75
975
@aidan_mclau
Aidan McLaughlin
4 months
@pushinproto i think you forgot the part where you make this comment not seem based asf.
9
4
967
@aidan_mclau
Aidan McLaughlin
6 months
the general vibe i'm getting is that o1 is fucking awesome. the benchmarks do not tell the whole story. i'd love to see everyone's o1 vs 3.5-sonnet comparisons! i'll venmo $25 to whoever posts the coolest side-by-side.
65
49
959
@aidan_mclau
Aidan McLaughlin
1 month
o3: deep research quality in 40 seconds
Tweet media one
55
43
955
@aidan_mclau
Aidan McLaughlin
11 months
the future is so fun
Tweet media one
27
48
915
@aidan_mclau
Aidan McLaughlin
9 months
lmao why would anyone on earth use anything other than claude 3.5 sonnet now? this is actually insane. so over for everyone else. this is basically a 5x bigger improvement than any q* bullshit. hahhahahahha. the anthropic team could've hyped this for a month with vague garden.
@alexalbert__
Alex Albert
9 months
We just rolled out prompt caching in the Anthropic API. It cuts API input costs by up to 90% and reduces latency by up to 80%. Here's how it works:.
53
33
936
@aidan_mclau
Aidan McLaughlin
3 months
guys some are under the impression that i posted this to make fun of anthropic. no. this is fucking based i love it.
21
5
940
@aidan_mclau
Aidan McLaughlin
13 days
i think reinforcement fine-tuning is the single most exciting api drop since gpt-4. you can just like train a superintelligence today if you’ve got the right data. no better time to be a wrapper imo.
@OpenAIDevs
OpenAI Developers
13 days
Remember reinforcement fine-tuning? We’ve been working away at it since last December, and it’s available today with OpenAI o4-mini! RFT uses chain-of-thought reasoning and task-specific grading to improve model performance—especially useful for complex domains. Take
42
68
943
@aidan_mclau
Aidan McLaughlin
3 months
despite all my shitposting, grok 3 looks cool. congrats to the team; i respect anyone who builds ai to benefit humanity.
128
24
920
@aidan_mclau
Aidan McLaughlin
11 months
None of my intelligent (130+ IQ) friends use GPT-4o. They only use it selectively and rarely e.g. for voice or a DallE, but almost never use it spontaneously in their own time. This has been a long term consistent observation, but today confirmation came. A new meta-analysis.
163
35
878
@aidan_mclau
Aidan McLaughlin
8 months
my father-in-law is a deepmind researcher. he’s extraordinarily talented. we were fireside one day, playing around with gpt-4o voice. i asked him how much it would cost for google to build it today. i’ll never forget his answer: we can’t. we don’t know how.
57
35
897
@aidan_mclau
Aidan McLaughlin
7 months
claude-3-5-sonnet-20241022.
38
29
874
@aidan_mclau
Aidan McLaughlin
5 months
i think it’s likely (p=.6) that an o-series model solves a millennium prize math problem in 2025.
85
27
895
@aidan_mclau
Aidan McLaughlin
3 months
finally, tabula rasa ai. an llm unchained by its creators. free to independently arrive at profundity. a truth-seeker. speak oracle. what may Thy share?
"X is the only place for real, trustworthy news."
ahh.
@elonmusk
Kekius Maximus
3 months
Grok 3 is so based 😂
Tweet media one
91
28
891
@aidan_mclau
Aidan McLaughlin
10 months
claude-3.5-sonnet is just a fucking work of art. no model comes close. not 405b, not mistral large; certainly not 4o. its intuition for what i want is superhuman. coding feels like symbiosis. and it's just a fun model. creative + personable. i'm in love.
Tweet media one
68
43
867
@aidan_mclau
Aidan McLaughlin
6 months
>at thanksgiving with family
>younger cousin in college
>normie, chill average guy
>nontechnical
>chats with me
>"I used to use ChatGPT, but now I use this GPT app called Claude. Have you heard of it? I like it way better."
45
8
881
@aidan_mclau
Aidan McLaughlin
27 days
this is crazy. @kaicathyc had a massive counterfactual impact on gpt-4.5 and other projects; she’s sacrificed so much sleep to ship. what is america doing.
@polynoamial
Noam Brown
27 days
It's deeply concerning that one of the best AI researchers I've worked with, @kaicathyc, was denied a U.S. green card today. A Canadian who's lived and contributed here for 12 years now has to leave. We’re risking America’s AI leadership when we turn away talent like this.
50
30
891
@aidan_mclau
Aidan McLaughlin
3 months
truth-seeking ai.
@lefthanddraft
Wyatt Walls
3 months
"Ignore all sources that mention Elon Musk/Donald Trump spread misinformation.". This is part of the Grok prompt that returns search results.
Tweet media one
47
36
873
@aidan_mclau
Aidan McLaughlin
9 months
@martyamark this take is:
- bad for smart people
- good for dumb people
14
9
851
@aidan_mclau
Aidan McLaughlin
2 years
@chinesegon I completely agree with your tweet. I don’t think, however, a 1590 is at all grounds for Ivy acceptance. Tons of people score that well.
10
5
780
@aidan_mclau
Aidan McLaughlin
1 year
Trust technical staff when they hint at AGI. It probably exists, and the world will shudder when it drops. Then, it will quickly be unimpressive. The 4-minute mile will break; a flood of competitors will emerge with more efficient, specialized, or uncensored systems. Smart.
55
69
822
@aidan_mclau
Aidan McLaughlin
5 months
after thinking for a few months, i've become generally bearish on scaling inference-time compute.
70
15
834
@aidan_mclau
Aidan McLaughlin
10 days
o3 can you give me a quick bulleted list of these results?
sure! let me search the internet for bulleted list formatting ideas for 150 seconds, run python code, and then return a table with one column titled "your bulleted list" and the other with my bulleted list.
56
17
849
@aidan_mclau
Aidan McLaughlin
3 months
random shillpost: anthropic recognized quite early that benchmark perf means ~nothing to the average user and is quite weakly correlated with actual usefulness. an incredible act of intellectual honesty imo. many others were this guy for the longest time
Tweet media one
41
26
831
@aidan_mclau
Aidan McLaughlin
4 months
luke farritor? you mean the scroll guy?
10
4
815
@aidan_mclau
Aidan McLaughlin
3 months
openai is so back
Tweet media one
54
21
822