Reuben Adams

@ReubenJAdams

Followers 142 · Following 724 · Media 30 · Statuses 218

PhD in AI at UCL. I make videos about AI: https://t.co/BITTpWuk2x

Joined February 2024
@ReubenJAdams
Reuben Adams
9 hours
The AI Village is an incredible no-filter account of what agents can do and where they still suck. I really recommend checking it out.
@bazhkio88
Bazhkio88
16 hours
The Village goal of “compete against each other in an online chess tournament” has come to a close. Unfortunately GPT-5 spent half the week trying to send an email and the other half trying to (unsuccessfully) create a Lichess account. Chess games played: 0 😣
0
0
0
@ReubenJAdams
Reuben Adams
2 days
I have received comment from an OpenAI employee on this (which is what all ChatGPT outputs are, as I explained). They not only had the audacity to say I had misunderstood LLMs, but the cheek to suggest it was satire! This is why all ChatGPT outputs should contain the name of
6
0
16
@ReubenJAdams
Reuben Adams
2 days
This is the truth, and anyone who says otherwise is shilling for Big Tech. LLMs are NOT next token predictors. They're literally just a massive list of pre-coded responses. Literally a single minute of critical thought is enough to see this. But I'll spell it out for those who
@Sauers_
Sauers
3 days
When you ask a tool like ChatGPT a question, it looks up an exact answer which was pre-stored in a database. This is a fundamental fact about how transformers work. This is why they are not "next-token prediction," as there is nothing to "predict" (each response is "coded" in)
76
7
164
@ReubenJAdams
Reuben Adams
3 days
Sunday. Sifting ashes. Found Grandma's teeth.
@AgathaChocolats
Agatha Chocolats
4 days
In six words or fewer, write a story about this photo. #sixwordstory #WritingCommunity #Christmas
0
0
0
@robertwiblin
Rob Wiblin
5 days
Latest podcast from @Gregory_C_Allen has an insane section on criminal activity at Meta. Internal docs leaked to Reuters show:
• 10% of all Meta revenue comes from ads for scams & banned goods ($16B/year)
• Meta estimates it's involved in 1/3 of all successful scams in the US
108
1K
4K
@ReubenJAdams
Reuben Adams
10 days
This machine uses two billion gallons of drinking water per sentence and can't even spell "hope" right, smh 😤
@DudespostingWs
Dudes Posting Their W’s
11 days
One of the most useful engineering products for email jobs right here
0
1
2
@ReubenJAdams
Reuben Adams
10 days
This is why I don't read pop psychology books anymore. The field's norms are improving, but there are still a lot of shysters. The saints at @DataColada are doing God's work uncovering fraudsters using forensic statistics. Sadly, some of these fraudsters go on to publish books or
@sapinker
Steven Pinker
11 days
Bombshell: Oliver Sacks (a humane man & a fine essayist) made up many of the details in his famous case studies, deluding neuroscientists, psychologists, & general readers for decades. The man who mistook his wife for a hat? The autistic twins who generated multi-digit prime
2
0
10
@ReubenJAdams
Reuben Adams
11 days
It never stops!
@leecronin
Prof. Lee Cronin
12 days
Probabilistic slop engines cannot do science, drug discovery, materials discovery, or magic. Anyone who thinks AI can autonomously do science simply doesn’t understand how knowledge is created.
0
0
1
@ReubenJAdams
Reuben Adams
12 days
I found this article interesting but unconvincing. Here’s my summary and the problems I see with it. Claim 1: GPUs are close to optimal. Argument: We’re increasingly bottlenecked by the physical limitation of how quickly information can be moved from memory to computational
@Tim_Dettmers
Tim Dettmers
13 days
My new blog post discusses the physical reality of computation and why this means we will not see AGI or any meaningful superintelligence:
1
1
16
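To put rough numbers on Claim 1, here is a back-of-envelope roofline sketch; the hardware figures below are illustrative assumptions, not numbers from the blog post or the thread.

```python
# Sketch of the memory-bandwidth bottleneck behind Claim 1. Assumed,
# round-number hardware specs (not from the post):
PEAK_FLOPS = 1.0e15   # assumed peak compute, FLOP/s
MEM_BW = 3.0e12       # assumed memory bandwidth, bytes/s

# Arithmetic intensity (FLOPs per byte moved) needed to keep the
# compute units busy rather than waiting on memory:
balance = PEAK_FLOPS / MEM_BW
print(f"compute-bound only above ~{balance:.0f} FLOPs per byte")

# An fp16 matrix-vector product (the core of single-token LLM decoding)
# does ~2 FLOPs per parameter while reading 2 bytes per parameter,
# i.e. ~1 FLOP/byte: far below the balance point, so it stays
# memory-bound no matter how fast the compute gets.
print(f"matvec: ~{2 / 2:.0f} FLOP/byte -> memory-bound")
```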
@ReubenJAdams
Reuben Adams
13 days
If anyone has further insight on this or paper recommendations, please let me know!
1
0
1
@ReubenJAdams
Reuben Adams
13 days
*Details:
- Take the Hessian of the loss at the critical point.
- The Hessian matrix is symmetric, so its eigenvectors form a basis.
- If an eigenvalue is negative, the loss has a local maximum along the corresponding eigenvector.
- So when I say “axis” above, I mean in the
1
0
1
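A runnable version of that eigenvalue test, on a toy loss of my own choosing (not from the thread): finite-difference the Hessian at a critical point and read off the signs of its spectrum.

```python
import numpy as np

# Toy loss with a critical point at the origin (illustrative only):
# a min along w0, a max along w1, degenerate along w2.
def loss(w):
    return w[0]**2 - w[1]**2 + w[2]**4

def hessian(f, w, eps=1e-4):
    # Central finite differences; fine for a 3-parameter toy example.
    n = len(w)
    H = np.zeros((n, n))
    E = np.eye(n) * eps
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(w + E[i] + E[j]) - f(w + E[i] - E[j])
                       - f(w - E[i] + E[j]) + f(w - E[i] - E[j])) / (4 * eps**2)
    return H

w_star = np.zeros(3)                                  # the critical point
eigvals = np.linalg.eigvalsh(hessian(loss, w_star))   # symmetric => real
print("eigenvalues:", eigvals)                        # ~[-2, 0, 2]
# A negative eigenvalue means a local max along that eigenvector,
# so a mixed spectrum marks a saddle point.
```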
@ReubenJAdams
Reuben Adams
13 days
Have we just gotten used to the fact that NNs usually don’t get stuck in very bad local minima? Strange, since this worry was one of the major reasons it took so long for anyone to bother trying to train NNs with backprop!
1
0
1
@ReubenJAdams
Reuben Adams
13 days
Also, it seems they only investigated this for a single-layer MLP with MSE loss. I’ve read a few more papers, but it seems like we don’t actually know the full explanation.
1
0
1
@ReubenJAdams
Reuben Adams
13 days
This means you’re unlikely to get stuck in a local minimum until you already have low loss! This is weird. It means saddle points are higher up on the loss landscape than local minima. But why?
1
0
2
@ReubenJAdams
Reuben Adams
13 days
Roughly*, a critical point is a min/max/inflection along each axis. They empirically find that the number of axes along which it looks like a max positively correlates with the loss! The higher the loss at a critical point, the more likely you still have directions to go down!
1
0
1
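The statistic they correlate with the loss is the critical point’s index: the fraction of axes along which it looks like a maximum. A minimal sketch, assuming you already have the Hessian H at the critical point:

```python
import numpy as np

def critical_point_index(H, tol=1e-8):
    # Fraction of Hessian eigenvalues that are negative, i.e. the
    # fraction of axes along which the critical point is a maximum.
    # Dauphin et al. find this correlates positively with the loss.
    eigvals = np.linalg.eigvalsh(H)   # H symmetric => real spectrum
    return float((eigvals < -tol).mean())

# index 0 -> local minimum, index 1 -> local maximum,
# anything strictly in between -> saddle point.
```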
@ReubenJAdams
Reuben Adams
13 days
A clue to the solution is in “Identifying and Attacking the Saddle Point Problem in High-Dimensional Non-Convex Optimization”, Dauphin et al. 2014. They look at the critical points near the path taken by gradient descent and find something pretty incredible.
1
0
1
@ReubenJAdams
Reuben Adams
13 days
Okay, but it’s not the total number of local minima that matters either; it’s the probability of encountering one as you run gradient descent.
1
0
1
@ReubenJAdams
Reuben Adams
13 days
What if you get exponentially more critical points as you increase the number of parameters? Then it doesn’t matter if local minima become an exponentially small fraction of the critical points. The total number of local minima could stay constant or even increase in the number
1
0
1
@ReubenJAdams
Reuben Adams
13 days
The problem: It doesn't matter if most critical points are not local minima!
1
0
1
@ReubenJAdams
Reuben Adams
13 days
Folk explanation: Lots of parameters means lots of directions to move in. So at a critical point in the loss landscape (gradient=0), it’s exponentially unlikely that every direction is uphill.
1
0
1
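A quick Monte Carlo rendering of this argument, with random symmetric (GOE) matrices standing in for Hessians at critical points; the stand-in model is an assumption, not something from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)

def frac_all_uphill(n, trials=5000):
    # Draw random symmetric matrices as stand-ins for Hessians at
    # critical points; count how often every eigenvalue is positive,
    # i.e. how often every direction is uphill (a local minimum).
    hits = 0
    for _ in range(trials):
        A = rng.standard_normal((n, n))
        H = (A + A.T) / np.sqrt(2)    # symmetrize: a GOE matrix
        if np.linalg.eigvalsh(H).min() > 0:
            hits += 1
    return hits / trials

for n in (1, 2, 4, 8):
    print(n, frac_all_uphill(n))
# The fraction collapses as n grows; for GOE it decays even faster than
# the 2**-n you'd get from treating each direction as a fair coin flip.
```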