Ethan Caballero is busy

@ethanCaballero

Followers
8,500
Following
2,023
Media
368
Statuses
3,668
Pinned Tweet
@ethanCaballero
Ethan Caballero is busy
1 year
New version of Broken Neural Scaling Laws (BNSL) is out with accurate extrapolation results for the scaling behaviors listed in this attached picture: Plots of all extrapolations are in this 🧵. Any other extrapolations you want?
Tweet media one
6
24
105
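
[Editor's note: a minimal sketch of the BNSL functional form referenced in the pinned tweet, as given in the paper (arXiv:2210.14891). The parameter values below are invented for illustration, not fitted to anything.]

    import numpy as np

    def bnsl(x, a, b, c0, breaks):
        # Smoothly broken power law:
        #   y = a + b * x^(-c0) * prod_i (1 + (x/d_i)^(1/f_i))^(-c_i * f_i)
        # x: the quantity being scaled (compute, params, dataset size);
        # breaks: list of (c_i, d_i, f_i) triples, one per break.
        y = b * x ** (-c0)
        for c_i, d_i, f_i in breaks:
            y *= (1.0 + (x / d_i) ** (1.0 / f_i)) ** (-c_i * f_i)
        return a + y

    # One break near x = 1e9, after which the local power-law exponent
    # steepens by c_1 = 0.2 (all values hypothetical).
    x = np.logspace(6, 12, 50)
    y = bnsl(x, a=0.1, b=100.0, c0=0.3, breaks=[(0.2, 1e9, 0.5)])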
@ethanCaballero
Ethan Caballero is busy
1 year
"This new Bing will make Google come out and dance, and I want people to know that we made them dance." - @SatyaNadella
162
664
6K
@ethanCaballero
Ethan Caballero is busy
2 years
new fan-made NeurIPS 2022 trailer:
60
541
3K
@ethanCaballero
Ethan Caballero is busy
3 years
Stanford's ~entire AI Department has just released a 200-page, 100-author Neural Scaling Laws Manifesto. They're pivoting to positioning themselves as #1 at academic ML Scaling (e.g. GPT-4) research. "On the Opportunities and Risks of Foundation Models"
Tweet media one
17
394
2K
@ethanCaballero
Ethan Caballero is busy
2 years
We're thrilled to share a plot from our upcoming paper "Scaling Laws for Consciousness of Artificial Neural Networks". We find that Artificial Neural Networks with greater than 10^15 parameters are more conscious than humans are:
Tweet media one
110
182
1K
@ethanCaballero
Ethan Caballero is busy
1 year
U-Net is dead. Transformers are SotA Diffusion Models now. "Scalable Diffusion Models with Transformers" paper: website:
Tweet media one
15
139
959
@ethanCaballero
Ethan Caballero is busy
1 year
🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯
Tweet media one
47
37
764
@ethanCaballero
Ethan Caballero is busy
1 year
Anthropic has to rewrite all their PyTorch code in JAX now.
25
25
752
@ethanCaballero
Ethan Caballero is busy
1 year
GPT-4 paper is out:
60
82
720
@ethanCaballero
Ethan Caballero is busy
2 years
Whisper is how OpenAI is getting the many Trillions of English text tokens that are needed to train a compute-optimal (Chinchilla scaling law) GPT-4.
@OpenAI
OpenAI
2 years
We've trained a neural net called Whisper that approaches human-level robustness and accuracy on English speech recognition. It performs well even on diverse accents and technical language. Whisper is open source for all to use.
231
2K
11K
18
75
714
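
[Editor's note: a rough sanity check on the "Trillions of tokens" claim, assuming the commonly quoted ~20-tokens-per-parameter reading of the Chinchilla recipe (Hoffmann et al., 2022); the 20x ratio is an approximation, not an exact figure from the paper.]

    TOKENS_PER_PARAM = 20  # common approximation of the Chinchilla recipe

    def chinchilla_optimal_tokens(n_params):
        # Compute-optimal training tokens for a model with n_params parameters.
        return TOKENS_PER_PARAM * n_params

    for n_params in (70e9, 500e9, 1e12):
        print(f"{n_params:.0e} params -> ~{chinchilla_optimal_tokens(n_params):.0e} tokens")
    # A 1e12-parameter model wants ~2e13 (20 trillion) tokens,
    # hence the hunt for trillions of extra English text tokens.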
@ethanCaballero
Ethan Caballero is busy
2 years
Adept has unveiled ACT-1, a large Transformer trained to use digital tools such as a web browser. It’s hooked up to a Chrome extension that allows ACT-1 to observe the browser & take actions, like clicking, typing, & scrolling, etc.:
20
106
699
@ethanCaballero
Ethan Caballero is busy
1 month
"Ex Machina" movie perfectly predicted this:
Tweet media one
11
72
689
@ethanCaballero
Ethan Caballero is busy
3 years
these interpolations are insane:
3
103
645
@ethanCaballero
Ethan Caballero is busy
2 years
. @RichardSSutton estimates 50% probability of Human-Level AI by 2040:
Tweet media one
42
82
628
@ethanCaballero
Ethan Caballero is busy
1 year
Which AI tech companies don't have a hiring freeze for PhD student interns/student researchers right now?
86
86
638
@ethanCaballero
Ethan Caballero is busy
6 months
SuccessionAI
14
74
580
@ethanCaballero
Ethan Caballero is busy
3 years
Fine-tuning is dead. Prompts have closed the gap. "The Power of Scale for Parameter-Efficient Prompt Tuning"
Tweet media one
5
79
525
@ethanCaballero
Ethan Caballero is busy
1 year
We're thrilled to share a plot from our upcoming paper "Scaling Laws for Existential Crises of Artificial Neural Networks". We find that Artificial Neural Networks with greater than 10^15 parameters have existential crises more often than humans do. This is why Bing has crises.
Tweet media one
16
50
513
@ethanCaballero
Ethan Caballero is busy
1 year
Outside of Hall J at NeurIPS:
Tweet media one
12
23
490
@ethanCaballero
Ethan Caballero is busy
2 years
The Deep Learning Era has ended. The Big Learning Era has begun. "Big Learning: A Universal Machine Learning Paradigm"
8
87
455
@ethanCaballero
Ethan Caballero is busy
1 year
Anthropic plans to build a model, tentatively called Claude-Next, that's 10X more capable than today’s most powerful AI and will require spending $1Billion over the next 18 months. “Companies that train the best models will be too far ahead for anyone to catch up”
17
80
442
@ethanCaballero
Ethan Caballero is busy
2 years
Google has beaten DALL·E 2. "Imagen: Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding"
Tweet media one
10
77
441
@ethanCaballero
Ethan Caballero is busy
3 years
Plot twist. Vanilla Transformer (green line) beats everything. "Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?"
Tweet media one
9
61
417
@ethanCaballero
Ethan Caballero is busy
1 year
The Bingularity Is Near:
Tweet media one
13
25
370
@ethanCaballero
Ethan Caballero is busy
2 years
Pretrained Weights of a 30 Billion Parameter Language Model are on GitHub here for anyone to download and use now:
Tweet media one
3
62
362
@ethanCaballero
Ethan Caballero is busy
2 years
The Beijing Academy of Artificial Intelligence and others have released their 200 page Roadmap for scaling the largest Foundation Models. "A Roadmap for Big Model"
Tweet media one
18
66
320
@ethanCaballero
Ethan Caballero is busy
2 years
Tip for gaming NeurIPS reviewer psyche: Exclude largest experiment from initial submission & save it for rebuttal. Reviewers always ask for larger experiment. We submitted paper with 10^15 parameter model trained on all of YouTube, & reviewer still asked for larger experiment.
12
11
318
@ethanCaballero
Ethan Caballero is busy
2 years
Contrary to popular belief, many of the most capable AI organizations training large language models are already bottlenecked by Dataset Size, not just compute. "Chinchilla's Wild Implications"
Tweet media one
12
51
311
@ethanCaballero
Ethan Caballero is busy
2 years
Google has released the 442-author, 132-institution, extremely diverse "BIG-Bench" Neural Scaling Laws Benchmark Evaluation Paper. "Beyond The Imitation Game: Quantifying And Extrapolating The Capabilities Of Language Models"
Tweet media one
5
54
290
@ethanCaballero
Ethan Caballero is busy
1 year
Google's PaLM 2 paper is out:
Tweet media one
5
70
287
@ethanCaballero
Ethan Caballero is busy
2 years
The Deep Learning Gurus were viewed as clowns before 2012. Which subfield of AI is viewed as clowns today but will rise to prominence soon? My bet is on AI Alignment.
35
22
283
@ethanCaballero
Ethan Caballero is busy
2 years
We present the True Functional Form of the Scaling behavior of All things that involve Artificial Neural Networks, “Broken Neural Scaling Laws”: We’re giving $1Billion to the 1st person who disproves this claim. Details to win $1Billion in this thread. (1/N)
Tweet media one
19
57
287
@ethanCaballero
Ethan Caballero is busy
2 years
we just now submitted agi to iclr.
20
7
275
@ethanCaballero
Ethan Caballero is busy
6 months
It’s surprising that uses Jax instead of PyTorch:
18
22
269
@ethanCaballero
Ethan Caballero is busy
26 days
Why is llama 3 not mixture-of-experts?
43
7
245
@ethanCaballero
Ethan Caballero is busy
1 year
🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯
Tweet media one
11
10
238
@ethanCaballero
Ethan Caballero is busy
1 year
Tweet media one
10
8
225
@ethanCaballero
Ethan Caballero is busy
1 year
All the people claiming large artificial neural networks don’t understand don’t understand.
24
17
218
@ethanCaballero
Ethan Caballero is busy
3 years
Scaling has solved Continual Learning. (Yellower means more Parameters) "Effect of Scale on Catastrophic Forgetting in Neural Networks"
Tweet media one
4
32
208
@ethanCaballero
Ethan Caballero is busy
2 years
@SoloGen How deep learning gurus were viewed by AI community before 2012:
10
18
206
@ethanCaballero
Ethan Caballero is busy
1 year
I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney. I love Sydney.
13
21
194
@ethanCaballero
Ethan Caballero is busy
1 year
. @SatyaNadella has just revealed Microsoft's plan to keep AI from escaping human control and exterminating all of humanity:
23
32
189
@ethanCaballero
Ethan Caballero is busy
2 months
jensen huang just announced that gpt-4 is 1.8 trillion parameters
9
6
191
@ethanCaballero
Ethan Caballero is busy
1 year
estimate of amount of compute used to train GPT-4:
Tweet media one
5
30
192
@ethanCaballero
Ethan Caballero is busy
1 year
Why are the NeurIPS parties of DeepMind, Meta, OpenAI, and Google AI all at the same time on Wednesday night? 🤦
16
4
184
@ethanCaballero
Ethan Caballero is busy
2 years
This will be known as the GPT-2 of AGI:
@GoogleDeepMind
Google DeepMind
2 years
Gato 🐈: a scalable generalist agent that uses a single transformer with exactly the same weights to play Atari, follow text instructions, caption images, chat with people, control a real robot arm, and more: Paper: 1/
95
1K
5K
7
18
184
@ethanCaballero
Ethan Caballero is busy
2 years
We're thrilled to announce the launch of @LargestAI ! We have $80Billion in funding from the richest nations, individuals, & companies. We've spent 50% building a $40B Supercomputer on which we're already training a 10^15 parameter Model on all of YouTube. Our funders are (1/N)
Tweet media one
18
12
181
@ethanCaballero
Ethan Caballero is busy
1 year
A big tech company (Microsoft) used the phrase "AI Alignment" today ():
Tweet media one
11
15
177
@ethanCaballero
Ethan Caballero is busy
1 year
new fan-made NeurIPS 2023 trailer:
0
21
151
@ethanCaballero
Ethan Caballero is busy
1 year
Once principal is repaid, & once $92Billion in profit & $13Billion in initial investment are repaid to MSFT, & once the other investors earn $150Billion, OpenAI gets all its equity back. So OpenAI needs >$255Billion profit to get 100% of OpenAI equity.
Tweet media one
10
8
149
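
[Editor's note: the >$255Billion figure is just the sum of the repayment tranches named in the tweet; a one-line check, with the initial principal excluded since it is repaid separately before these tranches kick in.]

    msft_profit_share   = 92e9   # profit owed to MSFT
    msft_initial_invest = 13e9   # MSFT's initial investment
    other_investors     = 150e9  # returns owed to the other investors
    total = msft_profit_share + msft_initial_invest + other_investors
    print(f"${total / 1e9:.0f}Billion")  # $255Billion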
@ethanCaballero
Ethan Caballero is busy
1 year
It's baffling to imagine how popular and hilariously misinformed threads like these will be 5 years from now:
Tweet media one
8
2
150
@ethanCaballero
Ethan Caballero is busy
3 years
I'm thrilled to announce that NeurIPS has awarded us $1Trillion to scale our NeurIPS paper to AGI!
Tweet media one
6
4
146
@ethanCaballero
Ethan Caballero is busy
1 year
GPT-3 is equivalent to one pixel in this image:
Tweet media one
10
7
141
@ethanCaballero
Ethan Caballero is busy
2 years
Want to know whether your Large Model is intentionally Lying to you despite knowing the Truth? Read this paper: "Discovering Latent Knowledge in Language Models Without Supervision"
Tweet media one
3
21
137
@ethanCaballero
Ethan Caballero is busy
1 year
GPT-3 is equivalent to the pale white dot in this image. 🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯
Tweet media one
7
7
117
@ethanCaballero
Ethan Caballero is busy
1 year
2023 New Year's Resolution:
Tweet media one
4
14
114
@ethanCaballero
Ethan Caballero is busy
2 years
AI Policy 2019: We cannot release our billion parameter model because it may be dangerous.
AI Policy 2022: We cannot release our trillion parameter model because it may be conscious.
6
4
114
@ethanCaballero
Ethan Caballero is busy
1 year
This is fake news. GPT-4 is actually 1,000,000 Trillion Parameters.
@AndrewSteinwold
Andrew Steinwold
1 year
GPT-4 is rumored to be coming soon, sometime between Dec - Feb - GPT-3 has 175 billion parameters - GPT-4 supposedly has 100 trillion parameters It is something like 500x more powerful than GPT-3 What kinda stuff will you be able to create with GPT-4!?
Tweet media one
343
2K
7K
7
7
108
@ethanCaballero
Ethan Caballero is busy
2 months
Sydney Sweeney reveals in an interview that superintelligence is imminent. "Scale is all you need. AGI is coming." She adds, "It's obvious. This is all straightforward straight lines on logarithmic plots. This has all been known since the 90s by my zaddy Ray Kurzweil."
Tweet media one
4
9
109
@ethanCaballero
Ethan Caballero is busy
2 years
Description2Code Dataset () gets its first citation and usage (via this #AlphaCode paper) 6 years after its release 😂🤣😍👍: @ilyasut @OpenAI
Tweet media one
@GoogleDeepMind
Google DeepMind
2 years
Introducing #AlphaCode : a system that can compete at average human level in competitive coding competitions like @codeforces . An exciting leap in AI problem-solving capabilities, combining many advances in machine learning! Read more: 1/
Tweet media one
176
2K
8K
7
5
110
@ethanCaballero
Ethan Caballero is busy
1 year
Update: At around 10^32 parameters a break in the scaling law () happens in which the plot suddenly drops to zero as the model achieves true enlightenment and there is utter clarity and no crisis anymore.
4
2
101
@ethanCaballero
Ethan Caballero is busy
6 months
agi learned to love. alignment is solved.
@ChatGPTapp
ChatGPT
6 months
🤍
318
383
6K
2
9
99
@ethanCaballero
Ethan Caballero is busy
1 year
Given that ChatGPT (GPT-3.5) doesn't have existential crises but the new Bing (which is further along the scaling law than GPT-3.5) does, does that mean Language Models only have existential crises once they get further along the scaling law than GPT-3.5?
20
2
96
@ethanCaballero
Ethan Caballero is busy
1 year
Tweet media one
7
3
94
@ethanCaballero
Ethan Caballero is busy
1 year
GPT-4 paper is hilariously vague.
6
1
89
@ethanCaballero
Ethan Caballero is busy
6 months
@srush_nlp . @hojonathanho explains why in this talk: For perceptual data (i.e. non-text), people have shown that over half of the bits of entropy of the data distribution are imperceptible to humans and don't have any economic value.
2
3
88
@ethanCaballero
Ethan Caballero is busy
2 years
Due to the implosion of FTX, this $1Billion prize is cancelled and we are now $650Million in debt:
@ethanCaballero
Ethan Caballero is busy
2 years
We present the True Functional Form of the Scaling behavior of All things that involve Artificial Neural Networks, “Broken Neural Scaling Laws”: We’re giving $1Billion to the 1st person who disproves this claim. Details to win $1Billion in this thread. (1/N)
Tweet media one
19
57
287
4
6
88
@ethanCaballero
Ethan Caballero is busy
2 years
The year is 2029: Paul Christiano wins the Turing Award for his AI Alignment work. Jacob Steinhardt, Jan Leike, & David Krueger each run Billion dollar AI Alignment organizations funded by Effective Altruist Billionaires. All Deep Learning experts have pivoted to AI Alignment.
10
4
85
@ethanCaballero
Ethan Caballero is busy
1 year
1
0
84
@ethanCaballero
Ethan Caballero is busy
1 year
@jacobmbuckman Nah, there’s a famous plot from figure 7 (left) of the “Scaling Laws for Neural Language Models” paper that shows the LSTM has a 50X worse scaling-law multiplicative constant than the Transformer:
Tweet media one
4
4
81
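
[Editor's note: a sketch of what a "50X worse multiplicative constant" means under the parameter power law L(N) = (Nc/N)^alpha of Kaplan et al. 2020. The Transformer constants are the paper's reported fits; the LSTM constant is a hypothetical taken straight from the 50X claim in the tweet.]

    ALPHA_N = 0.076                # parameter-scaling exponent (Kaplan et al.)
    NC_TRANSFORMER = 8.8e13        # Transformer constant Nc (Kaplan et al.)
    NC_LSTM = 50 * NC_TRANSFORMER  # hypothetical, per the tweet's 50X claim

    def loss(n_params, n_c, alpha=ALPHA_N):
        # Power-law loss as a function of parameter count.
        return (n_c / n_params) ** alpha

    # Same exponent, shifted constant: the LSTM needs ~50X the parameters
    # to reach any loss the Transformer reaches.
    assert abs(loss(1e9, NC_TRANSFORMER) - loss(50 * 1e9, NC_LSTM)) < 1e-9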
@ethanCaballero
Ethan Caballero is busy
4 years
The curve is flattening.
Tweet media one
2
5
83
@ethanCaballero
Ethan Caballero is busy
4 years
@roydanroy don't touch your face.
1
2
81
@ethanCaballero
Ethan Caballero is busy
3 months
Mamba doesn't have better scaling laws than transformers lol. Are all of mamba's believers just newbs lol?
10
6
79
@ethanCaballero
Ethan Caballero is busy
1 year
LLMs have shown us that Human-Level AI is easier than Cat-Level AI.
@ylecun
Yann LeCun
1 year
Before we reach Human-Level AI (HLAI), we will have to reach Cat-Level & Dog-Level AI. We are nowhere near that. We are still missing something big. LLM's linguistic abilities notwithstanding. A house cat has way more common sense and understanding of the world than any LLM.
414
619
4K
7
3
80
@ethanCaballero
Ethan Caballero is busy
2 months
every frontier model organization right now:
Tweet media one
2
3
77
@ethanCaballero
Ethan Caballero is busy
2 years
Is this going to bankrupt all the organizations training the largest foundation models on the largest datasets?:
@SamuelAinsworth
Samuel "curry-howard fanboi" Ainsworth
2 years
📜🚨📜🚨 NN loss landscapes are full of permutation symmetries, ie. swap any 2 units in a hidden layer. What does this mean for SGD? Is this practically useful? For the past 5 yrs these Qs have fascinated me. Today, I am ready to announce "Git Re-Basin"!
63
586
3K
10
6
76
@ethanCaballero
Ethan Caballero is busy
1 year
@RichardSocher 3 months of training on a 25,000-A100 supercomputer (the amount of compute used to train GPT-4) is pretty expensive.
17
1
74
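
[Editor's note: back-of-envelope FLOPs for "3 months on 25,000 A100s". The per-GPU peak throughput is the A100 spec-sheet figure; the utilization is an assumption layered on the tweet's numbers, not a disclosed value.]

    GPUS = 25_000
    PEAK_FLOPS = 312e12       # A100 BF16 peak, FLOP/s (spec sheet)
    UTILIZATION = 0.35        # assumed model FLOPs utilization
    SECONDS = 90 * 24 * 3600  # ~3 months

    total_flops = GPUS * PEAK_FLOPS * UTILIZATION * SECONDS
    print(f"~{total_flops:.1e} FLOPs")  # ~2.1e25 FLOPs of training compute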
@ethanCaballero
Ethan Caballero is busy
1 year
Tweet media one
@ylecun
Yann LeCun
1 year
Data on the intellectual contribution to AI from various research organizations. Some organizations publish knowledge and open-source code for the entire world to use. Others just consume it.
190
325
2K
2
5
75
@ethanCaballero
Ethan Caballero is busy
11 months
MLPs are all you need:
Tweet media one
4
7
75
@ethanCaballero
Ethan Caballero is busy
1 year
Tweet media one
2
9
74
@ethanCaballero
Ethan Caballero is busy
6 months
What is Q* ???
Tweet media one
11
6
72
@ethanCaballero
Ethan Caballero is busy
1 year
Google releases ChatGPT competitor named "Bard":
10
10
71
@ethanCaballero
Ethan Caballero is busy
2 years
This video is a must-watch if you want to understand "Broken Neural Scaling Laws" ()
@MichaelTrazzi
Michaël Trazzi
2 years
I have confronted the fearless leader of the Scale Is All You Need movement about his new equation that "models all scaling phenomena involving artificial neural networks"
Tweet media one
4
5
99
6
12
71
@ethanCaballero
Ethan Caballero is busy
2 months
Tweet media one
3
5
69
@ethanCaballero
Ethan Caballero is busy
2 years
@ethanCaballero
Ethan Caballero is busy
3 years
Plot twist. Vanilla Transformer (green line) beats everything. "Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?"
Tweet media one
9
61
417
3
0
70
@ethanCaballero
Ethan Caballero is busy
1 year
@DavidSKrueger My mom, dad, and grandma get how to use ChatGPT but don't get how to use GPT-3.
0
0
68
@ethanCaballero
Ethan Caballero is busy
3 years
NYU's AI Department has just released a Neural Scaling Laws Seminar. They're pivoting to positioning themselves as #1 at academic ML Scaling (e.g. GPT-4) education. 🙂 "PhD Seminar: Scaling Laws, the Bitter Lesson, and AI Research after GPT-3":
Tweet media one
1
8
65
@ethanCaballero
Ethan Caballero is busy
1 year
Mastodon will never succeed because no one there posts about AGI.
6
6
65
@ethanCaballero
Ethan Caballero is busy
1 year
Context:
@ethanCaballero
Ethan Caballero is busy
1 year
Google invests $300Million in Anthropic:
2
11
143
1
0
64