Yaroslav Bulatov Profile
Yaroslav Bulatov

@yaroslavvb

Followers
8K
Following
1K
Media
223
Statuses
2K

[email protected] (ex-Google Brain, OpenAI, Meta) New Blog: https://t.co/SLix8Hrt4w Old Blog: https://t.co/Ur3GWKpmp6

San Francisco, CA
Joined February 2011
@yaroslavvb
Yaroslav Bulatov
25 days
@makingAGI The only change I needed to make was replacing FlashAttention with PyTorch's built-in scaled_dot_product_attention
0
0
4
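A swap like the one described above might look roughly like the sketch below. It is not taken from the HRM code: it assumes the original call passed tensors in the (batch, seqlen, heads, head_dim) layout that FlashAttention's functional interface uses, and the wrapper name `attention_sdpa` and the checker `reference_attention` are hypothetical. PyTorch's `scaled_dot_product_attention` expects (batch, heads, seqlen, head_dim), so the wrapper only transposes in and out.

```python
import torch
import torch.nn.functional as F


def attention_sdpa(q, k, v, causal=False):
    """Attention via PyTorch's built-in kernel.

    Assumed input layout: (batch, seqlen, heads, head_dim).
    Uses the default softmax scale of 1/sqrt(head_dim).
    """
    # SDPA wants (batch, heads, seqlen, head_dim), so swap axes 1 and 2.
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))
    out = F.scaled_dot_product_attention(q, k, v, is_causal=causal)
    # Return to the caller's (batch, seqlen, heads, head_dim) layout.
    return out.transpose(1, 2)


def reference_attention(q, k, v):
    """Plain softmax(QK^T / sqrt(d)) V in the same layout, for checking."""
    d = q.shape[-1]
    scores = torch.einsum("bqhd,bkhd->bhqk", q, k) / d ** 0.5
    weights = scores.softmax(dim=-1)
    return torch.einsum("bhqk,bkhd->bqhd", weights, v)
```

On CPU this runs without any extra dependencies, which is likely why such a change is enough for a Colab session without a FlashAttention build.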
@yaroslavvb
Yaroslav Bulatov
25 days
Playing with @makingAGI's HRM implementation this morning, and it's a really promising sign when the official version runs in colab with almost no modifications
colab.research.google.com
Colab notebook
1
0
22
@yaroslavvb
Yaroslav Bulatov
2 months
There are many different fixed-point updates that converge to the polar factor of A (closest orthogonal matrix in Frobenius distance). A very simple shrinkage-like update: A=(1+e)(I- e A A')A
1
3
26
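The fixed-point update above can be checked numerically. The sketch below is not from the thread: it assumes NumPy, a full-rank input pre-scaled by its spectral norm, and the hypothetical names `polar_factor_shrinkage` and `polar_factor_svd`. Taken literally, the map sends each singular value s to (1+e)(1 - e s^2)s, whose nonzero fixed point is s = 1/sqrt(1+e), so the sketch multiplies by sqrt(1+e) at the end before comparing against U V' from the SVD.

```python
import numpy as np


def polar_factor_shrinkage(A, e=0.2, iters=300):
    """Iterate X <- (1+e)(I - e X X') X, then rescale.

    Assumptions (not from the original post): A has full rank, 0 < e < 1,
    and the input is divided by its spectral norm so every singular
    value starts in (0, 1].
    """
    X = A / np.linalg.norm(A, 2)
    I = np.eye(A.shape[0])
    for _ in range(iters):
        X = (1 + e) * (I - e * X @ X.T) @ X
    # Singular values settle at 1/sqrt(1+e); undo that factor.
    return np.sqrt(1 + e) * X


def polar_factor_svd(A):
    """Reference polar factor U V' computed from the SVD."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt
```

Near the fixed point the error in each singular value shrinks by a factor of about 1 - 2e per step, so smaller e means slower convergence but a limit closer to an exactly orthogonal matrix without the final rescaling.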
@yaroslavvb
Yaroslav Bulatov
2 months
The root issue is that peer review is a leftover artifact from the time when papers were physically printed. When peer reviewers realize how little value they add, they try to spend as little effort as possible.
1
0
1
@yaroslavvb
Yaroslav Bulatov
3 months
This is a nice explanation of why reasoning emerges as an unexpected side effect of training for text compression (but not video compression)
@svlevine
Sergey Levine
3 months
I always found it puzzling how language models learn so much from next-token prediction, while video models learn so little from next frame prediction. Maybe it's because LLMs are actually brain scanners in disguise. Idle musings in my new blog post:
5
2
7
@yaroslavvb
Yaroslav Bulatov
3 months
Was brushing up on transforms for some scaling laws math, summarized here https://t.co/QN66xkAugq
1
4
19
@yaroslavvb
Yaroslav Bulatov
4 months
came across this overview by Derek Lowe on the state of AI drugs a year ago @ACSCentSci
science.org
AI Drugs So Far
0
0
3
@yaroslavvb
Yaroslav Bulatov
4 months
Enjoyed @jxbz's thought-provoking talk on optimizers at @ml_collective today. Are theories that motivate optimizers very useful? Adversarial for AdaGrad, natural gradient for KFAC. Non-linear solvers in scientific computing seem to advance without spending a lot of effort thinking
1
1
14
@yaroslavvb
Yaroslav Bulatov
4 months
Once everyone online is indistinguishable from an AI agent, hanging out in person will be cool again. Until the robot impersonators.
1
0
22
@yaroslavvb
Yaroslav Bulatov
4 months
Unexpected RMT observation: squared singular values of a product of random projections are essentially distributed as exponentiated chi-squared. Can anyone see a direct explanation of this? https://t.co/C8AcJbo1mz
1
2
11
@yaroslavvb
Yaroslav Bulatov
4 months
Watching @liuzhuang1234's "Transformers without Normalization"; this slide is a reminder of how our optimizer and architecture choices are coupled
6
19
157
@m_ryabinin
Max Ryabinin
5 months
I'm giving a talk at the MCDC🤝 workshop (#ICLR2025) tomorrow! Planning to cover: * An overview of decentralized DL & its links to other fields * Lessons learned from research on Learning@home, DeDLOC, SWARM, Petals * Sneak peek on some of our upcoming work! See you at 14:30!
3
9
54
@yaroslavvb
Yaroslav Bulatov
6 months
Re-reading Horace's https://t.co/Z4b8eqA6oR suggests that one could estimate the total number of transistor flips by integrating over power and frequency graphs. Has anyone checked whether this works on the values reported by nvidia-smi?
2
0
7
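One way to try the integration idea above is sketched below. Everything here is an assumption rather than something from the linked post: it polls `nvidia-smi --query-gpu=power.draw,clocks.sm`, integrates power over time with the trapezoid rule to get energy in joules, and divides by a caller-supplied `joules_per_flip`, which is a placeholder that would have to be estimated separately. The function names are hypothetical.

```python
import subprocess


def parse_sample(line):
    """Parse one 'power.draw, clocks.sm' CSV line into (watts, MHz)."""
    power_w, clock_mhz = (float(field) for field in line.split(","))
    return power_w, clock_mhz


def read_gpu_sample(gpu_id=0):
    """Query one GPU once; needs nvidia-smi on the PATH."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_id}",
         "--query-gpu=power.draw,clocks.sm",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(out.strip().splitlines()[0])


def trapezoid(times_s, values):
    """Integrate sampled values over time with the trapezoid rule."""
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (values[i] + values[i - 1]) * dt
    return total


def estimate_flips(times_s, power_w, joules_per_flip):
    """Energy in joules divided by an assumed energy per transistor flip."""
    return trapezoid(times_s, power_w) / joules_per_flip
```

Integrating the clock column the same way, after converting MHz to Hz, gives total SM cycles over the window, which could serve as a cross-check on the power-based figure.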
@yaroslavvb
Yaroslav Bulatov
7 months
Keeping up with headline news, which is often negative, makes it easy to lose track of the big picture
@SteveStuWill
Steve Stewart-Williams
7 months
How the world has changed over the last century. A compilation of some of our greatest accomplishments as a species. Credit: @toddrjones
1
0
21
@yaroslavvb
Yaroslav Bulatov
7 months
Are there recorded talks I can watch relevant to DeepSeek?
2
0
5
@yaroslavvb
Yaroslav Bulatov
7 months
From a talk by Chris Manning
@atroyn
anton 🇺🇸
7 months
'we're in this bizarre world where the best way to learn about llms... is to read papers by chinese companies. i do not think this is a good state of the world' - us labs keeping their architectures and algorithms secret is ultimately hurting ai development in the us.
0
3
16
@yaroslavvb
Yaroslav Bulatov
8 months
@anissagardizy8 this could make for a good retrospective -- "where are they now?"
1
0
5