Yaroslav Bulatov @yaroslavvb X Profile

Yaroslav Bulatov

@yaroslavvb

Followers

8K

Following

1K

Media

223

Statuses

2K

[email protected] (ex-Google Brain, OpenAI, Meta) New Blog: https://t.co/SLix8Hrt4w Old Blog: https://t.co/Ur3GWKpmp6

https://t.co/3WhQ70c7aG

San Francisco, CA

Joined February 2011

Yaroslav Bulatov

@yaroslavvb

25 days

@makingAGI The only change I needed to make was to replace FlashAttention with PyTorch's built-in scaled_dot_product_attention

0

4
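
FlashAttention is an exact attention algorithm, so it and PyTorch's scaled_dot_product_attention compute the same quantity, softmax(Q K' / sqrt(d)) V, which is what makes a swap like the one in the post above possible. Below is a minimal pure-Python reference of that computation for a single head. It is a sketch for illustration, not the PyTorch API: the name `reference_attention`, the nested-list layout, and the `causal` flag (which assumes equally many queries and keys) are assumptions.

```python
import math

def reference_attention(q, k, v, causal=False):
    """Single-head attention: softmax(q k^T / sqrt(d)) v.

    q is L x d, k is S x d, v is S x dv, all as nested lists of floats.
    With causal=True, query i only attends to keys 0..i.
    """
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        limit = i + 1 if causal else len(k)
        # Scaled dot products of query i with each visible key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                  for kj in k[:limit]]
        # Numerically stable softmax weights (unnormalized, then divided by z).
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        # Weighted average of the visible value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / z
                    for c in range(len(v[0]))])
    return out
```

A reference like this is mainly useful for checking that two attention backends agree on small inputs after a swap.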

Yaroslav Bulatov

@yaroslavvb

25 days

Playing with @makingAGI's HRM implementation this morning, and it's a really promising sign when the official version runs in Colab with almost no modifications

colab.research.google.com

Colab notebook

1

0

22

Yaroslav Bulatov

@yaroslavvb

2 months

There are many different fixed-point updates that converge to the polar factor of A (the closest orthogonal matrix in Frobenius distance). A very simple shrinkage-like update: A = (1 + e)(I - e A A')A

1

3

26
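
A minimal pure-Python sketch of the shrinkage-like update in the post above, applied as written. Assumptions not in the post: A is square, it is first divided by its Frobenius norm so that all singular values start at or below 1, the default is e = 0.5, and the helper names are hypothetical. Applying the update to a single singular value s gives s -> (1 + e)(1 - e s^2) s, whose positive fixed point is s = 1/sqrt(1 + e), so the sketch multiplies the final iterate by sqrt(1 + e) before treating it as orthogonal.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def polar_factor_sketch(A, e=0.5, steps=60):
    """Iterate X <- (1 + e)(I - e X X') X starting from A / ||A||_F."""
    n = len(A)
    fro = math.sqrt(sum(x * x for row in A for x in row))
    X = [[x / fro for x in row] for row in A]
    for _ in range(steps):
        G = matmul(X, transpose(X))  # X X'
        # Shrinkage matrix I - e X X'.
        M = [[(1.0 if i == j else 0.0) - e * G[i][j] for j in range(n)]
             for i in range(n)]
        X = [[(1 + e) * val for val in row] for row in matmul(M, X)]
    # Singular values settle at 1/sqrt(1 + e); rescale them to 1.
    scale = math.sqrt(1 + e)
    return [[scale * val for val in row] for row in X]
```

For a well-conditioned input such as `[[2.0, 1.0], [0.0, 1.0]]`, the result Q can be checked by confirming that Q Q' is close to the identity and that Q' A is symmetric, which is the defining property of the polar decomposition A = Q P.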

Yaroslav Bulatov

@yaroslavvb

2 months

The root issue is that peer review is a leftover artifact from the time when papers were physically printed on paper. When peer reviewers realize how little value they add, they try to spend as little effort as possible.

1

0

1

Yaroslav Bulatov

@yaroslavvb

3 months

This is a nice explanation of why reasoning emerges as an unexpected side effect of training for text compression (but not video compression)

Sergey Levine

@svlevine

3 months

I always found it puzzling how language models learn so much from next-token prediction, while video models learn so little from next frame prediction. Maybe it's because LLMs are actually brain scanners in disguise. Idle musings in my new blog post:

5

2

7

Yaroslav Bulatov

@yaroslavvb

3 months

Was brushing up on transforms for some scaling laws math, summarized here https://t.co/QN66xkAugq

1

4

19

Yaroslav Bulatov

@yaroslavvb

4 months

Came across this overview by Derek Lowe on the state of AI drugs a year ago @ACSCentSci

science.org

AI Drugs So Far

0

3

Yaroslav Bulatov

@yaroslavvb

4 months

Enjoyed @jxbz's thought-provoking talk on optimizers at @ml_collective today. Are theories that motivate optimizers very useful? Adversarial for AdaGrad, natural gradient for KFAC. Non-linear solvers in scientific computing seem to advance without spending a lot of effort thinking

1

14

Yaroslav Bulatov

@yaroslavvb

4 months

Once everyone online is indistinguishable from an AI agent, it would make it cool again to hang out in person. Until the robot impersonators.

1

0

22

Yaroslav Bulatov

@yaroslavvb

4 months

Unexpected RMT observation: squared singular values of a product of random projections are essentially distributed as exponentiated chi-squared. Can anyone see a direct explanation of this? https://t.co/C8AcJbo1mz

1

2

11

Yaroslav Bulatov

@yaroslavvb

4 months

Watching @liuzhuang1234's "Transformers without Normalization"; this slide is a reminder of how our optimizer and architecture choices are coupled

6

19

157

Max Ryabinin

@m_ryabinin

5 months

I'm giving a talk at the MCDC🤝 workshop (#ICLR2025) tomorrow! Planning to cover:
* An overview of decentralized DL & its links to other fields
* Lessons learned from research on Learning@home, DeDLOC, SWARM, Petals
* Sneak peek on some of our upcoming work!
See you at 14:30!

3

9

54

Yaroslav Bulatov

@yaroslavvb

6 months

@cHHillee

0

2

Yaroslav Bulatov

@yaroslavvb

6 months

Re-reading Horace's https://t.co/Z4b8eqA6oR suggests that one could estimate the total number of transistor flips by integrating over power+frequency graphs ... has anyone checked if this works on the entries reported by nvidia-smi?

2

0

7
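
The integration step in the post above can be sketched for the power part alone: a trapezoidal integral of sampled power draw over time gives energy in joules. This is a minimal sketch under assumptions, not a validated method. The function name and the list-based input are hypothetical, the samples are assumed to be (seconds, watts) pairs such as periodic power readings from nvidia-smi, and converting energy into a count of transistor flips would further require an energy-per-flip figure and an idle-power baseline that the post does not give.

```python
def energy_joules(timestamps_s, power_w):
    """Trapezoidal integral of a sampled power trace.

    timestamps_s: sample times in seconds, increasing.
    power_w: power draw in watts at each sample time.
    Returns the energy in joules over the sampled interval.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps_s and power_w must have equal length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Average of neighbouring samples times the elapsed time.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total
```

Coarse sampling intervals will miss short power spikes, so the estimate is only as good as the sampling rate of the underlying trace.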

Yaroslav Bulatov

@yaroslavvb

7 months

Keeping up with headline news, which is often negative, makes it easy to lose track of the big picture

Steve Stewart-Williams

@SteveStuWill

7 months

How the world has changed over the last century. A compilation of some of our greatest accomplishments as a species. Credit: @toddrjones

1

0

21

Yaroslav Bulatov

@yaroslavvb

7 months

Are there recorded talks I can watch relevant to DeepSeek?

2

0

5

Yaroslav Bulatov

@yaroslavvb

7 months

From a talk by Chris Manning

anton 🇺🇸

@atroyn

7 months

'we're in this bizarre world where the best way to learn about llms... is to read papers by chinese companies. i do not think this is a good state of the world' - us labs keeping their architectures and algorithms secret is ultimately hurting ai development in the us.

0

3

16

Yaroslav Bulatov

@yaroslavvb

8 months

@anissagardizy8 this could make for a good retrospective -- "where are they now?"

1

0

5