Devvrit
@Devvrit_Khatri
Followers: 791 · Following: 2K · Media: 18 · Statuses: 177
Grad Student @UTCompSci, Meta. Large Scale ML - Scalability and Efficiency. Past: DeepMind.
Joined December 2019
Congrats to the Meta team on ScaleRL! Interesting to see it adopt a reasoning-length control mechanism similar to what we introduced in Elastic Reasoning, using forced interruptions to improve RL training stability. Exciting to see this idea validated at scale!
This is a great blog explaining the progress in scaling RL and our work. Pretty clear, intuitive, and captures the key takeaways (and limitations :)). Thanks, @natolambert!
My very positive review of the ScaleRL paper. Excited for more data and base-model work to be built around this (a Pythia-style OLMo suite???). For now, the key things to get off the ground with RL are: importance sampling, in-flight updates, and continuous batching.
Thanks, @deedydas, for sharing our work. By AI nerds, for AI nerds :)
Meta just dropped this paper that spills the secret sauce of reinforcement learning (RL) on LLMs. It lays out an RL recipe, uses 400,000 GPU hrs, and posits a scaling law for performance with more compute in RL, like the classic pretraining scaling laws. Must read for AI nerds.
Had an amazing time on the Delta Podcast about our recent Scaling RL work, future directions, and some fun broader conversation. Thanks for having me on :)
Huge thanks to Devvrit Khatri for coming on the Delta Podcast! Check out the podcast episode here: https://t.co/wmsDjqFbPn
Thanks so much :) Indeed, understanding and the science of doing RL has a long way to go :)
The cleanest RL scaling results I've seen so far🤯. Amazing to see how many valuable insights you can get when the premise is not necessarily to come up with a "new" method, but just to figure out what works (ofc while also being supercharged with 400K GPU hours). Congratssss
Thanks, @omarsar0, for the visibility. Pretty great concise summary and interpretation of the work :)
Banger paper from Meta and collaborators. This paper is one of the best deep dives yet on how reinforcement learning (RL) actually scales for LLMs. The team ran over 400,000 GPU hours of experiments to find a predictable scaling pattern and a stable recipe (ScaleRL) that
Even I am surprised that's how much we spent 😅 RL becoming predictable is an amazing insight. We now know how to compare two methods. And scaling across all these different axes shows that RL is indeed embracing the bitter lesson.
*checks chatgpt* This paper cost ~4.2 million USD (400K GB200 hours) -- science! Our most expensive run was a 100K GPU-hour one (the same amount as DeepSeek-R1-Zero, but on GB200s). One finding here was that once we have a scalable RL algorithm, RL compute scaling becomes predictable
🚨 New Paper: The Art of Scaling Reinforcement Learning Compute for LLMs 🚨 We burnt a lot of GPU-hours to provide the community with the first open, large-scale systematic study on RL scaling for LLMs. https://t.co/49REQZ4R6G
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
Work done at Meta (thanks for the gb200s :p), with awesome collaborators including @louvishh, @rish2k1, @rach_it_, @dvsaisurya, Manzil Zaheer, @inderjit_ml, @brandfonbrener, and @agarwl_ Paper: https://t.co/okiL3xDHuO My blog Link (work in progress):
arxiv.org: Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training…
Would a larger model with fewer steps, or a smaller model with more training steps, reach a given performance faster? Answering such questions (figure in the 1st tweet), we see "early sparks" of RL scaling laws.
Would “scaling” up along generation length/model size/batch-size give expected gains? Absolutely! And now we can analyze how exactly they improve the performance. For example, smaller bsz/gen len may seem better initially, but larger ones overtake eventually.
Common “tricks” mainly shift efficiency: loss aggregation, normalization, curriculum, etc. Large batch size, large generation length, loss type, off-policy setup, and train/inference kernel mismatch fixes are the most consequential.
Not all RL methods scale equally well. Some reach higher asymptotic performance than others. Methods that may look promising early on can be worse when extrapolating to a larger compute regime.
Framework: We fit sigmoidal curves to an iid validation set. Results? (1) We can now predict RL performance at larger scale. (2) We can analyze each algorithmic choice and how it affects the scaling.
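The sigmoidal-fit idea above can be sketched in a few lines. This is a minimal illustration assuming a generic saturating-sigmoid parameterization (A = asymptotic pass rate, B = scaling exponent, C_mid = compute midpoint) and made-up data points; the paper's exact functional form and numbers may differ.

```python
# Minimal sketch of fitting a sigmoidal compute-scaling curve and
# extrapolating to a larger budget. Parameterization and data are
# illustrative assumptions, not taken from the ScaleRL paper.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid_scaling(C, A, B, C_mid):
    # A: asymptotic pass rate, B: scaling exponent,
    # C_mid: compute at which half the asymptote is reached.
    return A / (1.0 + (C_mid / C) ** B)

# Hypothetical measurements: validation pass rate vs. RL GPU-hours.
compute = np.array([1e2, 3e2, 1e3, 3e3, 1e4, 3e4])
pass_rate = np.array([0.22, 0.31, 0.42, 0.51, 0.57, 0.60])

params, _ = curve_fit(
    sigmoid_scaling, compute, pass_rate,
    p0=[0.7, 0.5, 1e3],
    bounds=([0.0, 0.0, 1.0], [1.0, 5.0, 1e6]),
)
A, B, C_mid = params

# Predict performance at a larger compute budget (100k GPU-hours).
predicted = sigmoid_scaling(1e5, A, B, C_mid)
```

Fitting once on small-scale runs and reading off `predicted` at a bigger budget is exactly the kind of extrapolation the thread describes as now being possible.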
We provide (a) a framework to fit such scaling curves. Using this, we analyze several design choices, and combine the best ones to form our recipe (b) ScaleRL. We demonstrate its effectiveness by predictably scaling to 100k GPU-hours.
How do we understand the contribution of several design choices in an RL algorithm? Do they make the algorithm efficient? Or do they elevate the asymptotic performance? To study the scaling behavior of each design choice, we need to fit a predictable scaling curve; this provides a common basis for separating efficiency gains from improvements in asymptotic performance.
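The efficiency-vs-asymptote distinction can be made concrete with a toy comparison. The sigmoid form and all parameter values below are hypothetical illustrations, not fits from the paper.

```python
# Toy illustration: two design choices, each summarized by fitted sigmoid
# parameters. Variant X is compute-efficient but has a lower ceiling;
# variant Y is slower to take off but has a higher asymptote.
# All numbers are made up for illustration.

def pass_rate(C, A, B, C_mid):
    # A: asymptotic pass rate, B: exponent, C_mid: compute midpoint.
    return A / (1.0 + (C_mid / C) ** B)

variant_x = dict(A=0.60, B=0.9, C_mid=500.0)    # efficient, lower ceiling
variant_y = dict(A=0.70, B=0.9, C_mid=3000.0)   # less efficient, higher ceiling

low_budget, high_budget = 300.0, 1e5  # GPU-hours

# X looks better at small scale...
x_low = pass_rate(low_budget, **variant_x)
y_low = pass_rate(low_budget, **variant_y)
# ...but Y overtakes once compute is scaled up.
x_high = pass_rate(high_budget, **variant_x)
y_high = pass_rate(high_budget, **variant_y)
```

This is the failure mode the thread warns about: ranking methods at small compute can invert at large compute, which is why the fitted asymptote and efficiency need to be read separately.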
Sneak peek from a paper about scaling RL compute for LLMs: probably the most compute-expensive paper I've worked on, but hoping that others can run experiments cheaply for the science of scaling RL. Coincidentally, this is similar motivation to what we had for the NeurIPS best
What happens when one of @GoogleDeepMind's top scientists sits down to unpack AI’s past, present & future? The full episode with @jainprateek_ is here. 🎙 Topics you can’t miss: 🔹 Deep learning → transformers → generative AI 🔹 India’s once-in-a-generation chance to lead in