Devvrit Profile
Devvrit

@Devvrit_Khatri

Followers
791
Following
2K
Media
18
Statuses
177

Grad student @UTCompSci, Meta. Large-scale ML: scalability and efficiency. Past: DeepMind.

Joined December 2019
@xyh6666
Yuhui Xu
24 hours
Congrats to the Meta team on ScaleRL! Interesting to see it adopt a reasoning-length control mechanism similar to what we introduced in Elastic Reasoning, using forced interruptions (e.g., “”) to improve RL training stability. Exciting to see this idea validated at scale!
4
8
62
@Devvrit_Khatri
Devvrit
8 days
This is a great blog explaining the progress in scaling RL and our work. Pretty clear, intuitive, and captures the key takeaways (and limitations :)). Thanks, @natolambert!
@natolambert
Nathan Lambert
8 days
My very positive review of the ScaleRL paper. Excited for more data + base-model work to be built around this (Pythia-style OLMo suite???). For now, the key things to get off the ground with RL are: importance sampling, in-flight updates, and continuous batching.
0
0
16
@Devvrit_Khatri
Devvrit
8 days
Thanks, @deedydas, for sharing our work. By AI nerds, for AI nerds :)
@deedydas
Deedy
12 days
Meta just dropped this paper that spills the secret sauce of reinforcement learning (RL) on LLMs. It lays out an RL recipe, uses 400,000 GPU hrs and posits a scaling law for performance with more compute in RL, like the classic pretraining scaling laws. Must read for AI nerds.
0
1
13
@Devvrit_Khatri
Devvrit
9 days
Had an amazing time on the Delta Podcast about our recent Scaling RL work, future directions, and some fun broader conversation. Thanks for having me on :)
@DeltaInstitutes
Delta Institute
9 days
Huge thanks to Devvrit Khatri for coming on the Delta Podcast! Check out the podcast episode here: https://t.co/wmsDjqFbPn
1
4
46
@Devvrit_Khatri
Devvrit
11 days
Thanks so much :) Indeed, understanding and the science of doing RL have a long way to go :)
@nileshgupta2797
Nilesh Gupta
12 days
The cleanest RL scaling results I've seen so far🤯. Amazing to see how many valuable insights you can get when the premise is not necessarily to come up with a "new" method, but just to figure out what works (ofc while also being supercharged with 400K GPU hours). Congratssss
0
0
13
@Devvrit_Khatri
Devvrit
12 days
Thanks, @omarsar0, for the visibility. Pretty great concise summary and interpretation of the work :)
@omarsar0
elvis
12 days
Banger paper from Meta and collaborators. This paper is one of the best deep dives yet on how reinforcement learning (RL) actually scales for LLMs. The team ran over 400,000 GPU hours of experiments to find a predictable scaling pattern and a stable recipe (ScaleRL) that
1
2
17
@Devvrit_Khatri
Devvrit
12 days
Even I am surprised that’s how much we spent 😅 RL becoming predictable is an amazing insight. We now know how to compare two methods. And scaling across all these different axes shows that RL is indeed embracing the bitter lesson.
@agarwl_
Rishabh Agarwal
12 days
*checks chatgpt* This paper cost ~4.2 million USD (400K GB200 hours) -- science! Our most expensive run was 100K GPU hours (the same amount as DeepSeek-R1-Zero, but on GB200s). One finding here was that once we have a scalable RL algorithm, RL compute scaling becomes predictable
0
0
15
@louvishh
lovish
12 days
🚨 New Paper: The Art of Scaling Reinforcement Learning Compute for LLMs 🚨 We burnt a lot of GPU-hours to provide the community with the first open, large-scale systematic study on RL scaling for LLMs. https://t.co/49REQZ4R6G
@Devvrit_Khatri
Devvrit
12 days
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
2
15
66
@Devvrit_Khatri
Devvrit
12 days
Looking forward to your blog!
@natolambert
Nathan Lambert
12 days
The first fantastic paper on scaling RL with LLMs just dropped. I strongly recommend taking a look and will be sharing more thoughts on the blog soon. The Art of Scaling Reinforcement Learning Compute for LLMs Khatri & Madaan et al.
0
1
7
@Devvrit_Khatri
Devvrit
12 days
Would a larger model with fewer steps or a smaller model with more train steps reach a certain performance faster? Answering such questions (figure in the 1st tweet), we see “early sparks” of RL scaling laws.
1
2
14
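A minimal sketch of how such a comparison can be made concrete, assuming each configuration already has a fitted compute-vs-performance curve of the saturating sigmoidal kind described further down this feed; the functional form, parameters, and target below are illustrative assumptions, not values from the paper.

# Sketch: invert two hypothetical fitted compute->score curves to get the
# compute each configuration needs to reach a target score, then compare.
# Curve form and all parameters are illustrative assumptions.
def compute_to_reach(target, A, B, C_mid, R0=0.2):
    # Invert R = R0 + (A - R0) / (1 + (C_mid / C)**B) for C.
    if target >= A:
        return float("inf")  # target lies above this curve's asymptote
    ratio = (A - R0) / (target - R0) - 1.0
    return C_mid / ratio ** (1.0 / B)

# Hypothetical fitted parameters for a smaller and a larger model.
small_model = dict(A=0.55, B=1.2, C_mid=5e2)
large_model = dict(A=0.70, B=0.9, C_mid=3e3)

target = 0.50
print("small model:", round(compute_to_reach(target, **small_model)), "GPU-hours")
print("large model:", round(compute_to_reach(target, **large_model)), "GPU-hours")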
@Devvrit_Khatri
Devvrit
12 days
Would “scaling” up along generation length/model size/batch-size give expected gains? Absolutely! And now we can analyze how exactly they improve the performance. For example, smaller bsz/gen len may seem better initially, but larger ones overtake eventually.
2
2
18
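A toy illustration of this crossover, again using an assumed saturating sigmoid-of-compute curve rather than the paper's exact fit: the configuration with the lower ceiling leads at small compute and is overtaken once compute grows.

# Toy crossover: a config that looks better at small compute is overtaken
# by one with a higher asymptote at large compute. Curve form and
# parameters are illustrative assumptions.
def score(C, A, B, C_mid, R0=0.2):
    return R0 + (A - R0) / (1.0 + (C_mid / C) ** B)

small_cfg = dict(A=0.55, B=1.2, C_mid=5e2)  # efficient early, lower ceiling
large_cfg = dict(A=0.70, B=0.9, C_mid=3e3)  # slower start, higher ceiling

for C in [1e2, 1e3, 1e4, 1e5]:  # GPU-hours
    print(f"{C:>8.0f} GPU-h  small={score(C, **small_cfg):.3f}  large={score(C, **large_cfg):.3f}")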
@Devvrit_Khatri
Devvrit
12 days
Common “tricks” mainly shift efficiency: loss aggregation, normalization, curriculum, etc. Large batch size, large generation length, loss type, off-policy setup, and train/inference kernel mismatch fixes are the most consequential.
1
2
20
@Devvrit_Khatri
Devvrit
12 days
Not all RL methods scale equally well. Some reach higher asymptotic performance than others. Methods that may look promising early on can be worse when extrapolating to a larger compute regime.
1
3
24
@Devvrit_Khatri
Devvrit
12 days
Framework: We fit sigmoidal curves of performance on an iid validation set vs. training compute. Results? (1) We can now predict RL performance at larger scale. (2) We can analyze each algorithmic choice and how it affects the scaling.
1
2
17
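A minimal sketch of what fitting such a curve can look like in practice, using scipy; the specific sigmoidal parameterization (starting score R0, asymptote A, efficiency exponent B, midpoint compute C_mid) and the data points are illustrative assumptions rather than the paper's exact setup.

# Sketch: fit a saturating sigmoid-of-compute curve to validation scores
# and extrapolate it to a larger compute budget. Functional form and data
# are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid_of_compute(C, A, B, C_mid, R0):
    # R0: starting score, A: asymptotic score,
    # B: efficiency exponent, C_mid: compute at the curve's midpoint.
    return R0 + (A - R0) / (1.0 + (C_mid / C) ** B)

# Hypothetical (GPU-hours, validation pass-rate) observations.
compute = np.array([1e2, 3e2, 1e3, 3e3, 1e4])
scores  = np.array([0.22, 0.31, 0.45, 0.55, 0.60])

params, _ = curve_fit(sigmoid_of_compute, compute, scores,
                      p0=[0.7, 1.0, 1e3, 0.2], maxfev=10_000)
A, B, C_mid, R0 = params

print(f"asymptote A ~ {A:.2f}; predicted score at 1e5 GPU-hours: "
      f"{sigmoid_of_compute(1e5, *params):.2f}")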
@Devvrit_Khatri
Devvrit
12 days
We provide (a) a framework to fit such scaling curves, which we use to analyze several design choices and combine the best ones into our recipe, (b) ScaleRL. We demonstrate its effectiveness by predictably scaling to 100k GPU-hours.
1
2
22
@Devvrit_Khatri
Devvrit
12 days
How do we understand the contribution of several design choices in an RL algorithm? Do they make the algorithm efficient? Or do they elevate the asymptotic performance? To study the scaling behavior of each design choice, we need to fit a predictable scaling curve - this provides
1
3
26
@Devvrit_Khatri
Devvrit
12 days
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
10
103
550
@agarwl_
Rishabh Agarwal
13 days
Sneak peek from a paper about scaling RL compute for LLMs: probably the most compute-expensive paper I've worked on, but hoping that others can run experiments cheaply for the science of scaling RL. Coincidentally, this is similar motivation to what we had for the NeurIPS best
11
36
417
@Primevp_in
PrimeVenturePartners
1 month
What happens when one of @GoogleDeepMind's top scientists sits down to unpack AI’s past, present & future? The full episode with @jainprateek_ is here. 🎙 Topics you can’t miss: 🔹 Deep learning → transformers → generative AI 🔹 India’s once-in-a-generation chance to lead in
0
4
13