lovish
@louvishh
Followers
967
Following
4K
Media
47
Statuses
265
phding @ucl and msl @aiatmeta. mostly random tweets here.
london
Joined July 2021
🚨 New Paper: The Art of Scaling Reinforcement Learning Compute for LLMs 🚨 We burnt a lot of GPU-hours to provide the community with the first open, large-scale systematic study on RL scaling for LLMs. https://t.co/49REQZ4R6G
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
2
17
76
Had an amazing time on the Delta Podcast about our recent Scaling RL work, future directions, and some fun broader conversation. Thanks for having me on :)
Huge thanks to Devvrit Khatri for coming on the Delta Podcast! Check out the podcast episode here: https://t.co/wmsDjqFbPn
1
4
47
thanks @natolambert for covering our paper! we hope this takes a positive step in building the scaling laws for RL in the future!
My very positive review of the ScaleRL paper. Excited for more data + base-model work to be built around this (Pythia-style olmo suite???). For now, the key things to get off the ground with RL are: importance sampling, in-flight updates, and continuous batching.
0
1
14
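For readers unfamiliar with the off-policy correction mentioned in that review, here is a minimal sketch of a REINFORCE-style loss with truncated importance sampling, the kind of per-token correction typically used when generations come from a slightly stale policy (e.g. with in-flight weight updates). The function and variable names are illustrative assumptions, not taken from the paper.

```python
import torch

def truncated_is_reinforce_loss(logp_new, logp_behavior, advantages, clip_max=2.0):
    """REINFORCE-style loss with a truncated importance-sampling (IS) correction.

    logp_new:      log-probs of sampled tokens under the current policy (requires grad)
    logp_behavior: log-probs of the same tokens under the (stale) policy that generated them
    advantages:    per-token (or broadcast per-sequence) advantage estimates
    clip_max:      upper truncation for the IS ratio, to bound variance
    """
    # IS ratio corrects for the mismatch between the generating and current policy.
    ratio = torch.exp(logp_new - logp_behavior)
    # Truncate large ratios and stop gradients through the weight itself;
    # the policy gradient flows only through logp_new, as in REINFORCE.
    ratio = torch.clamp(ratio, max=clip_max).detach()
    return -(ratio * advantages * logp_new).mean()
```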
Meta just dropped this paper that spills the secret sauce of reinforcement learning (RL) on LLMs. It lays out an RL recipe, uses 400,000 GPU-hours, and posits a scaling law for performance with more RL compute, like the classic pretraining scaling laws. Must read for AI nerds.
44
214
1K
finding compute for this project (and dealing with new hardware) was such a fun exercise in itself lol. can't believe we spent this much on this paper haha. rl scaling ftw 🙌
*checks chatgpt* This paper cost ~4.2 million USD (400K GB200 hours) -- science! Our most expensive run was 100K GPU-hours (the same amount as DeepSeek-R1-Zero, but on GB200s). One finding here was that once we have a scalable RL algorithm, RL compute scaling becomes predictable.
1
0
41
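(As a quick sanity check on the arithmetic: $4.2M over 400K GPU-hours implies an assumed rate of roughly $10.5 per GB200-hour.)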
For more details, refer to our paper: https://t.co/RZ9padg2zB and blog: https://t.co/BaPJFeYpyH. Work done with an amazing set of folks: @Devvrit_Khatri, @rish2k1, @rach_it_, @dvsaisurya, Manzil Zaheer, @inderjit_ml, @brandfonbrener, and @agarwl_.
arxiv.org
Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training....
0
1
7
Our recipe, ScaleRL, is predictable across model sizes and also shows scaling on downstream evaluations like AIME. Scaling up generation length, batch size, etc. also yields higher asymptotic performance.
1
1
5
Methods that look strong at smaller compute budgets may underperform at scale. REINFORCE-like objectives (CISPO/ScaleRL) perform better than PPO-style variants (GRPO/DAPO).
1
1
4
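To make the REINFORCE-vs-PPO distinction concrete, here is a minimal sketch of the PPO-style clipped surrogate that GRPO/DAPO variants build on, for contrast with the truncated-IS REINFORCE loss sketched earlier. Names and the default clip value are illustrative, not from the paper.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate objective.

    Unlike the REINFORCE + truncated-IS loss above, the gradient here flows
    through the ratio itself and is zeroed once the ratio leaves the
    [1 - clip_eps, 1 + clip_eps] trust region.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (elementwise min) surrogate; negated because we minimize.
    return -torch.min(unclipped, clipped).mean()
```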
Pre-training has mature scaling laws, but stable and predictable recipes for RL do not exist (yet). We take a step in that direction by providing a predictable framework using sigmoidal laws along with a stable recipe, ScaleRL, that shows predictable scaling over 100k GPU-hours.
1
1
5
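To illustrate what such a sigmoidal compute-performance fit can look like in practice, here is a minimal sketch that fits a saturating curve to (compute, pass-rate) pairs from small runs and extrapolates to a larger budget. The parameterization, parameter names (A, B, C_mid, R0), and data points are all assumptions for this sketch and may differ from the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoidal_law(compute, A, B, C_mid, R0):
    """Saturating fit: performance rises from R0 toward asymptote A,
    with midpoint C_mid and steepness B (illustrative parameterization)."""
    return R0 + (A - R0) / (1.0 + (C_mid / compute) ** B)

# Hypothetical (GPU-hours, mean pass rate) observations from small-scale runs.
compute = np.array([1e2, 3e2, 1e3, 3e3, 1e4, 5e4])
reward  = np.array([0.18, 0.25, 0.34, 0.45, 0.52, 0.60])

params, _ = curve_fit(sigmoidal_law, compute, reward,
                      p0=[0.7, 1.0, 5e3, 0.15], maxfev=20000)
A, B, C_mid, R0 = params

# Extrapolate: predicted performance at a 100k GPU-hour budget.
print("Fitted asymptote A:", round(A, 3))
print("Predicted pass rate at 1e5 GPU-hours:", round(sigmoidal_law(1e5, *params), 3))
```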
This is a great paper and a real gift to the open community for surfacing these ablations. Open RL has been on an interesting path of "reinforce-ification" since R1. GRPO was a PPO-like method that was motivated by the need to drop the value network and rely on MC estimates (for
1
10
108
This is the most impressive plot I've seen all year:
- Scaling RL not only works, but can be predicted from experiments run with 1/2 the target compute
- PipelineRL crushes conventional RL pipelines in terms of compute efficiency
- Many small details matter for stability &
12
23
275
The first fantastic paper on scaling RL with LLMs just dropped. I strongly recommend taking a look and will be sharing more thoughts on the blog soon. The Art of Scaling Reinforcement Learning Compute for LLMs Khatri & Madaan et al.
20
196
1K
Sneak peek from a paper about scaling RL compute for LLMs: probably the most compute-expensive paper I've worked on, but hoping that others can run experiments cheaply for the science of scaling RL. Coincidentally, this is similar motivation to what we had for the NeurIPS best
11
37
418
So happy our new multilingual benchmark MultiLoKo is finally out (after some sweat and tears!) https://t.co/amUll6inIL Multilingual eval for LLMs... could be better, and I hope MultiLoKo will help fill some gaps in it + help study design choices in benchmark design @metaai
3
10
51
spotted at the zoo today: maverick and behemoth enjoying the rare london sun
0
0
14
reasoning coming soon 😌
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model
6
6
161
llama 4 is here 🦙🦙
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4
1
0
7
i love how my feed is filled with @zacharynado burns every time a new gemini comes out. probably goes back to hibernation to build the best models again after a day.
1
0
25
📉📉NEW SCALING LAW PHENOMENON 📉📉 We find that knowledge and reasoning exhibit different scaling behaviors! Super excited to finally tell you all about our paper on the compute optimal scaling of skills: https://t.co/SH3YCMyIeG [1/n]
13
171
1K