lovish

@louvishh

Followers: 967
Following: 4K
Media: 47
Statuses: 265

phding @ucl and msl @aiatmeta. mostly random tweets here.

london
Joined July 2021
@louvishh
lovish
30 days
🚨 New Paper: The Art of Scaling Reinforcement Learning Compute for LLMs 🚨 We burnt a lot of GPU-hours to provide the community with the first open, large-scale systematic study on RL scaling for LLMs. https://t.co/49REQZ4R6G
@Devvrit_Khatri
Devvrit
30 days
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
2
17
76
@Devvrit_Khatri
Devvrit
26 days
Had an amazing time on the Delta Podcast about our recent Scaling RL work, future directions, and some fun broader conversation. Thanks for having me on :)
@DeltaInstitutes
Delta Institute @ NeurIPS
26 days
Huge thanks to Devvrit Khatri for coming on the Delta Podcast! Check out the podcast episode here: https://t.co/wmsDjqFbPn
1
4
47
@louvishh
lovish
24 days
in nyc for a couple of weeks. who should i meet?
4
0
14
@louvishh
lovish
26 days
thanks @natolambert for covering our paper! we hope this is a positive step toward building scaling laws for RL!
@natolambert
Nathan Lambert
26 days
My very positive review of the ScaleRL paper. Excited for more data + base model work to be built around this (Pythia-style olmo suite???). For now, the key things to get off the ground with RL are: importance sampling, in-flight updates, and continuous batching.
0
1
14
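The continuous-batching point above is about generator utilization: a static batch has to wait for its longest rollout, while continuous batching refills finished slots immediately. The toy simulation below is a rough illustration of that gap under made-up rollout lengths and a hypothetical 32-slot generator; it is not code from the paper or from any particular inference engine.

```python
# Toy simulation (illustrative assumptions only): slot-steps spent by a
# 32-slot generator under static vs. continuous batching.
import random

random.seed(0)
lengths = [random.randint(50, 1000) for _ in range(256)]  # tokens per rollout
batch = 32

# Static batching: each batch of 32 rollouts holds all 32 slots for as many
# decode steps as its longest sequence.
static_slot_steps = sum(
    max(lengths[i:i + batch]) * batch for i in range(0, len(lengths), batch)
)

# Continuous batching: a finished slot is refilled immediately, so total
# slot-steps is roughly the total number of generated tokens.
continuous_slot_steps = sum(lengths)

print("static slot-steps:    ", static_slot_steps)
print("continuous slot-steps:", continuous_slot_steps)
print(f"utilization gain:      {static_slot_steps / continuous_slot_steps:.2f}x")
```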
@deedydas
Deedy
29 days
Meta just dropped this paper that spills the secret sauce of reinforcement learning (RL) on LLMs. It lays out an RL recipe, uses 400,000 GPU hrs and posits a scaling law for performance with more compute in RL, like the classic pretraining scaling laws. Must read for AI nerds.
44
214
1K
@louvishh
lovish
29 days
finding compute for this project (and dealing with new hardware) was such a fun exercise in itself lol. can't believe we spent this much on this paper haha. rl scaling ftw 🙌
@agarwl_
Rishabh Agarwal
30 days
*checks chatgpt* This paper cost ~4.2 million USD (400K GB200 hours) -- science! Our most expensive run was 100K GPU-hours (same amount as DeepSeek-R1-Zero, but on GB200s). One finding here was that once we have a scalable RL algorithm, RL compute scaling becomes predictable
1
0
41
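The dollar figure in the tweet above is a ChatGPT estimate rather than an official price, but the implied per-hour rate is simple arithmetic, shown here only as a back-of-the-envelope check.

```python
# Back-of-the-envelope check of the quoted cost (values from the tweet above;
# the ~4.2M USD figure is itself an estimate, not an official price).
total_cost_usd = 4.2e6      # ~4.2 million USD
gpu_hours = 400_000         # 400K GB200 GPU-hours
print(f"Implied rate: ~{total_cost_usd / gpu_hours:.2f} USD per GB200-hour")
# -> Implied rate: ~10.50 USD per GB200-hour
```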
@louvishh
lovish
30 days
Our recipe, ScaleRL, is predictable across model sizes and also shows scaling on downstream evaluations like AIME. Scaling up generation length, batch size, etc. shows higher asymptotic performance as well.
1
1
5
@louvishh
lovish
30 days
Methods that look strong at smaller compute budgets may underperform at scale. REINFORCE-like objectives (CISPO/ScaleRL) perform better than PPO-style variants (GRPO/DAPO).
1
1
4
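The practical difference between the two families is where clipping is applied. The PyTorch sketch below is illustrative only and not the exact objectives from the paper: a CISPO-style, REINFORCE-like loss clips a detached importance weight and lets the gradient flow through the log-probability, while a PPO/GRPO-style loss clips the ratio itself inside a min. All function names and hyperparameter values here are assumptions for illustration.

```python
import torch

def cispo_style_loss(logp_new, logp_old, adv, eps_max=3.0):
    """REINFORCE-like: clip the detached IS weight; gradient flows via logp_new."""
    ratio = torch.exp(logp_new - logp_old).detach()
    return -(torch.clamp(ratio, max=eps_max) * adv * logp_new).mean()

def ppo_style_loss(logp_new, logp_old, adv, eps=0.2):
    """PPO/GRPO-like: clipped surrogate; gradient flows through the ratio itself."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * adv
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * adv
    return -torch.min(unclipped, clipped).mean()

# toy per-token data standing in for real rollout log-probs and advantages
T = 64
logp_new = torch.randn(T, requires_grad=True) - 2.0
logp_old = (torch.randn(T) - 2.0).detach()
adv = torch.randn(T)
print(cispo_style_loss(logp_new, logp_old, adv).item(),
      ppo_style_loss(logp_new, logp_old, adv).item())
```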
@louvishh
lovish
30 days
Pre-training has mature scaling laws, but stable and predictable recipes for RL do not exist (yet). We take a step in that direction by providing a predictable framework using sigmoidal laws along with a stable recipe, ScaleRL, that shows predictable scaling over 100k GPU-hours.
1
1
5
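The framework described above fits a saturating (sigmoidal) curve of performance versus RL compute and extrapolates it forward. The snippet below is a minimal sketch under an assumed functional form and synthetic data; the paper's exact parameterization may differ. It fits on the cheap early points of a run and predicts the expensive end.

```python
# Minimal sketch: fit a saturating sigmoid of pass rate vs. compute on the
# cheap part of a run, then extrapolate. Functional form, constants, and data
# are illustrative assumptions, not the paper's exact law.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid_law(compute, asymptote, midpoint, exponent):
    """Pass rate approaches `asymptote` as compute grows; `midpoint` is the
    compute at which half the asymptote is reached."""
    return asymptote / (1.0 + (midpoint / compute) ** exponent)

# synthetic "training curve": GPU-hours vs. eval pass rate
gpu_hours = np.array([1e2, 3e2, 1e3, 3e3, 1e4, 3e4, 1e5])
pass_rate = sigmoid_law(gpu_hours, 0.62, 2.5e3, 0.9)
pass_rate = pass_rate + np.random.default_rng(0).normal(0, 0.005, len(gpu_hours))

# fit on the five cheapest points only, then extrapolate to 100k GPU-hours
popt, _ = curve_fit(sigmoid_law, gpu_hours[:5], pass_rate[:5],
                    p0=[0.6, 1e3, 1.0], bounds=(0, [1.0, 1e6, 5.0]))
print("fitted (asymptote, midpoint, exponent):", np.round(popt, 3))
print("predicted pass rate at 1e5 GPU-hours:  ", round(sigmoid_law(1e5, *popt), 3))
print("'observed' pass rate at 1e5 GPU-hours: ", round(pass_rate[-1], 3))
```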
@rosstaylor90
Ross Taylor
30 days
This is a great paper and a real gift to the open community to surface these ablations. Open RL has been on an interesting path of "reinforce-ification" since R1. GRPO was a PPO-like method that was motivated by the need to drop the value network and rely on MC estimates (for
arxiv.org
Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training....
1
10
108
@_lewtun
Lewis Tunstall
30 days
This is the most impressive plot I've seen all year:
- Scaling RL not only works, but can be predicted from experiments run with 1/2 the target compute
- PipelineRL crushes conventional RL pipelines in terms of compute efficiency
- Many small details matter for stability &
12
23
275
@natolambert
Nathan Lambert
30 days
The first fantastic paper on scaling RL with LLMs just dropped. I strongly recommend taking a look and will be sharing more thoughts on the blog soon. The Art of Scaling Reinforcement Learning Compute for LLMs Khatri & Madaan et al.
20
196
1K
@agarwl_
Rishabh Agarwal
1 month
Sneak peek from a paper about scaling RL compute for LLMs: probably the most compute-expensive paper I've worked on, but hoping that others can run experiments cheaply for the science of scaling RL. Coincidentally, this is similar motivation to what we had for the NeurIPS best
11
37
418
@_dieuwke_
Dieuwke Hupkes
7 months
So happy our new multilingual benchmark MultiLoKo is finally out (after some sweat and tears!) https://t.co/amUll6inIL Multilingual eval for LLMs... could be better, and I hope MultiLoKo will help fill some gaps in it + help study design choices in benchmark design @metaai
3
10
51
@louvishh
lovish
7 months
spotted at the zoo today: maverick and behemoth enjoying the rare london sun
0
0
14
@louvishh
lovish
7 months
reasoning coming soon 😌
@AIatMeta
AI at Meta
7 months
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model
6
6
161
@louvishh
lovish
7 months
llama 4 is here 🦙🦙
@Ahmad_Al_Dahle
Ahmad Al-Dahle
7 months
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4
1
0
7
@louvishh
lovish
8 months
i love how my feed is filled with @zacharynado burns every time a new gemini comes out. probably goes back to hibernation to build the best models again after a day.
1
0
25
@nick11roberts
Nicholas Roberts
8 months
📉📉NEW SCALING LAW PHENOMENON 📉📉 We find that knowledge and reasoning exhibit different scaling behaviors! Super excited to finally tell you all about our paper on the compute optimal scaling of skills: https://t.co/SH3YCMyIeG [1/n]
13
171
1K