lovish
@louvishh
Followers
967
Following
4K
Media
47
Statuses
265
phding @ucl and msl @aiatmeta. mostly random tweets here.
london
Joined July 2021
🚨 New Paper: The Art of Scaling Reinforcement Learning Compute for LLMs 🚨 We burnt a lot of GPU-hours to provide the community with the first open, large-scale systematic study on RL scaling for LLMs. https://t.co/49REQZ4R6G
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
2
17
76
Had an amazing time on the Delta Podcast about our recent Scaling RL work, future directions, and some fun broader conversation. Thanks for having me on :)
Huge thanks to Devvrit Khatri for coming on the Delta Podcast! Check out the podcast episode here: https://t.co/wmsDjqFbPn
1
4
47
thanks @natolambert for covering our paper! we hope this takes a positive step in building the scaling laws for RL in the future!
My very positive review of the ScaleRL paper. Excited for more data + base-model work to be built around this (Pythia-style olmo suite???). For now, the key things to get off the ground with RL are: importance sampling, in-flight updates, and continuous batching.
0
1
14
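For readers unfamiliar with the off-policy correction mentioned in that review, here is a minimal sketch of a REINFORCE-style loss with truncated importance sampling, the kind of per-token correction typically used when generations come from a slightly stale policy (e.g. with in-flight weight updates). The function and variable names are illustrative assumptions, not taken from the paper.

```python
import torch

def truncated_is_reinforce_loss(logp_new, logp_behavior, advantages, clip_max=2.0):
    """REINFORCE-style loss with a truncated importance-sampling (IS) correction.

    logp_new:      log-probs of sampled tokens under the current policy (requires grad)
    logp_behavior: log-probs of the same tokens under the (stale) policy that generated them
    advantages:    per-token (or broadcast per-sequence) advantage estimates
    clip_max:      upper truncation for the IS ratio, to bound variance
    """
    # IS ratio corrects for the mismatch between the generating and current policy.
    ratio = torch.exp(logp_new - logp_behavior)
    # Truncate large ratios and stop gradients through the weight itself;
    # the policy gradient flows only through logp_new, as in REINFORCE.
    ratio = torch.clamp(ratio, max=clip_max).detach()
    return -(ratio * advantages * logp_new).mean()
```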
Meta just dropped this paper that spills the secret sauce of reinforcement learning (RL) on LLMs. It lays out an RL recipe, uses 400,000 GPU-hours, and posits a scaling law for performance with more RL compute, like the classic pretraining scaling laws. Must read for AI nerds.
44
214
1K
finding compute for this project (and dealing with new hardware) was such a fun exercise in itself lol. can't believe we spent this much on this paper haha. rl scaling ftw 🙌
*checks chatgpt* This paper cost ~4.2 million USD (400K GB200 hours) -- science! Our most expensive run was 100K GPU-hours (the same amount as DeepSeek-R1-Zero, but on GB200s). One finding here was that once we have a scalable RL algorithm, RL compute scaling becomes predictable.
1
0
41
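(As a quick sanity check on the arithmetic: $4.2M over 400K GPU-hours implies an assumed rate of roughly $10.5 per GB200-hour.)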
For more details, refer to our paper: https://t.co/RZ9padg2zB and blog: https://t.co/BaPJFeYpyH. Work done with an amazing set of folks: @Devvrit_Khatri, @rish2k1, @rach_it_, @dvsaisurya, Manzil Zaheer, @inderjit_ml, @brandfonbrener, and @agarwl_.
arxiv.org
Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training....
0
1
7
Our recipe, ScaleRL, is predictable across model sizes and also shows scaling on downstream evaluations like AIME. Scaling up generation length, batch size, etc. also yields higher asymptotic performance.
1
1
5
Methods that look strong at smaller compute budgets may underperform at scale. REINFORCE-like objectives (CISPO/ScaleRL) perform better than PPO-style variants (GRPO/DAPO).
1
1
4
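To make the REINFORCE-vs-PPO distinction concrete, here is a minimal sketch of the PPO-style clipped surrogate that GRPO/DAPO variants build on, for contrast with the truncated-IS REINFORCE loss sketched earlier. Names and the default clip value are illustrative, not from the paper.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate objective.

    Unlike the REINFORCE + truncated-IS loss above, the gradient here flows
    through the ratio itself and is zeroed once the ratio leaves the
    [1 - clip_eps, 1 + clip_eps] trust region.
    """
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (elementwise min) surrogate; negated because we minimize.
    return -torch.min(unclipped, clipped).mean()
```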
Pre-training has mature scaling laws, but stable and predictable recipes for RL do not exist (yet). We take a step in that direction by providing a predictable framework using sigmoidal laws along with a stable recipe, ScaleRL, that shows predictable scaling over 100k GPU-hours.
1
1
5
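To illustrate what such a sigmoidal compute-performance fit can look like in practice, here is a minimal sketch that fits a saturating curve to (compute, pass-rate) pairs from small runs and extrapolates to a larger budget. The parameterization, parameter names (A, B, C_mid, R0), and data points are all assumptions for this sketch and may differ from the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoidal_law(compute, A, B, C_mid, R0):
    """Saturating fit: performance rises from R0 toward asymptote A,
    with midpoint C_mid and steepness B (illustrative parameterization)."""
    return R0 + (A - R0) / (1.0 + (C_mid / compute) ** B)

# Hypothetical (GPU-hours, mean pass rate) observations from small-scale runs.
compute = np.array([1e2, 3e2, 1e3, 3e3, 1e4, 5e4])
reward  = np.array([0.18, 0.25, 0.34, 0.45, 0.52, 0.60])

params, _ = curve_fit(sigmoidal_law, compute, reward,
                      p0=[0.7, 1.0, 5e3, 0.15], maxfev=20000)
A, B, C_mid, R0 = params

# Extrapolate: predicted performance at a 100k GPU-hour budget.
print("Fitted asymptote A:", round(A, 3))
print("Predicted pass rate at 1e5 GPU-hours:", round(sigmoidal_law(1e5, *params), 3))
```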
This is a great paper and a real gift to the open community for surfacing these ablations. Open RL has been on an interesting path of "reinforce-ification" since R1. GRPO was a PPO-like method that was motivated by the need to drop the value network and rely on MC estimates (for
1
10
108
This is the most impressive plot I've seen all year:
- Scaling RL not only works, but can be predicted from experiments run with 1/2 the target compute
- PipelineRL crushes conventional RL pipelines in terms of compute efficiency
- Many small details matter for stability &
12
23
275
The first fantastic paper on scaling RL with LLMs just dropped. I strongly recommend taking a look and will be sharing more thoughts on the blog soon. The Art of Scaling Reinforcement Learning Compute for LLMs Khatri & Madaan et al.
20
196
1K
Sneak peek from a paper about scaling RL compute for LLMs: probably the most compute-expensive paper I've worked on, but hoping that others can run experiments cheaply for the science of scaling RL. Coincidentally, this is similar motivation to what we had for the NeurIPS best
11
37
418
So happy our new multilingual benchmark MultiLoKo is finally out (after some sweat and tears!) https://t.co/amUll6inIL Multilingual eval for LLMs... could be better, and I hope MultiLoKo will help fill some gaps in it + help study design choices in benchmark design @metaai
3
10
51
spotted at the zoo today: maverick and behemoth enjoying the rare london sun
0
0
14
reasoning coming soon 😌
Today is the start of a new era of natively multimodal AI innovation. Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick — our most advanced models yet and the best in their class for multimodality. Llama 4 Scout • 17B-active-parameter model
6
6
161
llama 4 is here 🦙🦙
Introducing our first set of Llama 4 models! We’ve been hard at work doing a complete re-design of the Llama series. I’m so excited to share it with the world today and mark another major milestone for the Llama herd as we release the *first* open source models in the Llama 4
1
0
7
i love how my feed is filled with @zacharynado burns every time a new gemini comes out. probably goes back to hibernation to build the best models again after a day.
1
0
25
📉📉NEW SCALING LAW PHENOMENON 📉📉 We find that knowledge and reasoning exhibit different scaling behaviors! Super excited to finally tell you all about our paper on the compute optimal scaling of skills: https://t.co/SH3YCMyIeG [1/n]
13
171
1K