
Aviral Kumar
@aviral_kumar2
Followers 4K · Following 912 · Media 195 · Statuses 344
Assistant Professor of CS & ML at @CarnegieMellon. Part-time Research Scientist Google. PhD from UC Berkeley.
Pittsburgh, PA
Joined May 2016
Check out these awesome new real-robot online RL fine-tuning results that @andy_peng05 and @zhiyuan_zhou_ got with our WSRL method. WSRL appeared at ICLR earlier this year -- check this out for more details: 👇
We tested WSRL (Warm-start RL) on a Franka robot, and it leads to really efficient online RL fine-tuning in the real world! WSRL learned the peg insertion task perfectly with only 11 minutes of warmup and *7 minutes* of online RL interactions 👇🧵
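The warm-start recipe the tweet describes can be sketched on a toy problem. This is a minimal, hypothetical illustration (a 3-armed bandit with tabular Q-values), not the actual WSRL implementation: the pretrained policy first collects a short no-update "warmup" batch of online data, and only then does online RL with updates begin.

```python
import random

random.seed(0)

TRUE_MEANS = [0.2, 0.5, 0.8]          # 3-armed bandit; arm 2 is actually best

def pull(arm):
    """One noisy environment interaction."""
    return TRUE_MEANS[arm] + random.gauss(0, 0.1)

# "Offline" value estimates, standing in for an offline-pretrained policy.
# They are deliberately miscalibrated: they prefer arm 1.
q = [0.4, 0.9, 0.6]
buffer = []

# Phase 1: warmup -- act greedily with the frozen pretrained policy,
# collecting transitions into the buffer without any updates.
for _ in range(20):
    arm = q.index(max(q))
    buffer.append((arm, pull(arm)))

# Phase 2: online RL -- epsilon-greedy interaction, with updates that
# replay both the warmup data and newly collected data.
ALPHA, EPS = 0.1, 0.2
for _ in range(500):
    arm = random.randrange(3) if random.random() < EPS else q.index(max(q))
    buffer.append((arm, pull(arm)))
    for a, rew in random.sample(buffer, min(8, len(buffer))):
        q[a] += ALPHA * (rew - q[a])

best_arm = q.index(max(q))
print("best arm after fine-tuning:", best_arm)
```

The warmup data quickly corrects the miscalibrated estimate for arm 1, after which online exploration finds the truly best arm; the appeal of the recipe is that the online phase needs very little interaction.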
RT @setlur_amrith: Since R1 there has been a lot of chatter 💬 on post-training LLMs with RL. Is RL only sharpening the distribution over co….
Given the confusion around what RL does for reasoning in LLMs, @setlur_amrith & I wrote a new blog post on when RL simply sharpens the base model & when it discovers new reasoning strategies. Learn how to measure discovery + methods to enable it ⬇️.
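One common way to operationalize the sharpening-vs.-discovery distinction is to compare pass@k curves of the base and RL-trained models. The toy numbers and function below are a hypothetical illustration of that idea, not the blog post's actual methodology: a "sharpened" model's advantage over the base model shrinks as k grows (the base model could reach those solutions with enough samples), while genuine discovery leaves a gap even at large k.

```python
# Per-sample solve rates for three toy problems; all numbers are made up.
base      = [0.30, 0.05, 0.00]   # problem 3 is unsolvable for the base model
sharpened = [0.80, 0.20, 0.00]   # better pass@1, but still 0 on problem 3
discovery = [0.70, 0.15, 0.10]   # nonzero rate on a problem the base never solves

def pass_at_k(p, k):
    """Probability that at least one of k i.i.d. samples solves the problem,
    given per-sample solve rate p."""
    return 1.0 - (1.0 - p) ** k

def avg_pass_at_k(rates, k):
    return sum(pass_at_k(p, k) for p in rates) / len(rates)

for k in (1, 64, 1024):
    print(k,
          round(avg_pass_at_k(sharpened, k) - avg_pass_at_k(base, k), 3),
          round(avg_pass_at_k(discovery, k) - avg_pass_at_k(base, k), 3))
```

At large k the sharpened model's gap over the base collapses to zero, while the discovery model keeps a persistent advantage from the problem the base model never solves.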
And this work would not have been possible without the new CMU FLAME cluster with 256 H100 GPUs!
I'd like to announce that the CMU FLAME center has a new cluster! It has 256 H100 GPUs, which we'll use to perform larger experiments, build more useful artifacts, and continue our tradition of open research. Expect to see more like this in the future 👇
RT @matthewyryang: 🚨 NEW PAPER: What if LLMs could tackle harder problems - not by explicitly training on longer traces, but by learning ho….
RT @setlur_amrith: Introducing e3 🔥 Best <2B model on math 💪.Are LLMs implementing algos ⚒️ OR is thinking an illusion 🎩.? Is RL only sharp….
Check out @setlur_amrith's post for more details, including a discussion of how in-context exploration differs from work claiming that RL only "sharpens" around the base model's capabilities (largely due to the data/budgets being trained upon).
Introducing e3 🔥 Best <2B model on math 💪 Are LLMs implementing algos ⚒️ OR is thinking an illusion 🎩? Is RL only sharpening the base LLM distribution 🤔 OR discovering novel strategies outside the base LLM 💡? We answer these ⤵️ 🚨🚨
This was a very fun collab, led by @setlur_amrith & @matthewyryang w/ @ianwu97 @sea_snell @JeremyGreerOumi @gingsmith and @max_simchowitz. I learned a lot! Website: Paper: Code, training data, and checkpoints are all released.
Our view on test-time scaling has been to train models to discover algorithms that enable them to solve harder problems. @setlur_amrith & @matthewyryang's new work e3 shows how RL done with this view produces the best <2B LLM on math, one that extrapolates beyond its training budget. 🧵⬇️
This was a fun collab led by @JunhongShen1 @jackbai_jkb, w/ @LunjunZhang @YifeiZhou02 @setlur_amrith @atalwalkar and many others! Paper: Website: Code: Please reach out if you have feedback!