Rishabh Tiwari

@rish2k1

Followers: 804 · Following: 140 · Media: 14 · Statuses: 65

RS Intern @Meta | CS PhD @UCBerkeley | Ex-@GoogleAI | Research area: Efficient and robust AI systems

Berkeley, CA
Joined May 2019
@rish2k1
Rishabh Tiwari
22 days
There is so much noise in the LLM RL space, so we sat down and ran everything at scale (so you don't have to 😜) and present to you "The Art of Scaling RL". Give this a read before starting your next RL run. Led by the amazing @Devvrit_Khatri @louvishh
@Devvrit_Khatri
Devvrit
22 days
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
3
20
220
@rach_it_
Rachit Bansal
16 days
Excited to share one of the first projects from my PhD! We find that Adam (often seen as approximate second-order) can actually outperform Gauss-Newton (true second-order) in certain cases! Our 2x2 comparison across basis choice and gradient noise is revealing! Thread by Sham:
@ShamKakade6
Sham Kakade
17 days
(1/9) Diagonal preconditioners such as Adam typically use empirical gradient information rather than true second-order curvature. Is this merely a computational compromise or can it be advantageous? Our work confirms the latter: Adam can outperform Gauss-Newton in certain cases.
2
14
107
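For readers who want to see what the comparison above means concretely, here is a minimal toy sketch (not the paper's code, and not its 2x2 basis/noise setup) contrasting Adam's diagonal, gradient-based preconditioner with a damped Gauss-Newton step on a small least-squares problem; the data and hyperparameters are illustrative.

```python
# Toy contrast: Adam's diagonal, gradient-based preconditioning vs. a
# Gauss-Newton step built from the true curvature J^T J (least squares).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))                      # features
w_true = rng.normal(size=8)
y = X @ w_true + 0.1 * rng.normal(size=256)        # noisy targets

def grad(w):
    # Gradient of 0.5 * mean squared residual
    return X.T @ (X @ w - y) / len(y)

# --- Adam: precondition with running estimates of first/second gradient moments
w, m, v = np.zeros(8), np.zeros(8), np.zeros(8)
beta1, beta2, lr, eps = 0.9, 0.999, 0.1, 1e-8
for t in range(1, 201):
    g = grad(w)
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat, v_hat = m / (1 - beta1**t), v / (1 - beta2**t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
print("Adam loss:", 0.5 * np.mean((X @ w - y) ** 2))

# --- Gauss-Newton: precondition with the true (damped) curvature J^T J
w_gn, damping = np.zeros(8), 1e-3
H = X.T @ X / len(y)                               # Gauss-Newton matrix for least squares
for _ in range(20):
    w_gn -= np.linalg.solve(H + damping * np.eye(8), grad(w_gn))
print("Gauss-Newton loss:", 0.5 * np.mean((X @ w_gn - y) ** 2))
```

On a noiseless quadratic the Gauss-Newton step wins easily; the thread's point is that with noisy gradients and a particular basis, the Adam-style estimate can come out ahead.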
@MiniMax__AI
MiniMax (official)
19 days
Great to see our algorithmic work validated at scale! CISPO started as a stability fix during our MiniMax-01 training, an answer to spiky gradients and train-inference discrepancies. Seeing it become a core component of ScaleRL in The Art of Scaling Reinforcement Learning Compute for LLMs…
@deedydas
Deedy
22 days
Meta just dropped this paper that spills the secret sauce of reinforcement learning (RL) on LLMs. It lays out an RL recipe, uses 400,000 GPU hrs and posits a scaling law for performance with more compute in RL, like the classic pretraining scaling laws. Must read for AI nerds.
3
8
88
@ShamKakade6
Sham Kakade
17 days
(1/9) Diagonal preconditioners such as Adam typically use empirical gradient information rather than true second-order curvature. Is this merely a computational compromise or can it be advantageous? Our work confirms the latter: Adam can outperform Gauss-Newton in certain cases.
2
18
129
@DeltaInstitutes
Delta Institute
19 days
Huge thanks to Devvrit Khatri for coming on the Delta Podcast! Check out the podcast episode here: https://t.co/wmsDjqFbPn
2
2
7
@jainprateek_
Prateek Jain
22 days
This work provides many deep insights into scaling RL for LLMs! Congratulations @Devvrit_Khatri @louvishh and all the coauthors. Also amazing to see so many close friends and collaborators, including four of our former predocs/RFs, write this nice paper.
@Devvrit_Khatri
Devvrit
22 days
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
0
4
56
@agarwl_
Rishabh Agarwal
22 days
*checks chatgpt* This paper cost ~4.2 million USD (400K GB200 hours) -- science! Our most expensive run was 100K GPU-hours (the same amount as DeepSeek-R1-Zero, but on GB200s). One finding here was that once we have a scalable RL algorithm, RL compute scaling becomes predictable.
@Devvrit_Khatri
Devvrit
22 days
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
19
73
835
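As a rough illustration of what "predictable RL compute scaling" can look like in practice, the sketch below fits a saturating compute-performance curve to a few hypothetical small-run measurements and extrapolates to a larger budget; the functional form, parameter names, and data points are assumptions for illustration, not the paper's actual fit or numbers.

```python
# Illustrative sketch (not the paper's code): fit a saturating
# compute-vs-performance curve to early RL checkpoints, then extrapolate.
import numpy as np
from scipy.optimize import curve_fit

def scaling_curve(C, A, B, C_mid):
    # A: asymptotic pass rate, B: steepness, C_mid: compute at half of A
    return A / (1.0 + (C_mid / C) ** B)

# Hypothetical (compute in GPU-hours, pass rate) measurements from small runs
C = np.array([1e2, 3e2, 1e3, 3e3, 1e4])
R = np.array([0.22, 0.31, 0.42, 0.50, 0.55])

(A, B, C_mid), _ = curve_fit(scaling_curve, C, R, p0=[0.6, 0.5, 1e3], maxfev=10000)
print(f"fitted ceiling A={A:.2f}; predicted pass rate at 1e5 GPU-h: "
      f"{scaling_curve(1e5, A, B, C_mid):.2f}")
```

The practical point in the thread is that once the recipe is stable, curves like this fitted on cheap runs extrapolate to the expensive ones.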
@louvishh
lovish
22 days
🚨 New Paper: The Art of Scaling Reinforcement Learning Compute for LLMs 🚨 We burnt a lot of GPU-hours to provide the community with the first open, large-scale systematic study on RL scaling for LLMs. https://t.co/49REQZ4R6G
@Devvrit_Khatri
Devvrit
22 days
Wish to build scaling laws for RL but not sure how to scale? Or what scales? Or would RL even scale predictably? We introduce: The Art of Scaling Reinforcement Learning Compute for LLMs
2
15
67
@agarwl_
Rishabh Agarwal
23 days
Sneak peek from a paper about scaling RL compute for LLMs: probably the most compute-expensive paper I've worked on, but hoping that others can run experiments cheaply for the science of scaling RL. Coincidentally, this is similar motivation to what we had for the NeurIPS best…
11
37
417
@dylan522p
Dylan Patel
3 months
Feel like I'm taking crazy pills. We are just back at step one. Don't store KV cache, just recompute it.
@adityastomar_
Aditya Tomar
3 months
Can we break the memory wall for LLM inference via KV cache rematerialization? 🚨 Introducing XQuant, which leverages underutilized compute units to eliminate the memory bottleneck for LLM inference! • 10–12.5x memory savings vs. FP16 • Near-zero accuracy loss • Beats…
29
23
540
@adityastomar_
Aditya Tomar
3 months
Can we break the memory wall for LLM inference via KV cache rematerialization? 🚨 Introducing XQuant, which leverages underutilized compute units to eliminate the memory bottleneck for LLM inference! • 10–12.5x memory savings vs. FP16 • Near-zero accuracy loss • Beats…
26
92
667
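A minimal sketch of the rematerialization idea these tweets describe, assuming the core trick is to cache a low-bit copy of the layer input X and recompute K/V from it at decode time; the toy per-tensor quantizer, single-head attention, and shapes below are illustrative and do not reproduce XQuant's actual design.

```python
# Sketch: cache a quantized copy of the layer input X instead of K and V,
# then rematerialize K/V with two extra matmuls at attention time.
import numpy as np

d, T = 64, 128
rng = np.random.default_rng(0)
W_k = rng.normal(size=(d, d)) / np.sqrt(d)
W_v = rng.normal(size=(d, d)) / np.sqrt(d)

def quantize(x):                      # toy per-tensor int8 quantization
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

X = rng.normal(size=(T, d)).astype(np.float32)   # past token activations
X_q, s = quantize(X)                              # what we actually store

# Standard decoding would store K, V (two fp16 tensors of shape T x d);
# here we store only X_q and recompute K, V on the fly.
X_hat = dequantize(X_q, s)
K, V = X_hat @ W_k, X_hat @ W_v

q_new = rng.normal(size=(1, d)) / np.sqrt(d)      # current query
attn = np.exp(q_new @ K.T / np.sqrt(d))
attn /= attn.sum()
out = attn @ V

# 4x here with int8 X; the tweet's 10-12.5x figure comes from lower-bit schemes.
print("memory ratio (K+V in fp16 vs X in int8):", (2 * T * d * 2) / (T * d * 1))
```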
@nikitasaxena02
Nikita Saxena (she/her)
3 months
Heading to @COLM_conf in Montreal? So is @WiMLworkshop! 🎉 We are organizing our first-ever event at #CoLM2025 and we want you to choose the format! What excites you the most? Have a different idea? Let us know in the replies! 👇 RT to spread the word! ⏩
1
9
37
@LakshyAAAgrawal
Lakshya A Agrawal
3 months
How does prompt optimization compare to RL algos like GRPO? GRPO needs 1000s of rollouts, but humans can learn from a few trials, by reflecting on what worked & what didn't. Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts! 🧵
46
170
1K
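A very rough sketch of the "learn by reflecting on a few trials" loop the tweet alludes to, not GEPA's actual algorithm or API; `call_llm`, the task examples, and the acceptance rule are hypothetical stand-ins.

```python
# Sketch of reflective prompt optimization: evaluate a prompt on a few
# examples, ask a model to reflect on the failures, try the revised prompt.
def call_llm(prompt: str) -> str:
    # Stand-in for a real LLM call (e.g. an API client); returns a dummy string.
    return "Revised instruction: show your reasoning, then give the final answer."

def evaluate(prompt: str, examples) -> tuple[float, list]:
    # Score the prompt on a few examples and collect failure traces.
    failures, correct = [], 0
    for x, y in examples:
        pred = call_llm(f"{prompt}\n\nInput: {x}")
        if y in pred:
            correct += 1
        else:
            failures.append((x, y, pred))
    return correct / len(examples), failures

examples = [("2+2", "4"), ("capital of France", "Paris")]
prompt, best_score = "Answer the question.", -1.0
for step in range(3):                       # tiny budget vs. thousands of RL rollouts
    score, failures = evaluate(prompt, examples)
    best_score = max(best_score, score)
    if not failures:
        break
    reflection = call_llm(
        "Failed cases (input, expected, model output):\n"
        + "\n".join(map(str, failures))
        + "\nReflect on what went wrong and rewrite the instruction."
    )
    cand_score, _ = evaluate(reflection, examples)
    if cand_score >= score:                 # accept the revised prompt
        prompt = reflection
print("final prompt:", prompt, "| score:", best_score)
```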
@ShikharSSU
Shikhar
4 months
Meows, music, murmurs and more! We train a general purpose audio encoder and open source the code, checkpoints and evaluation toolkit.
@ArxivSound
arXiv Sound
4 months
Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe, "OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder,"
0
15
35
@mertcemri
Mert Cemri
4 months
If you are at ICML this year, make sure to catch @rish2k1 at the Efficient Systems for Foundation Models Workshop at east exhibition hall A to learn more about our work on accelerating test-time scaling methods to achieve better latency/accuracy tradeoffs!
@abeirami
Ahmad Beirami
4 months
At es-fomo workshop, talk to @rish2k1 about scaling test-time compute as a function of user-facing latency (instead of FLOPS)
0
3
21
@abeirami
Ahmad Beirami
4 months
At es-fomo workshop, talk to @rish2k1 about scaling test-time compute as a function of user-facing latency (instead of FLOPS)
@abeirami
Ahmad Beirami
4 months
[Sat Jul 19] @Nived_Rajaraman & @rish2k1 present work on improving accuracy-latency tradeoffs for test-time scaling. @gh_aminian presents work showing that a smoothed version of best-of-n improves reward-vs-KL tradeoffs when a low-quality proxy reward is used.
1
3
24
@rish2k1
Rishabh Tiwari
4 months
🚨 Come check out our poster at #ICML2025! QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache 📍 East Exhibition Hall A-B, #E-2608 🗓️ Poster Session 5 | Thu, Jul 17 | 🕓 11:00 AM–1:30 PM TLDR: Use a quantized version of the same model as its own draft…
@rish2k1
Rishabh Tiwari
9 months
🚀 Fast and accurate Speculative Decoding for Long Context? 🔎 Problem: 🔹 Standard speculative decoding struggles with long-context generation, as current draft models are pretty weak for long context 🔹 Finding the right draft model is tricky, as compatibility varies across…
0
8
37
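A toy sketch of the "model as its own draft" idea from the poster, assuming a standard speculative accept/reject rule with a crudely quantized copy of the target acting as the draft; the 2-gram "models" below are stand-ins, and QuantSpec's hierarchical quantized KV cache and system details are not reproduced.

```python
# Toy self-speculative decoding: draft with a low-precision copy of the same
# weights, then verify drafted tokens against the full-precision model.
import numpy as np

rng = np.random.default_rng(0)
V = 16                                          # toy vocabulary size
W = rng.normal(size=(V, V))                     # "target model": logits = W[prev]
W_q = np.round(W * 4) / 4                       # "quantized draft": same weights, low precision

def probs(weights, prev):
    z = np.exp(weights[prev] - weights[prev].max())
    return z / z.sum()

def speculative_step(prev, k=4):
    # Draft k tokens cheaply, then accept/reject them with the target model.
    draft, cur = [], prev
    for _ in range(k):
        cur = rng.choice(V, p=probs(W_q, cur))
        draft.append(cur)
    accepted, cur = [], prev
    for tok in draft:
        p_t, p_d = probs(W, cur)[tok], probs(W_q, cur)[tok]
        if rng.random() < min(1.0, p_t / p_d):   # standard accept rule
            accepted.append(tok)
            cur = tok
        else:
            # On rejection, resample from the corrected residual distribution.
            p, q = probs(W, cur), probs(W_q, cur)
            resid = np.maximum(p - q, 0)
            resid /= resid.sum()
            cur = rng.choice(V, p=resid)
            accepted.append(cur)
            break
    return accepted

tokens = [0]
while len(tokens) < 20:
    tokens += speculative_step(tokens[-1])
print(tokens[:20])
```

Because the draft shares the target's weights, there is no separate draft model to train or match, which is the appeal of the self-speculative setup.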
@rish2k1
Rishabh Tiwari
5 months
Really interesting work by my friend @harman26singh on making reward models more robust, effectively reducing reliance on spurious attributes
@Harman26Singh
Harman Singh
5 months
🚨 New @GoogleDeepMind paper: Robust Reward Modeling via Causal Rubrics 📑 👉 https://t.co/oCk5jGNYlj We tackle reward hacking: when RMs latch onto spurious cues (e.g. length, style) instead of true quality. #RLAIF #CausalInference 🧵⬇️
0
1
7
@andykonwinski
Andy Konwinski
5 months
If you had 15 min to tell thousands of Berkeley CS/Data/Stats grads what to do with their lives, what would you say? Last Thursday I told them to RUN AT FAILURE. Afterwards, while we were shaking hands & taking selfies, hundreds of them told me that they are excited to go fail. I…
18
32
270
@HaochengXiUCB
Haocheng Xi
6 months
Excited to share that our paper QuantSpec has been accepted to #ICML2025! Huge thanks to my collaborators! Paper:
arxiv.org
Large Language Models (LLMs) are increasingly being deployed on edge devices for long-context settings, creating a growing need for fast and efficient long-context inference. In these scenarios,...
@rish2k1
Rishabh Tiwari
9 months
🚀 Fast and accurate Speculative Decoding for Long Context? 🔎 Problem: 🔹 Standard speculative decoding struggles with long-context generation, as current draft models are pretty weak for long context 🔹 Finding the right draft model is tricky, as compatibility varies across…
0
6
41