
Quanquan Gu
@QuanquanGu
15K Followers · 34K Following · 127 Media · 2K Statuses
Professor @UCLA, Pretraining and Scaling at ByteDance Seed | Recent work: SPIN, SPPO, DPLM, GPM, CryoFM, MARS, TPA, RPG | Opinions are my own
Los Angeles, CA
Joined August 2017
RPG is out. Make KL-regularized Policy Gradient Correct Again! No more GRPO or REINFORCE++: their objectives and KL regularization are inherently inconsistent.
1/6 We introduce RPG, a principled framework for deriving and analyzing KL-regularized policy gradient methods. It unifies GRPO (with its k3 KL estimator) and REINFORCE++, and it uncovers RL objectives that work better than GRPO's. Paper: Code:
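For context, here is a minimal sketch of the standard KL-regularized policy optimization objective that such methods are meant to follow. The notation (reward r, policy π_θ, reference policy π_ref, regularization weight β) is the usual convention, not necessarily the paper's exact formulation.

```latex
% Standard KL-regularized policy optimization objective:
% maximize expected reward while penalizing divergence
% from a fixed reference policy.
\max_{\theta}\;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_{\theta}(\cdot \mid x)}
\bigl[ r(x, y) \bigr]
\;-\;
\beta\, \mathrm{KL}\bigl( \pi_{\theta}(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \bigr)
```

The inconsistency the thread alludes to: methods like GRPO add a sampled KL estimator (e.g., the k3 estimator) to the loss and differentiate through it, and the resulting gradient does not, in general, match the gradient of the objective above.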
RT @ClementDelangue: Every tech company can and should train their own DeepSeek R1, Llama, or GPT5, just like every tech company writes thei…
RT @rohanpaul_ai: The paper proves that softmax attention hides an infinite bundle of mini RNNs and that bundle drives its edge. Right now…
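The "attention hides recurrence" framing is easy to illustrate in miniature: for each query, causal softmax attention can be evaluated by scanning the prefix once while carrying a running numerator, denominator, and max, i.e., a tiny recurrence. A NumPy sketch of that view (my own illustration of the general idea, not the paper's construction):

```python
import numpy as np

def causal_attention_recurrent(Q, K, V):
    """Causal softmax attention computed with running accumulators.

    For each query q_t, the softmax-weighted average over keys
    k_1..k_t equals num / den, where both accumulators (plus a
    running max for numerical stability) are updated one key/value
    pair at a time, i.e., a small recurrence over the prefix.
    """
    T, d = Q.shape
    out = np.zeros_like(V)
    for t in range(T):
        num = np.zeros(V.shape[1])  # running sum of exp(score) * value
        den = 0.0                   # running sum of exp(score)
        m = -np.inf                 # running max score (for stability)
        for i in range(t + 1):
            s = Q[t] @ K[i] / np.sqrt(d)
            m_new = max(m, s)
            scale = np.exp(m - m_new)            # rescale old state
            num = num * scale + np.exp(s - m_new) * V[i]
            den = den * scale + np.exp(s - m_new)
            m = m_new
        out[t] = num / den
    return out

# Sanity check against the usual dense computation.
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 5, 4))
S = (Q @ K.T) / np.sqrt(4.0)
S[np.triu_indices(5, k=1)] = -np.inf
W = np.exp(S - S.max(axis=1, keepdims=True))
W /= W.sum(axis=1, keepdims=True)
assert np.allclose(causal_attention_recurrent(Q, K, V), W @ V)
```

This is the same accumulator trick behind online softmax and FlashAttention; the paper's claim about an infinite bundle of mini RNNs is a stronger structural statement than this toy view.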
RT @Kangwook_Lee: 🧵 When training reasoning models, what's the best approach? SFT, Online RL, or perhaps Offline RL? At @Krafton_AI and @SK…
RT @SimonXinDong: It turns out:
> GRPO is performing the arithmetic mean --> token-level scaling
> GSPO is performing the geometric mean -…
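Concretely, write r_i for the per-token importance ratio π_θ(y_i | x, y_<i) / π_old(y_i | x, y_<i). GRPO keeps a ratio per token, so its loss effectively averages the r_i arithmetically across the sequence, while GSPO uses one length-normalized sequence-level ratio, which is exactly their geometric mean. A toy sketch of just that aggregation step (my illustration; the real losses also involve clipping and advantages):

```python
import numpy as np

# Per-token importance ratios pi_theta / pi_old for one sampled
# response (toy values; in practice these come from log-probs).
ratios = np.array([0.9, 1.1, 1.0, 1.3, 0.7])

# GRPO-style aggregation: each token keeps its own ratio, and the
# per-token losses are averaged -> effectively an arithmetic mean.
grpo_scale = ratios.mean()

# GSPO-style aggregation: a single sequence-level ratio, the
# length-normalized product of token ratios -> the geometric mean.
gspo_scale = np.exp(np.log(ratios).mean())

print(f"arithmetic mean (GRPO-style): {grpo_scale:.4f}")  # 1.0000
print(f"geometric mean  (GSPO-style): {gspo_scale:.4f}")  # ~0.9793
```

By the AM-GM inequality the geometric mean never exceeds the arithmetic one, and it damps a single large token ratio (though a near-zero token ratio drags the whole sequence ratio down hard).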
RT @aryopg: New Anthropic Research: “Inverse Scaling in Test-Time Compute”. We found cases where longer reasoning leads to lower accuracy…
RT @hkproj: Mistral started it. DeepSeek scaled it. Kimi K2 confirmed it: always more convenient to train an MoE.
RT @SherylHsu02: The model solves these problems without tools like Lean or coding; it just uses natural language, and also only has 4.5 ho…
Wait… this model didn’t even use Lean? That’s insane. Big congrats to the @OpenAI team. That’s incredible work!
We achieved gold medal-level performance on this year's IMO! Our model thinks and writes proofs in clear, plain English; no formal code required. Unlike the narrower systems used in past competitions, our model is built to reason broadly, far beyond contest problems.
RT @MParakhin: Since nobody asked :-), here is my list of papers not to be missed from ICML: 1) Dion: distributed orthonormalized updates (…
RT @BachFrancis: Tired of lengthy computations to derive scaling laws? This post is made for you: discover the sharpness of the z-transform…
RT @ZhaoQingyue: Drop by our poster in Ballroom A, West Building to check out our cute analysis techniques and a rich set of future directions…
μP plays a central role in scaling large language models and is best known for hyperparameter transfer & stability. But don’t overlook its feature-learning power. 📈
Excited to share our work at #ICML2025! 🚀 We dive into how deep L-layer NNs under μP can learn rich features & guarantee global convergence. w/ @TheGregYang, @ZhaoQingyue, and @QuanquanGu. Check the paper at: Poster Thursday at 11 am! 👇 [1/4]
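For readers new to μP: the practical core is that initialization scales and per-layer learning rates are tied to the network width, so hyperparameters tuned on a small model transfer to wider ones. A simplified, illustrative sketch of μP-style rules for an MLP trained with Adam (`mup_scaling` is a hypothetical helper; see the Tensor Programs papers for the exact tables):

```python
import math

def mup_scaling(width: int, base_width: int = 256, base_lr: float = 1e-3):
    """Illustrative muP-style scaling for an MLP trained with Adam.

    Simplified rules, not a drop-in implementation:
      - input layer:   O(1) init std and O(1) learning rate
      - hidden layers: init std ~ 1/sqrt(width), Adam LR ~ 1/width
      - output layer:  extra 1/width multiplier on the logits
    """
    m = width / base_width  # width multiplier vs. the tuned base model
    return {
        "input":  {"init_std": 1.0, "lr": base_lr},
        "hidden": {"init_std": 1.0 / math.sqrt(width), "lr": base_lr / m},
        "output": {"init_std": 1.0 / math.sqrt(width), "lr": base_lr / m,
                   "logit_mult": 1.0 / m},
    }

# Tune hyperparameters once at the base width, then reuse them:
for w in (256, 1024, 4096):
    cfg = mup_scaling(w)
    print(w, cfg["hidden"]["lr"], cfg["output"]["logit_mult"])
```

The feature-learning angle in the paper: under these scalings, hidden layers keep learning nontrivial features as width grows instead of freezing into the lazy/kernel regime, and that is the setting in which the global convergence guarantee is proved.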