Jason Liu Profile
Jason Liu
@JasonLiu106968
Followers: 74 · Following: 8 · Media: 4 · Statuses: 13
Joined August 2025
@JasonLiu106968
Jason Liu
14 days
Excited to share our #RL_for_LLM paper: "Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning". We conducted a comprehensive analysis of RL techniques in the LLM domain! 🥳 Surprisingly, we found that using only 2 techniques can unlock the learning capability of LLMs. 😮
@JasonLiu106968
Jason Liu
11 days
Really happy to see our paper help researchers understand RL4LLM techniques! 🥰
@andrewliao11
Andrew Liao
11 days
Was studying various GRPO variants here and there (DAPO, Dr. GRPO, GSPO, etc.), and this paper provides a holistic view of the PPO-GRPO family.
@JasonLiu106968
Jason Liu
11 days
In our paper, we found that the combination of group-level mean and batch-level std performs better than GRPO-style normalization. Could this be explained from the perspective of calibration? 🤔
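To make this concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper or the ROLL framework) contrasting GRPO-style group normalization with the group-mean + batch-std "mix" variant; the reward layout and function names are assumptions made for the example.

```python
# Toy comparison of two advantage-normalization schemes.
# Assumed layout: rewards has shape (num_prompts, num_samples_per_prompt).
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """GRPO-style: normalize each group by its own mean and std."""
    group_mean = rewards.mean(axis=1, keepdims=True)
    group_std = rewards.std(axis=1, keepdims=True)
    return (rewards - group_mean) / (group_std + eps)

def mix_advantages(rewards, eps=1e-6):
    """Mix normalization: subtract the group mean, divide by the batch-level std."""
    group_mean = rewards.mean(axis=1, keepdims=True)
    batch_std = rewards.std()  # pooled over the whole batch
    return (rewards - group_mean) / (batch_std + eps)

# 0/1 verifier rewards for 3 prompts x 4 sampled responses each.
rewards = np.array([[1, 1, 1, 1],   # fully solved group: its own std is 0
                    [1, 0, 0, 0],
                    [0, 0, 0, 0]], dtype=float)
print(grpo_advantages(rewards))
print(mix_advantages(rewards))
```

The batch-level std is shared across all groups, whereas GRPO rescales each group by its own, possibly near-zero, std.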
@JasonLiu106968
Jason Liu
11 days
This thread presents a remarkably impressive and in-depth explanation of an interesting finding in our "Tricks or Traps" paper: why group-level and batch-level normalization behave differently under various reward scales. Learned a lot from the two experts!
@roydanroy
Dan Roy
13 days
RL community query: Why would reward scale make batch vs. group advantage normalization behave so differently in RL for LLM reasoning? 🧐🔍 I'm reading "Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning" (arXiv:2508.08221v1). They report big swings when switching.
@JasonLiu106968
Jason Liu
11 days
RT @natolambert: Really nice RLVR paper.
@JasonLiu106968
Jason Liu
14 days
Final remark: full paper at arxiv.org/abs/2508.08221. We aim to provide clear, practical guidance on choosing RL techniques: call it "Deep RL that Matters" in the era of large models. The ROLL team is continuously improving our frameworks to better serve the RL4LLM community. 💪
arxiv.org
Reinforcement learning for LLM reasoning has rapidly emerged as a prominent research area, marked by a significant surge in related studies on both algorithmic innovations and practical...
@JasonLiu106968
Jason Liu
14 days
👉 3/3
• Overlong filtering shows limited effectiveness on long-tail reasoning tasks but enhances accuracy on medium- and short-length reasoning tasks. (Fig. 15)
• Unlock LLM reasoning patterns with Lite PPO, which involves only mix normalization and token-level loss! (Fig. 16; a rough sketch follows below)
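To make the Lite PPO recipe concrete end to end, here is a rough PyTorch sketch under my reading of the tweet: mix (group-mean / batch-std) advantage normalization plus token-level loss aggregation, with no other tricks such as overlong filtering or clip-higher. The names, shapes, and clip value are my assumptions rather than the paper's code.

```python
# Illustrative Lite-PPO-style pieces: mix normalization + token-level PPO loss.
import torch

def mix_normalized_advantages(rewards, eps=1e-6):
    """rewards: (num_prompts, num_samples) verifier scores for grouped rollouts."""
    group_mean = rewards.mean(dim=1, keepdim=True)
    batch_std = rewards.std()  # one std shared across the whole batch
    return (rewards - group_mean) / (batch_std + eps)

def token_level_ppo_loss(logp, old_logp, advantages, mask, eps_clip=0.2):
    """Symmetric PPO clipping, averaged over all response tokens in the batch."""
    ratio = torch.exp(logp - old_logp)
    clipped = torch.clamp(ratio, 1.0 - eps_clip, 1.0 + eps_clip)
    per_token = -torch.minimum(ratio * advantages, clipped * advantages) * mask
    return per_token.sum() / mask.sum().clamp(min=1)
```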
@JasonLiu106968
Jason Liu
14 days
👉 2/3
• Clip Higher tends to promote exploration for aligned models! (Fig. 9)
• There seems to be a "scaling law" between accuracy and the upper clip bound on the small model. (Fig. 10)
• Token-level loss aggregation is more effective on base LLMs than on aligned LLMs. (Fig. 13; see the sketch after this list)
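For reference, here is a small PyTorch sketch (my own approximation with assumed tensor shapes and epsilon values, not the paper's implementation) of a clip-higher policy loss with a switch between token-level and sequence-level aggregation.

```python
# Clip-higher PPO loss with selectable loss aggregation (illustrative only).
import torch

def clip_higher_policy_loss(logp, old_logp, advantages, mask,
                            eps_low=0.2, eps_high=0.28, token_level=True):
    """All inputs are (batch, seq_len) tensors; mask marks response tokens."""
    ratio = torch.exp(logp - old_logp)
    unclipped = ratio * advantages
    # "Clip higher": a larger upper bound than lower bound encourages exploration.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantages
    per_token = -torch.minimum(unclipped, clipped) * mask
    if token_level:
        # Token-level aggregation: every token in the batch gets equal weight.
        return per_token.sum() / mask.sum().clamp(min=1)
    # Sequence-level aggregation: average within each response, then across responses.
    per_seq = per_token.sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return per_seq.mean()
```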
@JasonLiu106968
Jason Liu
14 days
👀 1/3
Key discoveries:
• Advantage normalizations have their own preferred setups! (Fig. 4)
• Removing the std when reward distributions are highly concentrated enhances training stability. (Fig. 6; see the sketch after this list)
• Mix normalization further enhances learning efficiency! (Fig. 7)
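A minimal sketch, under the assumption of 0/1 verifier rewards arranged as (num_prompts, num_samples), of the "remove the std" idea in the second bullet: center each group by its mean and skip the per-group std division, so a nearly uniform reward group is not rescaled by a tiny denominator.

```python
# Group-mean centering without std division (my own toy illustration).
import numpy as np

def group_mean_only_advantages(rewards):
    """Center each prompt's group by its mean; no per-group std scaling."""
    return rewards - rewards.mean(axis=1, keepdims=True)

# Highly concentrated rewards: 7 of 8 samples are correct.
rewards = np.array([[1, 1, 1, 1, 1, 1, 1, 0]], dtype=float)
print(group_mean_only_advantages(rewards))   # [[ 0.125 ... -0.875]]
```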