Jason Liu Profile
Jason Liu
@JasonLiu106968
Followers: 74 · Following: 8 · Media: 4 · Statuses: 13
Joined August 2025
@JasonLiu106968
Jason Liu
14 days
Excited to share our #RL_for_LLM paper: "Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning". We conducted a comprehensive analysis of RL techniques in the LLM domain! 🥳 Surprisingly, we found that using only 2 techniques can unlock the learning capability of LLMs. 😮
@JasonLiu106968
Jason Liu
11 days
Really happy to see our paper help researchers understand RL4LLM techniques! 🥰
@andrewliao11
Andrew Liao
11 days
Was studying various GRPO variants here and there (DAPO, Dr. GRPO, GSPO, etc.), and this paper provides a holistic view of the PPO-GRPO family.
@JasonLiu106968
Jason Liu
11 days
In our paper, we found that the combination of group-level mean and batch-level std performs better than GRPO-style normalization. Could this be explained from the perspective of calibration? 🤔
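To make this concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper or the ROLL framework) contrasting GRPO-style group normalization with the group-mean + batch-std "mix" variant; the reward layout and function names are assumptions made for the example.

```python
# Toy comparison of two advantage-normalization schemes.
# Assumed layout: rewards has shape (num_prompts, num_samples_per_prompt).
import numpy as np

def grpo_advantages(rewards, eps=1e-6):
    """GRPO-style: normalize each group by its own mean and std."""
    group_mean = rewards.mean(axis=1, keepdims=True)
    group_std = rewards.std(axis=1, keepdims=True)
    return (rewards - group_mean) / (group_std + eps)

def mix_advantages(rewards, eps=1e-6):
    """Mix normalization: subtract the group mean, divide by the batch-level std."""
    group_mean = rewards.mean(axis=1, keepdims=True)
    batch_std = rewards.std()  # pooled over the whole batch
    return (rewards - group_mean) / (batch_std + eps)

# 0/1 verifier rewards for 3 prompts x 4 sampled responses each.
rewards = np.array([[1, 1, 1, 1],   # fully solved group: its own std is 0
                    [1, 0, 0, 0],
                    [0, 0, 0, 0]], dtype=float)
print(grpo_advantages(rewards))
print(mix_advantages(rewards))
```

The batch-level std is shared across all groups, whereas GRPO rescales each group by its own, possibly near-zero, std.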
@JasonLiu106968
Jason Liu
11 days
This thread presents a remarkably impressive and in-depth explanation of an interesting finding in our "Tricks or Traps" paper: why group-level and batch-level normalization behave differently under various reward scales. Learned a lot from the two experts!
@roydanroy
Dan Roy
13 days
RL community query: Why would reward scale make batch vs. group advantage normalization behave so differently in RL for LLM reasoning? 🧐🔍 I'm reading "Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning" (arXiv:2508.08221v1). They report big swings when switching.
@JasonLiu106968
Jason Liu
11 days
RT @natolambert: Really nice RLVR paper.
@JasonLiu106968
Jason Liu
14 days
Final remark: full paper at arxiv.org/abs/2508.08221. We aim to provide clear, practical guidance on choosing RL techniques: call it "Deep RL that Matters" in the era of large models. The ROLL team is continuously improving our frameworks to better serve the RL4LLM community. 💪
arxiv.org
Reinforcement learning for LLM reasoning has rapidly emerged as a prominent research area, marked by a significant surge in related studies on both algorithmic innovations and practical...
@JasonLiu106968
Jason Liu
14 days
👉 3/3
• Overlong filtering shows limited effectiveness on long-tail reasoning tasks but enhances accuracy on medium- and short-length reasoning tasks. (Fig. 15)
• Unlock LLM reasoning patterns with Lite PPO, which involves only mix normalization and token-level loss! (Fig. 16; a rough sketch follows below)
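To make the Lite PPO recipe concrete end to end, here is a rough PyTorch sketch under my reading of the tweet: mix (group-mean / batch-std) advantage normalization plus token-level loss aggregation, with no other tricks such as overlong filtering or clip-higher. The names, shapes, and clip value are my assumptions rather than the paper's code.

```python
# Illustrative Lite-PPO-style pieces: mix normalization + token-level PPO loss.
import torch

def mix_normalized_advantages(rewards, eps=1e-6):
    """rewards: (num_prompts, num_samples) verifier scores for grouped rollouts."""
    group_mean = rewards.mean(dim=1, keepdim=True)
    batch_std = rewards.std()  # one std shared across the whole batch
    return (rewards - group_mean) / (batch_std + eps)

def token_level_ppo_loss(logp, old_logp, advantages, mask, eps_clip=0.2):
    """Symmetric PPO clipping, averaged over all response tokens in the batch."""
    ratio = torch.exp(logp - old_logp)
    clipped = torch.clamp(ratio, 1.0 - eps_clip, 1.0 + eps_clip)
    per_token = -torch.minimum(ratio * advantages, clipped * advantages) * mask
    return per_token.sum() / mask.sum().clamp(min=1)
```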
@JasonLiu106968
Jason Liu
14 days
👉 2/3
• Clip Higher tends to promote exploration for aligned models! (Fig. 9)
• There seems to be a "scaling law" between accuracy and the upper clip bound on the small model. (Fig. 10)
• Token-level loss aggregation is more effective on base LLMs than on aligned LLMs. (Fig. 13; see the sketch after this list)
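For reference, here is a small PyTorch sketch (my own approximation with assumed tensor shapes and epsilon values, not the paper's implementation) of a clip-higher policy loss with a switch between token-level and sequence-level aggregation.

```python
# Clip-higher PPO loss with selectable loss aggregation (illustrative only).
import torch

def clip_higher_policy_loss(logp, old_logp, advantages, mask,
                            eps_low=0.2, eps_high=0.28, token_level=True):
    """All inputs are (batch, seq_len) tensors; mask marks response tokens."""
    ratio = torch.exp(logp - old_logp)
    unclipped = ratio * advantages
    # "Clip higher": a larger upper bound than lower bound encourages exploration.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantages
    per_token = -torch.minimum(unclipped, clipped) * mask
    if token_level:
        # Token-level aggregation: every token in the batch gets equal weight.
        return per_token.sum() / mask.sum().clamp(min=1)
    # Sequence-level aggregation: average within each response, then across responses.
    per_seq = per_token.sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    return per_seq.mean()
```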
@JasonLiu106968
Jason Liu
14 days
👀 1/3
Key discoveries:
• Advantage normalizations have their own preferred setups! (Fig. 4)
• Removing the std when reward distributions are highly concentrated enhances training stability. (Fig. 6; see the sketch after this list)
• Mix normalization further enhances learning efficiency! (Fig. 7)
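A minimal sketch, under the assumption of 0/1 verifier rewards arranged as (num_prompts, num_samples), of the "remove the std" idea in the second bullet: center each group by its mean and skip the per-group std division, so a nearly uniform reward group is not rescaled by a tiny denominator.

```python
# Group-mean centering without std division (my own toy illustration).
import numpy as np

def group_mean_only_advantages(rewards):
    """Center each prompt's group by its mean; no per-group std scaling."""
    return rewards - rewards.mean(axis=1, keepdims=True)

# Highly concentrated rewards: 7 of 8 samples are correct.
rewards = np.array([[1, 1, 1, 1, 1, 1, 1, 0]], dtype=float)
print(group_mean_only_advantages(rewards))   # [[ 0.125 ... -0.875]]
```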