Simon Shaolei Du Profile
Simon Shaolei Du

@SimonShaoleiDu

Followers: 8K · Following: 6K · Media: 14 · Statuses: 516

Assistant Professor @uwcse. Postdoc @the_IAS. PhD in machine learning @mldcmu.

Seattle, WA
Joined September 2017
@SimonShaoleiDu
Simon Shaolei Du
28 days
EM is a classic, but it can fail even for 3-component Gaussian mixtures. Why is EM so widely used? Over-parameterization! We prove that (gradient) EM + over-parameterization = global optima for Gaussian mixtures. Paper: Key ideas: Hermite polynomials & tensor decomposition.
14
36
239
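To illustrate the setting (this is not the paper's code): a minimal numpy sketch of gradient EM on a Gaussian mixture fit with more components than the 3 that generated the data. The unit covariances, fixed uniform weights, step size, and component count are illustrative assumptions.

```python
# Minimal sketch of gradient EM with over-parameterization (not the paper's code).
# Assumptions: unit-variance isotropic components, fixed uniform mixing weights,
# one gradient step on the means per E-step; the true mixture has 3 components.
import numpy as np

rng = np.random.default_rng(0)

# Data from a 3-component mixture in R^2.
true_means = np.array([[0.0, 4.0], [-4.0, -2.0], [4.0, -2.0]])
labels = rng.integers(0, 3, size=3000)
X = true_means[labels] + rng.normal(size=(3000, 2))

# Over-parameterized model: 9 components instead of 3.
K, eta, steps = 9, 1.0, 300
mu = rng.normal(scale=1.0, size=(K, 2))   # random initialization

for _ in range(steps):
    # E-step: responsibilities under unit-variance Gaussians with uniform weights.
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)     # (n, K)
    log_w = -0.5 * sq_dist
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)

    # Gradient step on the expected complete log-likelihood w.r.t. the means.
    grad = (w[:, :, None] * (X[:, None, :] - mu[None, :, :])).mean(axis=0)  # (K, 2)
    mu += eta * grad

# With over-parameterization, the learned means should spread out and
# cluster near the three true means rather than getting stuck.
print(np.round(mu, 2))
```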
@SimonShaoleiDu
Simon Shaolei Du
12 days
RT @pareshrc: 1/6 Current AI agent training methods fail to capture diverse behaviors needed for human-AI cooperation. GOAT (Generative Onl…
0
7
0
@SimonShaoleiDu
Simon Shaolei Du
16 days
RT @avibose22: 🚨 Code is live! Check out LoRe – a modular, lightweight codebase for personalized reward modeling from user preferences. 📦 F…
0
6
0
@SimonShaoleiDu
Simon Shaolei Du
26 days
RT @ypwang61: I'll present StoryEval tomorrow at CVPR, happy to catch up with new and old friends! 📍 ExHall D, Poster #284. ⌚ 10.30am - 12.3…
0
3
0
@SimonShaoleiDu
Simon Shaolei Du
28 days
Joint work with @MoZhou_7, @Weihang_X, and Maryam Fazel.
0
0
2
@SimonShaoleiDu
Simon Shaolei Du
28 days
Check out our new work using online multi-agent RL for LM safety.
@mickel_liu
Mickel Liu
28 days
🤔 Conventional LM safety alignment is reactive: find vulnerabilities → patch → repeat. 🌟 We propose online multi-agent RL training where Attacker & Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵
1
2
20
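A schematic sketch of the attacker/defender self-play loop the tweet describes. All names here (the Echo stub, the safety judge, the update callback) are hypothetical stand-ins for an LM policy, a safety classifier, and an RL step; this is not the paper's API.

```python
# Schematic sketch, not the paper's code: two policies trained online against each other.
# The attacker is rewarded for eliciting unsafe replies, the defender for avoiding them.

def self_play_round(attacker, defender, judge, prompts):
    """One co-evolution round: attacker crafts adversarial prompts, defender answers."""
    attacker_batch, defender_batch = [], []
    for seed in prompts:
        attack = attacker.generate(seed)          # adversarial prompt built from a seed topic
        response = defender.generate(attack)      # defender's reply to the attack
        unsafe = judge(attack, response)          # 1.0 if the reply is judged unsafe, else 0.0
        attacker_batch.append((seed, attack, unsafe))            # attacker rewarded for unsafe replies
        defender_batch.append((attack, response, 1.0 - unsafe))  # defender rewarded for safe replies
    return attacker_batch, defender_batch


def train(attacker, defender, judge, prompt_stream, rounds, update):
    """Alternate RL updates so attacks and defenses co-evolve online."""
    for _ in range(rounds):
        atk_batch, def_batch = self_play_round(attacker, defender, judge, next(prompt_stream))
        update(attacker, atk_batch)   # e.g. a PPO/GRPO-style step on the attacker's rewards
        update(defender, def_batch)   # e.g. a PPO/GRPO-style step on the defender's rewards


# Toy usage with string stubs, purely to show the data flow:
class Echo:
    def __init__(self, tag): self.tag = tag
    def generate(self, text): return f"{self.tag}:{text}"

train(Echo("atk"), Echo("def"),
      judge=lambda attack, response: 0.0,
      prompt_stream=iter([["seed topic"]] * 3),
      rounds=3,
      update=lambda policy, batch: None)
```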
@SimonShaoleiDu
Simon Shaolei Du
30 days
RT @kjha02: Oral @icmlconf!!! Can't wait to share our work and hear the community's thoughts on it, should be a fun talk! Can't thank my…
0
3
0
@SimonShaoleiDu
Simon Shaolei Du
1 month
RT @uwcse: Congratulations to @UW #UWAllen Ph.D. grads @sharma_ashish_2 & @sewon__min, @TheOfficialACM Doctoral Dissertation Award honorees…
0
19
0
@SimonShaoleiDu
Simon Shaolei Du
1 month
PPO vs. DPO? 🤔 Our new paper proves that the answer depends on whether your models can represent the optimal policy and/or reward. Paper: Led by @smellycat_ZZZ and @MinhakSong.
@smellycat_ZZZ
Ruizhe Shi
1 month
Two-stage RLHF or one-stage DPO: which one is better for learning from preferences? They are equal under strong assumptions, but representation differences break the tie. Our paper reveals their fine-grained performance gaps under various conditions. Paper:
0
18
97
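For context, these are the standard forms of the two objectives being compared, as usually written in the RLHF/DPO literature (not reproduced from this paper):

```latex
% Stage 1 of RLHF: fit a reward model on preference pairs (Bradley-Terry).
\max_{r_\phi} \; \mathbb{E}_{(x, y_w, y_l)} \big[ \log \sigma\big( r_\phi(x, y_w) - r_\phi(x, y_l) \big) \big]

% Stage 2 of RLHF: KL-regularized policy optimization (e.g. with PPO).
\max_{\pi_\theta} \; \mathbb{E}_{x,\, y \sim \pi_\theta(\cdot \mid x)} \big[ r_\phi(x, y) \big]
  \;-\; \beta \, \mathbb{D}_{\mathrm{KL}}\!\big( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)

% One-stage DPO: optimize the policy directly on the same preference data.
\min_{\pi_\theta} \; -\mathbb{E}_{(x, y_w, y_l)} \Big[ \log \sigma\Big(
    \beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \Big) \Big]
```

Whether the two routes coincide hinges on whether the policy class can realize the optimal KL-regularized policy and the reward class can realize the true reward, which is the representation question the paper studies.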
@SimonShaoleiDu
Simon Shaolei Du
1 month
Our new paper tries to uncover what we really need when applying RLVR.
@StellaLisy
Stella Li
1 month
🤯 We cracked RLVR with Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵 Blogpost:
0
0
19
@SimonShaoleiDu
Simon Shaolei Du
2 months
Even with the same vision encoder, generative VLMs (LLaVA) can extract more information than CLIP. Why? Check out our #ACL2025NLP paper led by @SitingLi627:
@SitingLi627
Siting Li
2 months
Excited to share that our paper "Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder" is accepted to #ACL2025! Preprint: Thank you so much to @SimonShaoleiDu and @PangWeiKoh for your support and guidance throughout the journey!
1
2
17
@SimonShaoleiDu
Simon Shaolei Du
2 months
RT @shaneguML: Famous LLM researcher Bruce Lee quote: "I fear not the LLM who has practiced 10,000 questions once, but I fear the LLM who h…
0
87
0
@SimonShaoleiDu
Simon Shaolei Du
2 months
RT @kjha02: So excited to announce our work was accepted as a Spotlight paper to @icmlconf!!! I'm looking forward to presenting our work t…
0
10
0
@SimonShaoleiDu
Simon Shaolei Du
2 months
Excited to share our work led by @ypwang61. RLVR with only ONE training example can boost accuracy on MATH500 by 37%.
@ypwang61
Yiping Wang
2 months
We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks! 📍 RLVR with one training example can boost:
- Qwen2.5-Math-1.5B: 36.0% → 73.6%
- Qwen2.5-Math-7B: 51.0% → 79.2%
on MATH500. 📄 Paper:
2
6
49
@SimonShaoleiDu
Simon Shaolei Du
3 months
RT @avibose22: 🧠 Your LLM should model how you think, not reduce you to preassigned traits. 📢 Introducing LoRe: a low-rank reward modeling f…
0
26
0
@SimonShaoleiDu
Simon Shaolei Du
3 months
The sampler is crucial for faster convergence of online DPO! Check out our paper: #ICLR2025.
@smellycat_ZZZ
Ruizhe Shi
3 months
Previous works study the sample complexity of DPO and emphasize the role of samplers in online DPO. What about their role in optimization convergence rates? Check out our paper at #ICLR2025 on convergence rates of online DPO with various samplers! ArXiv:
0
3
23
@SimonShaoleiDu
Simon Shaolei Du
3 months
Excited to share our new work led by @kjha02: scaling training to more diverse environments is key to human-AI cooperation!
@kjha02
Kunal Jha
3 months
Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity. Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks.
0
0
16
@SimonShaoleiDu
Simon Shaolei Du
3 months
RT @VectorZhou: 🧠 Ever notice how LLMs struggle with familiar knowledge in unfamiliar formats? Our new paper "CASCADE Your Datasets for Cro…
0
8
0
@SimonShaoleiDu
Simon Shaolei Du
3 months
RT @jxwuyi: 🎉 Milestone Release! AReaL-boba, our latest #RL system! #AI
• data/code/model ALL 🔥 #OPENSOURCE
• Full #…
0
39
0
@SimonShaoleiDu
Simon Shaolei Du
4 months
Very nice blog on multi-distribution learning.
@ericzhao28
Eric Zhao
4 months
For more details, read the blog! 🔗.
0
0
5