Simon Shaolei Du Profile
Simon Shaolei Du

@SimonShaoleiDu

Followers: 8K · Following: 6K · Media: 14 · Statuses: 516

Assistant Professor @uwcse. Postdoc @the_IAS. PhD in machine learning @mldcmu.

Seattle, WA
Joined September 2017
@SimonShaoleiDu
Simon Shaolei Du
28 days
EM is a classic, but it can fail even for 3-component Gaussian mixtures. Why is EM so widely used? Over-parameterization! We prove that (gradient) EM + over-parameterization = global optima for Gaussian mixtures. Paper: Key ideas: Hermite polynomials & tensor decomposition.
14
36
239
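To illustrate the setting (this is not the paper's code): a minimal numpy sketch of gradient EM on a Gaussian mixture fit with more components than the 3 that generated the data. The unit covariances, fixed uniform weights, step size, and component count are illustrative assumptions.

```python
# Minimal sketch of gradient EM with over-parameterization (not the paper's code).
# Assumptions: unit-variance isotropic components, fixed uniform mixing weights,
# one gradient step on the means per E-step; the true mixture has 3 components.
import numpy as np

rng = np.random.default_rng(0)

# Data from a 3-component mixture in R^2.
true_means = np.array([[0.0, 4.0], [-4.0, -2.0], [4.0, -2.0]])
labels = rng.integers(0, 3, size=3000)
X = true_means[labels] + rng.normal(size=(3000, 2))

# Over-parameterized model: 9 components instead of 3.
K, eta, steps = 9, 1.0, 300
mu = rng.normal(scale=1.0, size=(K, 2))   # random initialization

for _ in range(steps):
    # E-step: responsibilities under unit-variance Gaussians with uniform weights.
    sq_dist = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)     # (n, K)
    log_w = -0.5 * sq_dist
    w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)

    # Gradient step on the expected complete log-likelihood w.r.t. the means.
    grad = (w[:, :, None] * (X[:, None, :] - mu[None, :, :])).mean(axis=0)  # (K, 2)
    mu += eta * grad

# With over-parameterization, the learned means should spread out and
# cluster near the three true means rather than getting stuck.
print(np.round(mu, 2))
```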
@SimonShaoleiDu
Simon Shaolei Du
12 days
RT @pareshrc: 1/6 Current AI agent training methods fail to capture diverse behaviors needed for human-AI cooperation. GOAT (Generative Onl…
0
7
0
@SimonShaoleiDu
Simon Shaolei Du
16 days
RT @avibose22: 🚨 Code is live! Check out LoRe – a modular, lightweight codebase for personalized reward modeling from user preferences. 📦 F…
0
6
0
@SimonShaoleiDu
Simon Shaolei Du
26 days
RT @ypwang61: I'll present StoryEval tomorrow at CVPR, happy to catch up with new and old friends! 📍 ExHall D, Poster #284. ⌚ 10.30am - 12.3…
0
3
0
@SimonShaoleiDu
Simon Shaolei Du
28 days
Joint work with @MoZhou_7, @Weihang_X, and Maryam Fazel.
0
0
2
@SimonShaoleiDu
Simon Shaolei Du
28 days
Check out our new work using online multi-agent RL for LM safety.
@mickel_liu
Mickel Liu
28 days
🤔 Conventional LM safety alignment is reactive: find vulnerabilities → patch → repeat. 🌟 We propose online multi-agent RL training where Attacker & Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵
1
2
20
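A schematic sketch of the attacker/defender self-play loop the tweet describes. All names here (the Echo stub, the safety judge, the update callback) are hypothetical stand-ins for an LM policy, a safety classifier, and an RL step; this is not the paper's API.

```python
# Schematic sketch, not the paper's code: two policies trained online against each other.
# The attacker is rewarded for eliciting unsafe replies, the defender for avoiding them.

def self_play_round(attacker, defender, judge, prompts):
    """One co-evolution round: attacker crafts adversarial prompts, defender answers."""
    attacker_batch, defender_batch = [], []
    for seed in prompts:
        attack = attacker.generate(seed)          # adversarial prompt built from a seed topic
        response = defender.generate(attack)      # defender's reply to the attack
        unsafe = judge(attack, response)          # 1.0 if the reply is judged unsafe, else 0.0
        attacker_batch.append((seed, attack, unsafe))            # attacker rewarded for unsafe replies
        defender_batch.append((attack, response, 1.0 - unsafe))  # defender rewarded for safe replies
    return attacker_batch, defender_batch


def train(attacker, defender, judge, prompt_stream, rounds, update):
    """Alternate RL updates so attacks and defenses co-evolve online."""
    for _ in range(rounds):
        atk_batch, def_batch = self_play_round(attacker, defender, judge, next(prompt_stream))
        update(attacker, atk_batch)   # e.g. a PPO/GRPO-style step on the attacker's rewards
        update(defender, def_batch)   # e.g. a PPO/GRPO-style step on the defender's rewards


# Toy usage with string stubs, purely to show the data flow:
class Echo:
    def __init__(self, tag): self.tag = tag
    def generate(self, text): return f"{self.tag}:{text}"

train(Echo("atk"), Echo("def"),
      judge=lambda attack, response: 0.0,
      prompt_stream=iter([["seed topic"]] * 3),
      rounds=3,
      update=lambda policy, batch: None)
```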
@SimonShaoleiDu
Simon Shaolei Du
30 days
RT @kjha02: Oral @icmlconf!!! Can't wait to share our work and hear the community's thoughts on it, should be a fun talk! Can't thank my…
0
3
0
@SimonShaoleiDu
Simon Shaolei Du
1 month
RT @uwcse: Congratulations to @UW #UWAllen Ph.D. grads @sharma_ashish_2 & @sewon__min, @TheOfficialACM Doctoral Dissertation Award honorees…
0
19
0
@SimonShaoleiDu
Simon Shaolei Du
1 month
PPO vs. DPO? 🤔 Our new paper proves that the answer depends on whether your models can represent the optimal policy and/or reward. Paper: Led by @smellycat_ZZZ and @MinhakSong.
@smellycat_ZZZ
Ruizhe Shi
1 month
Two-stage RLHF or one-stage DPO: which one is better for learning from preferences? They are equal under strong assumptions, but representation differences break the tie. Our paper reveals their fine-grained performance gaps under various conditions. Paper:
0
18
97
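For context, these are the standard forms of the two objectives being compared, as usually written in the RLHF/DPO literature (not reproduced from this paper):

```latex
% Stage 1 of RLHF: fit a reward model on preference pairs (Bradley-Terry).
\max_{r_\phi} \; \mathbb{E}_{(x, y_w, y_l)} \big[ \log \sigma\big( r_\phi(x, y_w) - r_\phi(x, y_l) \big) \big]

% Stage 2 of RLHF: KL-regularized policy optimization (e.g. with PPO).
\max_{\pi_\theta} \; \mathbb{E}_{x,\, y \sim \pi_\theta(\cdot \mid x)} \big[ r_\phi(x, y) \big]
  \;-\; \beta \, \mathbb{D}_{\mathrm{KL}}\!\big( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)

% One-stage DPO: optimize the policy directly on the same preference data.
\min_{\pi_\theta} \; -\mathbb{E}_{(x, y_w, y_l)} \Big[ \log \sigma\Big(
    \beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \Big) \Big]
```

Whether the two routes coincide hinges on whether the policy class can realize the optimal KL-regularized policy and the reward class can realize the true reward, which is the representation question the paper studies.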
@SimonShaoleiDu
Simon Shaolei Du
1 month
Our new paper tries to uncover what we really need when applying RLVR.
@StellaLisy
Stella Li
1 month
🤯 We cracked RLVR with Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵 Blogpost:
0
0
19
@SimonShaoleiDu
Simon Shaolei Du
2 months
Even with the same vision encoder, generative VLMs (LLaVA) can extract more information than CLIP. Why? Check out our #ACL2025NLP paper led by @SitingLi627:
@SitingLi627
Siting Li
2 months
Excited to share that our paper "Exploring How Generative MLLMs Perceive More Than CLIP with the Same Vision Encoder" is accepted to #ACL2025! Preprint: Thank you so much to @SimonShaoleiDu and @PangWeiKoh for your support and guidance throughout the journey!
1
2
17
@SimonShaoleiDu
Simon Shaolei Du
2 months
RT @shaneguML: Famous LLM researcher Bruce Lee quote: "I fear not the LLM who has practiced 10,000 questions once, but I fear the LLM who h…
0
87
0
@SimonShaoleiDu
Simon Shaolei Du
2 months
RT @kjha02: So excited to announce our work was accepted as a Spotlight paper to @icmlconf!!! I'm looking forward to presenting our work t…
0
10
0
@SimonShaoleiDu
Simon Shaolei Du
2 months
Excited to share our work led by @ypwang61. RLVR with only ONE training example can boost accuracy on MATH500 by 37%.
@ypwang61
Yiping Wang
2 months
We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks! 📍 RLVR with one training example can boost:
- Qwen2.5-Math-1.5B: 36.0% → 73.6%
- Qwen2.5-Math-7B: 51.0% → 79.2%
on MATH500. 📄 Paper:
2
6
49
@SimonShaoleiDu
Simon Shaolei Du
3 months
RT @avibose22: 🧠 Your LLM should model how you think, not reduce you to preassigned traits. 📢 Introducing LoRe: a low-rank reward modeling f…
0
26
0
@SimonShaoleiDu
Simon Shaolei Du
3 months
The sampler is crucial for faster convergence of online DPO! Check out our paper: #ICLR2025.
@smellycat_ZZZ
Ruizhe Shi
3 months
Previous works study the sample complexity of DPO and emphasize the role of samplers in online DPO. What about their role in optimization convergence rates? Check out our paper at #ICLR2025 on convergence rates of online DPO with various samplers! ArXiv:
0
3
23
@SimonShaoleiDu
Simon Shaolei Du
3 months
Excited to share our new work led by @kjha02: scaling training to more diverse environments is key to human-AI cooperation!
@kjha02
Kunal Jha
3 months
Our new paper (first one of my PhD!) on cooperative AI reveals a surprising insight: Environment Diversity > Partner Diversity. Agents trained in self-play across many environments learn cooperative norms that transfer to humans on novel tasks.
0
0
16
@SimonShaoleiDu
Simon Shaolei Du
3 months
RT @VectorZhou: 🧠 Ever notice how LLMs struggle with familiar knowledge in unfamiliar formats? Our new paper "CASCADE Your Datasets for Cro…
0
8
0
@SimonShaoleiDu
Simon Shaolei Du
3 months
RT @jxwuyi: 🎉 Milestone Release! AReaL-boba, our latest #RL system! #AI
• data/code/model ALL 🔥 #OPENSOURCE
• Full #…
0
39
0
@SimonShaoleiDu
Simon Shaolei Du
4 months
Very nice blog on multi-distribution learning.
@ericzhao28
Eric Zhao
4 months
For more details, read the blog! 🔗.
0
0
5