Chenyu (Monica) Wang

@ChenyuW64562111

Followers
885
Following
397
Media
25
Statuses
106

PhD @MIT_CSAIL | Prev @AIatMeta @genentech @Tsinghua_Uni

Cambridge, MA
Joined September 2022
@ChenyuW64562111
Chenyu (Monica) Wang
7 days
Introducing SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models. We propose a new policy gradient algorithm, SPG, for diffusion large language models. SPG improves accuracy over the previous state-of-the-art RL methods by 3.6% on GSM8K, 2.6% on MATH500, 18.4%
3
28
123
@cranialxix
Bo Liu
1 day
Proud to have worked under Yuandong for the past year :) Just realized YD's account got deactivated before he could see my reply under his badge post: You reminded me why I chose this path: seeking the truth of nature/intelligence is the most beautiful and impactful thing we can do. Yuandong,
@tydsh
Yuandong Tian
2 days
Several of my team members and I are impacted by this layoff today. Feel free to connect :)
3
7
244
@ChenyuW64562111
Chenyu (Monica) Wang
2 days
So sad to hear this. It was a wonderful time working with Yuandong and the team over the summer. Wishing everyone all the best, and I hope our paths will cross again!
@tydsh
Yuandong Tian
2 days
Several of my team members and I are impacted by this layoff today. Feel free to connect :)
0
0
11
@tydsh
Yuandong Tian
2 days
Several of my team members and I are impacted by this layoff today. Feel free to connect :)
468
289
6K
@minimario1729
Alex Gu
5 days
✂️Introducing ProofOptimizer: a training and inference recipe for proof shortening! 😰AI-written formal proofs can be long and unreadable: Seed-Prover's proof of IMO '25 P1 is 16x longer in Lean than in English. Our 7B model shortens proofs generated by SoTA models by over 50%! 🧵⬇️
6
35
201
@zhuci19
Cai Zhou
5 days
(1/6) Check out our new paper: Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model A Latent Reasoner! arxiv: https://t.co/ldDqxufyG5 Do diffusion language models (DLMs) need to be discrete? No! We show that continuous diffusion models are more
arxiv.org
Diffusion language models, especially masked discrete diffusion models, have achieved great success recently. While there are some theoretical and primary empirical results showing the advantages...
2
29
101
@tydsh
Yuandong Tian
6 days
🚨🚨Great work from our intern @ChenyuW64562111! Our proposed SPG (Sandwiched Policy Gradient) is based on a very simple intuition: RL has both pos/neg samples, and we need to learn with both upper and lower bounds of the log-likelihood of the text diffusion model. Strong
@ChenyuW64562111
Chenyu (Monica) Wang
7 days
Introducing SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models. We propose a new policy gradient algorithm, SPG, for diffusion large language models. SPG improves accuracy over the previous state-of-the-art RL methods by 3.6% on GSM8K, 2.6% on MATH500, 18.4%
3
13
113
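To make the "sandwich" above concrete: the stated intuition is to weight each sampled completion by its advantage, pushing up a tractable lower bound of the (intractable) log-likelihood for positive-advantage samples and pushing down a tractable upper bound for negative-advantage ones. A rough sketch of such an objective, in notation assumed here for illustration rather than taken from the paper:

% Illustrative sketch only; symbols are assumptions, not the paper's notation.
% y ~ pi_theta(.|x): sampled completion, A(y): advantage (reward minus baseline),
% L_theta(y) <= log pi_theta(y|x) <= U_theta(y): tractable lower/upper bounds.
J_{\mathrm{SPG}}(\theta) \;=\; \mathbb{E}_{y \sim \pi_\theta}\big[\, \mathbf{1}[A(y) > 0]\, A(y)\, \mathcal{L}_\theta(y) \;+\; \mathbf{1}[A(y) \le 0]\, A(y)\, \mathcal{U}_\theta(y) \,\big]

Since A(y) is negative in the second term, maximizing this objective raises the lower bound for rewarded samples and lowers the upper bound for penalized ones, so each update works against a bound on the correct side of the true log-likelihood.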
@ChenyuW64562111
Chenyu (Monica) Wang
7 days
[4/n] Despite being trained with confidence-based semi-AR decoding and block-wise masking, SPG generalizes quite well to different inference strategies, even non-semi-AR ones.
1
1
5
@ChenyuW64562111
Chenyu (Monica) Wang
7 days
[3/n] The reward dynamics throughout training show that SPG achieves a rapid and steady increase in reward over the optimization steps, further demonstrating its efficiency and robustness.
1
1
4
@ChenyuW64562111
Chenyu (Monica) Wang
7 days
[2/n] SPG consistently outperforms the previous state-of-the-art RL methods for dLLMs across mathematical reasoning (GSM8K, MATH500) and logical reasoning (Countdown, Sudoku) benchmarks. Specifically, SPG improves accuracy over the previous state-of-the-art by 3.6% on GSM8K,
1
1
3
@ChenyuW64562111
Chenyu (Monica) Wang
7 days
[1/n] dLLMs are emerging as an efficient alternative to autoregressive models due to their ability to decode multiple tokens in parallel. However, applying RL algorithms to dLLMs is challenging because of their intractable log-likelihood. SPG computes a more robust and less
1
1
8
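For readers wondering why the log-likelihood is called intractable here: a masked diffusion model only exposes per-token denoising predictions, so in practice one estimates a lower bound (an ELBO) by Monte Carlo, randomly masking the sequence at sampled noise levels and scoring the reconstruction. Below is a generic, assumption-laden sketch of such an estimator (the model interface is hypothetical, and this is not the SPG estimator itself):

import torch
import torch.nn.functional as F

def masked_diffusion_elbo(model, tokens, mask_id, n_samples=4):
    # Sketch only: `model(x)` is assumed to return per-position logits of shape
    # (seq_len, vocab); the 1/t weighting follows the usual masked-diffusion ELBO form.
    total = 0.0
    for _ in range(n_samples):
        t = torch.rand(()) * 0.999 + 0.001                 # random mask rate in (0, 1]
        mask = torch.rand(tokens.shape) < t                 # positions to mask at rate t
        x_t = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
        logits = model(x_t)
        logp = F.log_softmax(logits, dim=-1)
        token_logp = logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)
        total = total + (token_logp * mask).sum() / t       # score masked tokens, weight by 1/t
    return total / n_samples                                # Monte Carlo lower bound on log p(tokens)

Per the thread, SPG's contribution is not the bound estimate itself but how lower and upper bounds are combined inside the policy-gradient update.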
@ChenyuW64562111
Chenyu (Monica) Wang
11 days
Diffusion training can largely benefit from a good representation space. If you enjoy @sainingxie's RAE, you may also want to check out our REED paper👇 In our @NeurIPSConf 2025 paper, we find that such benefit can also come from the representation of a different (synthetic) modality (e.g.
arxiv.org
Diffusion models can be improved with additional guidance towards more effective representations of input. Indeed, prior empirical work has already shown that aligning internal representations of...
@ChenyuW64562111
Chenyu (Monica) Wang
3 months
Excited to share: “Learning Diffusion Models with Flexible Representation Guidance” With my amazing coauthors @zhuci19, @sharut_gupta, @zy27962986, @StefanieJegelka, @stats_stephen, Tommi Jaakkola Paper: https://t.co/wYbm5bAlZv Code: https://t.co/nbO1seYBvp
2
18
158
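The "representation guidance" here follows the general recipe described in the linked abstract: add an auxiliary loss that pulls an intermediate activation of the denoiser towards a target representation from a separate encoder. A hedged, generic sketch, with all names and interfaces assumed for illustration (this is not the REED training code):

import torch
import torch.nn.functional as F

def guided_diffusion_loss(denoiser, proj_head, target_encoder,
                          x0, x_t, t, noise, lam=0.5):
    # Generic sketch: `denoiser(x_t, t)` is assumed to return
    # (noise_prediction, intermediate_features); `target_encoder` supplies
    # the representation the features are aligned with.
    pred_noise, feats = denoiser(x_t, t)
    diffusion_loss = F.mse_loss(pred_noise, noise)          # standard denoising objective

    with torch.no_grad():
        target = target_encoder(x0)                         # frozen target representation
    aligned = proj_head(feats)                              # project features to target dim
    align_loss = 1.0 - F.cosine_similarity(aligned, target, dim=-1).mean()

    return diffusion_loss + lam * align_loss                # lam trades off the two terms

In the REED setting, per the tweet, the target representation can even come from a different (synthetic) modality; the single frozen encoder above only marks where such a target would plug in.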
@Guangxuan_Xiao
Guangxuan Xiao
11 days
Excited to share our new work: StreamingVLM! 🚀 We tackle a major challenge for Vision-Language Models (VLMs): understanding infinite video streams in real-time without latency blowing up or running out of memory. Paper: https://t.co/G0bfwKCdZm Code: https://t.co/HqBoLMcrJF
31
162
1K
@ChenyuW64562111
Chenyu (Monica) Wang
12 days
Thanks for featuring our work!
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
12 days
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models "we propose the Sandwiched Policy Gradient (SPG) that leverages both an upper and a lower bound of the true log-likelihood." "SPG improves the accuracy over state-of-the-art RL methods for dLLMs by 3.6% in
0
1
12
@ChenyuW64562111
Chenyu (Monica) Wang
12 days
Check out our NeurIPS 2025 paper on next semantic scale prediction for language modeling 😎 We enable self-correction capability by introducing hierarchical semantic representations between the mask and word tokens. See more details in @zhuci19's post and https://t.co/BzJmbrgQHb🎉
arxiv.org
In this paper we introduce Hierarchical Diffusion Language Models (HDLM) -- a novel family of discrete diffusion models for language modeling. HDLM builds on a hierarchical vocabulary where...
@zhuci19
Cai Zhou
12 days
(1/5) Beyond Next-Token Prediction, introducing Next Semantic Scale Prediction! Our @NeurIPSConf 2025 paper HDLM is out! Check out the new language modeling paradigm: Next Semantic Scale Prediction via Hierarchical Diffusion Language Models. It largely generalizes
0
2
22
@zdhnarsil
Dinghuai Zhang 张鼎怀
14 days
Our recent CCDD paper on discrete language modeling is out: 📚Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner https://t.co/ChoCrMuIs3
arxiv.org
Diffusion language models, especially masked discrete diffusion models, have achieved great success recently. While there are some theoretical and primary empirical results showing the advantages...
4
26
112
@sedielem
Sander Dieleman
15 days
In diffusion LMs, discrete methods have all but displaced continuous ones (🥲). Interesting new trend: why not both? Use continuous methods to make discrete diffusion better. Diffusion duality: https://t.co/KPO56vDygp CADD: https://t.co/CNOIWcUIMo CCDD:
arxiv.org
Diffusion language models, especially masked discrete diffusion models, have achieved great success recently. While there are some theoretical and primary empirical results showing the advantages...
@sedielem
Sander Dieleman
2 months
New survey on diffusion language models: https://t.co/SHicf69gxV (via @NicolasPerezNi1). Covers pre/post-training, inference and multimodality, with very nice illustrations. I can't help but feel a bit wistful about the apparent extinction of the continuous approach after 2023🥲
9
73
423
@sharut_gupta
Sharut Gupta
15 days
[1/7] Paired multimodal learning shows that training with text can help vision models learn better image representations. But can unpaired data do the same? Our new work shows that the answer is yes! w/ @shobsund @ChenyuW64562111, Stefanie Jegelka and @phillip_isola
10
52
433
@peholderrieth
Peter Holderrieth
16 days
New work: “GLASS Flows: Transition Sampling for Alignment of Flow and Diffusion Models”. GLASS generates images by sampling stochastic Markov transitions with ODEs - allowing us to boost text-image alignment for large-scale models at inference time! https://t.co/unsuG3mYer [1/7]
3
58
240