Siyan Zhao
@siyan_zhao
Followers
3K
Following
1K
Media
46
Statuses
167
CS PhD @UCLA | prev intern @AIatMeta, @Amazon agi | interested in RL, diffusion LLMs | bachelors @uoft
Los Angeles, CA
Joined January 2019
Introducing d1🚀 — the first framework that applies reinforcement learning to improve reasoning in masked diffusion LLMs (dLLMs). Combining masked SFT with a novel form of policy gradient algorithm, d1 significantly boosts the performance of pretrained dLLMs like LLaDA.
8
107
574
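A minimal sketch of the general recipe the tweet above describes, not the actual d1 / diffu-GRPO implementation: sample completions from a masked diffusion LLM, score them with a verifiable reward, approximate each completion's log-likelihood with a one-shot random-masking estimate, and take a REINFORCE-style step. The toy model, `estimate_logprob`, and the masking schedule are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn

VOCAB, MASK_ID, SEQ_LEN = 100, 99, 16

class ToyMaskedDenoiser(nn.Module):
    """Stand-in for a masked diffusion LLM: predicts logits for every position."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 64)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, tokens):                  # (B, L) -> (B, L, V)
        return self.head(self.emb(tokens))

def estimate_logprob(model, completion, mask_ratio=0.5):
    """One-sample estimate of log p(completion): randomly mask a subset of
    positions and score the true tokens at the masked positions."""
    mask = torch.rand(completion.shape) < mask_ratio
    noised = completion.masked_fill(mask, MASK_ID)
    logp = torch.log_softmax(model(noised), dim=-1)
    tok_logp = logp.gather(-1, completion.unsqueeze(-1)).squeeze(-1)
    return (tok_logp * mask).sum(-1) / mask.sum(-1).clamp(min=1)

model = ToyMaskedDenoiser()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

completions = torch.randint(0, VOCAB - 1, (4, SEQ_LEN))    # sampled answers (stand-in)
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])               # verifiable rewards, e.g. exact match
advantages = rewards - rewards.mean()                       # simple group baseline

loss = -(advantages * estimate_logprob(model, completions)).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

The masked SFT stage mentioned in the tweet would precede this: the same masked-prediction loss, but on reference reasoning traces rather than reward-weighted samples.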
In many ways, (continuous) diffusion models are in-place reasoners where the quality improves with more denoising steps. Lately, we have been extending this to language, combining RLVR with discrete diffusion, resulting in d1 ( https://t.co/it22yjfjhd, NeurIPS2025 spotlight).
arxiv.org
Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated...
Naive question, so please roast me. Why don't we have diffusion reasoning models? The way humans think looks a lot more like diffusion than autoregression.
6
17
228
"An hour of planning can save you 10 hours of doing." ✨📝 Planned Diffusion 📝 ✨ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8× faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9–5% AR quality.
7
46
316
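A rough sketch of the plan-then-parallel-fill control flow described in the tweet above, assuming the two-stage setup it implies; `plan_spans`, the toy commit rule, and the random "denoiser" are hypothetical stand-ins, not the Planned Diffusion implementation.

```python
import torch

MASK_ID, VOCAB = 0, 100

def plan_spans(prompt):
    """Stage 1 (in the real system, a short autoregressive pass): decide how
    many answer spans are needed and how long each should be."""
    return [8, 12, 6]   # e.g. setup / derivation / final answer

def denoise_step(tokens, commit_frac):
    """Stage 2 stand-in: propose tokens for masked positions and commit a
    fraction of them; a real dLLM would use model logits here."""
    still_masked = tokens == MASK_ID
    commit = still_masked & (torch.rand(tokens.shape) < commit_frac)
    proposal = torch.randint(1, VOCAB, tokens.shape)
    return torch.where(commit, proposal, tokens)

def planned_diffusion(prompt, num_steps=4):
    lengths = plan_spans(prompt)
    tokens = torch.full((sum(lengths),), MASK_ID)        # every span starts fully masked
    for step in range(num_steps):                         # spans are filled in parallel,
        frac = 1.0 if step == num_steps - 1 else 0.4      # not left-to-right; the last
        tokens = denoise_step(tokens, frac)               # step commits whatever remains
    return tokens

print(planned_diffusion("prove that ..."))
```

The point of planning first is that span lengths are fixed up front, so all spans can be denoised in the same batch of forward passes instead of waiting on one another.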
Introducing SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models We propose a new policy gradient algorithm, SPG, for diffusion large language models. SPG improves the accuracy over the previous state-of-the-art RL methods by 3.6% in GSM8K, 2.6% in MATH500, 18.4%
3
30
130
New paper 📢 Most powerful vision-language (VL) reasoning datasets remain proprietary 🔒, hindering efforts to study their principles and develop similarly effective datasets in the open 🔓. Thus, we introduce HoneyBee, a 2.5M-example dataset created through careful data
5
39
201
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models "we propose the Sandwiched Policy Gradient (SPG) that leverages both an upper and a lower bound of the true log-likelihood." "SPG improves the accuracy over state-of-the-art RL methods for dLLMs by 3.6% in
7
22
148
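To make the quoted sentence concrete, here is a hedged sketch of a "sandwiched" surrogate: reinforce a completion through a lower bound on its log-likelihood and penalize it through an upper bound, so the surrogate brackets the true reward-weighted objective. The two bound estimators below are toy stand-ins, not the estimators derived in the paper.

```python
import torch

def lower_bound_logp(logits, completion):
    # ELBO-style stand-in: mean log-prob of the true tokens under the model.
    logp = torch.log_softmax(logits, dim=-1)
    return logp.gather(-1, completion.unsqueeze(-1)).squeeze(-1).mean(-1)

def upper_bound_logp(logits, completion, slack=0.5):
    # Toy stand-in: lower bound plus a slack term; SPG derives a principled one.
    return lower_bound_logp(logits, completion) + slack

def sandwiched_loss(logits, completion, advantage):
    lower = lower_bound_logp(logits, completion)
    upper = upper_bound_logp(logits, completion)
    # Positive advantage: push UP the lower bound. Negative: push DOWN the upper bound.
    surrogate = torch.where(advantage > 0, advantage * lower, advantage * upper)
    return -surrogate.mean()

B, L, V = 4, 16, 100
logits = torch.randn(B, L, V, requires_grad=True)       # stand-in for model output
completion = torch.randint(0, V, (B, L))
advantage = torch.tensor([0.7, -0.3, 1.1, -0.9])
sandwiched_loss(logits, completion, advantage).backward()
```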
(1/n) We are excited to announce LaViDa-O, a state-of-the-art unified diffusion LM for image understanding, generation, and editing. Building on our NeurIPS Spotlight submission LaViDa, LaViDa-O offers up to a 6.8x speedup compared with AR models while maintaining high output quality.
2
14
49
Finally, we scale GFlowNets to 32B param LLMs on reasoning tasks. Kudos to @zhu_xuekai 👏 My two cents: (a) On-policy training is important for RL performance (this relates to our previous FlashRL effort). (b) Length normalization in the logp calculation is critical for numerical stability.
2
15
122
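Point (b) above, in its simplest form: average per-token log-probabilities over the completion length instead of summing them, so long reasoning traces don't blow up the magnitude of the log-likelihood term. This is a sketch only; where exactly the normalization enters the GFlowNet objective is an assumption here.

```python
import torch

def sequence_logprob(logits, tokens, lengths, normalize=True):
    """Log-likelihood of each sequence, optionally normalized by its length."""
    logp = torch.log_softmax(logits, dim=-1)
    tok_logp = logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)      # (B, L)
    mask = torch.arange(tokens.shape[1]) < lengths.unsqueeze(-1)      # ignore padding
    total = (tok_logp * mask).sum(-1)
    return total / lengths.clamp(min=1) if normalize else total

logits = torch.randn(2, 8, 50)                    # stand-in for model output
tokens = torch.randint(0, 50, (2, 8))
lengths = torch.tensor([8, 5])
print(sequence_logprob(logits, tokens, lengths))  # stays O(1) in sequence length
```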
🔦Adaptive Parallel Decoding (APD) has been accepted as a spotlight paper at @NeurIPSConf! I thank my collaborators, reviewers, and program organizers for this honor. A thread for those interested 🧵 (1/n)
11
23
171
.@siyan_zhao's latest work highlights a fundamental inefficiency in GRPO when the group contains all-wrong rewards. With diffusion LLMs, this can be largely mitigated by using inpainted reasoning traces.
Thanks AK for sharing our work! Unlike autoregressive LLMs, diffusion LLMs can be conditioned on future reasoning hints during generation through inpainting 🧩, enabling guided exploration toward correct solutions. We show that applying inpainting-guided exploration in RL
1
3
32
Thanks AK for sharing our work! Unlike autoregressive LLMs, diffusion LLMs can be conditioned on future reasoning hints during generation through inpainting 🧩, enabling guided exploration toward correct solutions. We show that applying inpainting-guided exploration in RL
4
28
190
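A toy illustration of the inpainting mechanic both tweets describe: the dLLM is given a canvas in which some future positions (e.g. near the final answer) are pre-filled with hint tokens and held fixed, while the remaining masked positions are denoised around them. The commit rule and random "proposals" below are stand-ins for real model sampling, and the function names are hypothetical.

```python
import torch

MASK_ID, VOCAB = 0, 100

def make_canvas(length, hints):
    """hints: {position: token} placed anywhere, including near the end."""
    canvas = torch.full((length,), MASK_ID)
    for pos, tok in hints.items():
        canvas[pos] = tok
    return canvas

def inpaint(canvas, num_steps=4):
    fixed = canvas != MASK_ID                    # hint tokens never change
    tokens = canvas.clone()
    for _ in range(num_steps):
        proposal = torch.randint(1, VOCAB, tokens.shape)   # stand-in for model sampling
        still_masked = tokens == MASK_ID
        commit = still_masked & (torch.rand(tokens.shape) < 0.5)
        tokens = torch.where(commit & ~fixed, proposal, tokens)
    # fill anything left masked on the final pass
    return torch.where(tokens == MASK_ID, torch.randint(1, VOCAB, tokens.shape), tokens)

# place a hint about the final answer near the END of the canvas;
# an autoregressive decoder could not condition on this
canvas = make_canvas(16, hints={14: 42, 15: 7})
print(inpaint(canvas))
```

In the RL setting described above, the hint tokens would come from reference traces or partial solutions, steering exploration toward groups that are not uniformly wrong.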
🚨 The era of infinite internet data is ending, so we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️Compute-constrained? Train Autoregressive models ▶️Data-constrained? Train Diffusion models Get ready for 🤿 1/n
127
197
1K
🚀 Introducing PhysiX: One of the first large-scale foundation models for physics simulations! PhysiX is a 4.5B parameter model that unifies a wide range of physical systems, from fluid dynamics to reaction-diffusion, outperforming specialized, state-of-the-art models.
21
252
2K
🤖Can diffusion models write code competitively? Excited to share our latest 7B coding diffusion LLM!!💻 With DiffuCoder, we explore how they decode, why temperature🔥 matters, and how to improve them via coupled-GRPO that speaks diffusion!!📈 Code: https://t.co/sWsb8a49HL 🧵
5
112
584
(1/6) Our work Reflect-DiT was accepted to #ICCV2025! Reflect-DiT allows the model to reflect on its past generations and textual feedback to self-correct and improve, extending reasoning to text-to-image generation.
1
23
95
Since our launch earlier this year, we are thrilled to witness the growing community around dLLMs. The Mercury tech report from @InceptionAILabs is now on @arxiv with more extensive evaluations: https://t.co/DnDxFvoX0E New model updates dropping later this week!
arxiv.org
We present Mercury, a new generation of commercial-scale large language models (LLMs) based on diffusion. These models are parameterized via the Transformer architecture and trained to predict...
3
40
254
🚀 Super excited to share Multiverse! 🏃 It’s been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: https://t.co/J9osByhWUf 🧵 1/n
3
20
72
🧑🍳Very excited to present LaViDa, one of the first diffusion language models for multimodal understanding! 🌟Unlike autoregressive LMs, you can control the speed-quality tradeoff, and solve constrained generation problems out of the box 📦 🌟 We also release LaViDa-Reason, a
📢(1/11) Diffusion LMs are fast and controllable at inference time! But why restrict such benefits to processing text data? We are excited to announce LaViDa, one of the first and fastest large diffusion LMs for vision-language understanding!!
0
30
102
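On the speed-quality knob mentioned above: a masked dLLM can spend more or fewer denoising steps on the same output, so the number of tokens committed per forward pass is a free parameter. The schedule below is purely illustrative, not LaViDa's decoding policy.

```python
import math

def tokens_per_step(seq_len, num_steps):
    """With a fixed output length, fewer denoising steps means committing more
    tokens per forward pass: faster, but each commit is less well conditioned."""
    return math.ceil(seq_len / num_steps)

for steps in (4, 16, 64):
    print(f"{steps:>3} steps -> ~{tokens_per_step(256, steps)} tokens committed per forward pass")
```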
📢(1/11) Diffusion LMs are fast and controllable at inference time! But why restrict such benefits to processing text data? We are excited to announce LaViDa, one of the first and fastest large diffusion LMs for vision-language understanding!!
3
40
168
We are kicking off a series of seminars at @hkunlp2020. @siyan_zhao will be giving a talk titled "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" at ⏰Friday 5.9 11am HKT (Thursday 5.8 8pm PDT). Link to talk: https://t.co/i9FsWYRNbZ
0
14
38