Siyan Zhao
@siyan_zhao
Followers
3K
Following
1K
Media
46
Statuses
167
CS PhD @UCLA | prev intern @AIatMeta, @Amazon agi | interested in RL, diffusion LLMs | bachelors @uoft
Los Angeles, CA
Joined January 2019
Introducing d1🚀 — the first framework that applies reinforcement learning to improve reasoning in masked diffusion LLMs (dLLMs). Combining masked SFT with a novel form of policy gradient algorithm, d1 significantly boosts the performance of pretrained dLLMs like LLaDA.
8
107
574
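A minimal sketch of the general recipe the tweet above describes, not the actual d1 / diffu-GRPO implementation: sample completions from a masked diffusion LLM, score them with a verifiable reward, approximate each completion's log-likelihood with a one-shot random-masking estimate, and take a REINFORCE-style step. The toy model, `estimate_logprob`, and the masking schedule are all hypothetical stand-ins.

```python
import torch
import torch.nn as nn

VOCAB, MASK_ID, SEQ_LEN = 100, 99, 16

class ToyMaskedDenoiser(nn.Module):
    """Stand-in for a masked diffusion LLM: predicts logits for every position."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, 64)
        self.head = nn.Linear(64, VOCAB)

    def forward(self, tokens):                  # (B, L) -> (B, L, V)
        return self.head(self.emb(tokens))

def estimate_logprob(model, completion, mask_ratio=0.5):
    """One-sample estimate of log p(completion): randomly mask a subset of
    positions and score the true tokens at the masked positions."""
    mask = torch.rand(completion.shape) < mask_ratio
    noised = completion.masked_fill(mask, MASK_ID)
    logp = torch.log_softmax(model(noised), dim=-1)
    tok_logp = logp.gather(-1, completion.unsqueeze(-1)).squeeze(-1)
    return (tok_logp * mask).sum(-1) / mask.sum(-1).clamp(min=1)

model = ToyMaskedDenoiser()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

completions = torch.randint(0, VOCAB - 1, (4, SEQ_LEN))    # sampled answers (stand-in)
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])               # verifiable rewards, e.g. exact match
advantages = rewards - rewards.mean()                       # simple group baseline

loss = -(advantages * estimate_logprob(model, completions)).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

The masked SFT stage mentioned in the tweet would precede this: the same masked-prediction loss, but on reference reasoning traces rather than reward-weighted samples.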
In many ways, (continuous) diffusion models are in-place reasoners where the quality improves with more denoising steps. Lately, we have been extending this to language, combining RLVR with discrete diffusion, resulting in d1 ( https://t.co/it22yjfjhd, NeurIPS2025 spotlight).
arxiv.org
Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated...
Naive question, so please roast me. Why don't we have diffusion reasoning models? The way humans think looks a lot more like diffusion than autoregression.
6
17
228
"An hour of planning can save you 10 hours of doing." ✨📝 Planned Diffusion 📝 ✨ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8× faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9–5% AR quality.
7
46
316
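A rough sketch of the plan-then-parallel-fill control flow described in the tweet above, assuming the two-stage setup it implies; `plan_spans`, the toy commit rule, and the random "denoiser" are hypothetical stand-ins, not the Planned Diffusion implementation.

```python
import torch

MASK_ID, VOCAB = 0, 100

def plan_spans(prompt):
    """Stage 1 (in the real system, a short autoregressive pass): decide how
    many answer spans are needed and how long each should be."""
    return [8, 12, 6]   # e.g. setup / derivation / final answer

def denoise_step(tokens, commit_frac):
    """Stage 2 stand-in: propose tokens for masked positions and commit a
    fraction of them; a real dLLM would use model logits here."""
    still_masked = tokens == MASK_ID
    commit = still_masked & (torch.rand(tokens.shape) < commit_frac)
    proposal = torch.randint(1, VOCAB, tokens.shape)
    return torch.where(commit, proposal, tokens)

def planned_diffusion(prompt, num_steps=4):
    lengths = plan_spans(prompt)
    tokens = torch.full((sum(lengths),), MASK_ID)        # every span starts fully masked
    for step in range(num_steps):                         # spans are filled in parallel,
        frac = 1.0 if step == num_steps - 1 else 0.4      # not left-to-right; the last
        tokens = denoise_step(tokens, frac)               # step commits whatever remains
    return tokens

print(planned_diffusion("prove that ..."))
```

The point of planning first is that span lengths are fixed up front, so all spans can be denoised in the same batch of forward passes instead of waiting on one another.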
Introducing SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models We propose a new policy gradient algorithm, SPG, for diffusion large language models. SPG improves the accuracy over the previous state-of-the-art RL methods by 3.6% in GSM8K, 2.6% in MATH500, 18.4%
3
30
130
New paper 📢 Most powerful vision-language (VL) reasoning datasets remain proprietary 🔒, hindering efforts to study their principles and develop similarly effective datasets in the open 🔓. Thus, we introduce HoneyBee, a 2.5M-example dataset created through careful data
5
39
201
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models "we propose the Sandwiched Policy Gradient (SPG) that leverages both an upper and a lower bound of the true log-likelihood." "SPG improves the accuracy over state-of-the-art RL methods for dLLMs by 3.6% in
7
22
148
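To make the quoted sentence concrete, here is a hedged sketch of a "sandwiched" surrogate: reinforce a completion through a lower bound on its log-likelihood and penalize it through an upper bound, so the surrogate brackets the true reward-weighted objective. The two bound estimators below are toy stand-ins, not the estimators derived in the paper.

```python
import torch

def lower_bound_logp(logits, completion):
    # ELBO-style stand-in: mean log-prob of the true tokens under the model.
    logp = torch.log_softmax(logits, dim=-1)
    return logp.gather(-1, completion.unsqueeze(-1)).squeeze(-1).mean(-1)

def upper_bound_logp(logits, completion, slack=0.5):
    # Toy stand-in: lower bound plus a slack term; SPG derives a principled one.
    return lower_bound_logp(logits, completion) + slack

def sandwiched_loss(logits, completion, advantage):
    lower = lower_bound_logp(logits, completion)
    upper = upper_bound_logp(logits, completion)
    # Positive advantage: push UP the lower bound. Negative: push DOWN the upper bound.
    surrogate = torch.where(advantage > 0, advantage * lower, advantage * upper)
    return -surrogate.mean()

B, L, V = 4, 16, 100
logits = torch.randn(B, L, V, requires_grad=True)       # stand-in for model output
completion = torch.randint(0, V, (B, L))
advantage = torch.tensor([0.7, -0.3, 1.1, -0.9])
sandwiched_loss(logits, completion, advantage).backward()
```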
(1/n) We are excited to announce LaViDa-O, a state-of-the-art unified diffusion LM for image understanding, generation, and editing. Building on our NeurIPS Spotlight submission LaViDa, LaViDa-O offers up to a 6.8x speedup compared with AR models while maintaining high output quality.
2
14
49
Finally, we scale GFlowNets to 32B param LLMs on reasoning tasks. Kudos to @zhu_xuekai 👏 My two cents: (a) On-policy training is important for RL performance (this relates to our previous FlashRL effort). (b) Length normalization in the logp calculation is critical for numerical stability.
2
15
122
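Point (b) above, in its simplest form: average per-token log-probabilities over the completion length instead of summing them, so long reasoning traces don't blow up the magnitude of the log-likelihood term. This is a sketch only; where exactly the normalization enters the GFlowNet objective is an assumption here.

```python
import torch

def sequence_logprob(logits, tokens, lengths, normalize=True):
    """Log-likelihood of each sequence, optionally normalized by its length."""
    logp = torch.log_softmax(logits, dim=-1)
    tok_logp = logp.gather(-1, tokens.unsqueeze(-1)).squeeze(-1)      # (B, L)
    mask = torch.arange(tokens.shape[1]) < lengths.unsqueeze(-1)      # ignore padding
    total = (tok_logp * mask).sum(-1)
    return total / lengths.clamp(min=1) if normalize else total

logits = torch.randn(2, 8, 50)                    # stand-in for model output
tokens = torch.randint(0, 50, (2, 8))
lengths = torch.tensor([8, 5])
print(sequence_logprob(logits, tokens, lengths))  # stays O(1) in sequence length
```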
🔦Adaptive Parallel Decoding (APD) has been accepted as a spotlight paper at @NeurIPSConf! I thank my collaborators, reviewers, and program organizers for this honor. A thread for those interested 🧵 (1/n)
11
23
171
.@siyan_zhao's latest work highlights a fundamental inefficiency in GRPO when the group contains all-wrong rewards. With diffusion LLMs, this can be largely mitigated by using inpainted reasoning traces.
Thanks AK for sharing our work! Unlike autoregressive LLMs, diffusion LLMs can be conditioned on future reasoning hints during generation through inpainting 🧩, enabling guided exploration toward correct solutions. We show that applying inpainting-guided exploration in RL
1
3
32
Thanks AK for sharing our work! Unlike autoregressive LLMs, diffusion LLMs can be conditioned on future reasoning hints during generation through inpainting 🧩, enabling guided exploration toward correct solutions. We show that applying inpainting-guided exploration in RL
4
28
190
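A toy illustration of the inpainting mechanic both tweets describe: the dLLM is given a canvas in which some future positions (e.g. near the final answer) are pre-filled with hint tokens and held fixed, while the remaining masked positions are denoised around them. The commit rule and random "proposals" below are stand-ins for real model sampling, and the function names are hypothetical.

```python
import torch

MASK_ID, VOCAB = 0, 100

def make_canvas(length, hints):
    """hints: {position: token} placed anywhere, including near the end."""
    canvas = torch.full((length,), MASK_ID)
    for pos, tok in hints.items():
        canvas[pos] = tok
    return canvas

def inpaint(canvas, num_steps=4):
    fixed = canvas != MASK_ID                    # hint tokens never change
    tokens = canvas.clone()
    for _ in range(num_steps):
        proposal = torch.randint(1, VOCAB, tokens.shape)   # stand-in for model sampling
        still_masked = tokens == MASK_ID
        commit = still_masked & (torch.rand(tokens.shape) < 0.5)
        tokens = torch.where(commit & ~fixed, proposal, tokens)
    # fill anything left masked on the final pass
    return torch.where(tokens == MASK_ID, torch.randint(1, VOCAB, tokens.shape), tokens)

# place a hint about the final answer near the END of the canvas;
# an autoregressive decoder could not condition on this
canvas = make_canvas(16, hints={14: 42, 15: 7})
print(inpaint(canvas))
```

In the RL setting described above, the hint tokens would come from reference traces or partial solutions, steering exploration toward groups that are not uniformly wrong.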
🚨 The era of infinite internet data is ending, so we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️Compute-constrained? Train Autoregressive models ▶️Data-constrained? Train Diffusion models Get ready for 🤿 1/n
127
197
1K
🚀 Introducing PhysiX: One of the first large-scale foundation models for physics simulations! PhysiX is a 4.5B parameter model that unifies a wide range of physical systems, from fluid dynamics to reaction-diffusion, outperforming specialized, state-of-the-art models.
21
252
2K
🤖Can diffusion models write code competitively? Excited to share our latest 7B coding diffusion LLM!!💻 With DiffuCoder, we explore how they decode, why temperature🔥 matters, and how to improve them via coupled-GRPO that speaks diffusion!!📈 Code: https://t.co/sWsb8a49HL 🧵
5
112
584
(1/6) Our work Reflect-DiT was accepted to #ICCV2025! Reflect-DiT allows the model to reflect on its past generations and textual feedback to self-correct and improve, extending reasoning to text-to-image generation.
1
23
95
Since our launch earlier this year, we are thrilled to witness the growing community around dLLMs. The Mercury tech report from @InceptionAILabs is now on @arxiv with more extensive evaluations: https://t.co/DnDxFvoX0E New model updates dropping later this week!
arxiv.org
We present Mercury, a new generation of commercial-scale large language models (LLMs) based on diffusion. These models are parameterized via the Transformer architecture and trained to predict...
3
40
254
🚀 Super excited to share Multiverse! 🏃 It’s been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: https://t.co/J9osByhWUf 🧵 1/n
3
20
72
🧑🍳Very excited to present LaViDa, one of the first diffusion language models for multimodal understanding! 🌟Unlike autoregressive LMs, you can control the speed-quality tradeoff, and solve constrained generation problems out of the box 📦 🌟 We also release LaViDa-Reason, a
📢(1/11) Diffusion LMs are fast and controllable at inference time! But why restrict such benefits to processing text data? We are excited to announce LaViDa, one of the first and fastest large diffusion LMs for vision-language understanding!!
0
30
102
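On the speed-quality knob mentioned above: a masked dLLM can spend more or fewer denoising steps on the same output, so the number of tokens committed per forward pass is a free parameter. The schedule below is purely illustrative, not LaViDa's decoding policy.

```python
import math

def tokens_per_step(seq_len, num_steps):
    """With a fixed output length, fewer denoising steps means committing more
    tokens per forward pass: faster, but each commit is less well conditioned."""
    return math.ceil(seq_len / num_steps)

for steps in (4, 16, 64):
    print(f"{steps:>3} steps -> ~{tokens_per_step(256, steps)} tokens committed per forward pass")
```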
📢(1/11) Diffusion LMs are fast and controllable at inference time! But why restrict such benefits to processing text data? We are excited to announce LaViDa, one of the first and fastest large diffusion LMs for vision-language understanding!!
3
40
168
We are kicking off a series of seminars at @hkunlp2020. @siyan_zhao will be giving a talk titled "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" at ⏰Friday 5.9 11am HKT (Thursday 5.8 8pm PDT). Link to talk: https://t.co/i9FsWYRNbZ
0
14
38