Siyan Zhao

@siyan_zhao

Followers: 3K · Following: 1K · Media: 46 · Statuses: 167

CS PhD @UCLA | prev intern @AIatMeta, @Amazon agi | interested in RL, diffusion LLMs | bachelors @uoft

Los Angeles, CA
Joined January 2019
@siyan_zhao
Siyan Zhao
7 months
Introducing d1🚀 — the first framework that applies reinforcement learning to improve reasoning in masked diffusion LLMs (dLLMs). Combining masked SFT with a novel policy-gradient algorithm, d1 significantly boosts the performance of pretrained dLLMs like LLaDA.
8
107
574
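To make the recipe concrete, here is a minimal, illustrative sketch of what a GRPO-style policy-gradient step for a masked dLLM could look like, assuming a toy model and a one-sample per-token log-probability estimate obtained by randomly re-masking the completion. This is not d1's implementation; every name, shape, and reward below is made up for illustration.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a masked diffusion LM: token ids -> per-position logits.
# In practice this would be a pretrained dLLM such as LLaDA.
VOCAB, MASK_ID, SEQ_LEN = 100, 0, 16
model = torch.nn.Sequential(
    torch.nn.Embedding(VOCAB, 64),
    torch.nn.Linear(64, VOCAB),
)

def per_token_logp(completions, mask_ratio=0.5):
    """One-sample estimate of per-token log-probs: re-mask a random subset of
    the completion and score the original tokens at the masked positions."""
    noise = torch.rand_like(completions, dtype=torch.float) < mask_ratio
    noised = torch.where(noise, torch.full_like(completions, MASK_ID), completions)
    logp = F.log_softmax(model(noised), dim=-1)              # (B, L, V)
    token_logp = logp.gather(-1, completions.unsqueeze(-1)).squeeze(-1)
    return token_logp, noise.float()

# A "group" of sampled completions for one prompt, with verifiable rewards.
group = torch.randint(1, VOCAB, (8, SEQ_LEN))                # 8 rollouts
rewards = torch.tensor([1., 0., 0., 1., 0., 0., 0., 1.])     # e.g. exact-match

# Group-relative advantages (the GRPO-style baseline).
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)

# Policy-gradient surrogate: advantage-weighted log-likelihood on masked positions.
token_logp, mask = per_token_logp(group)
loss = -(adv[:, None] * token_logp * mask).sum() / mask.sum().clamp(min=1.0)
loss.backward()
print("surrogate loss:", loss.item())
```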
@adityagrover_
Aditya Grover
8 days
In many ways, (continuous) diffusion models are in-place reasoners where the quality improves with more denoising steps. Lately, we have been extending this to language, combining RLVR with discrete diffusion, resulting in d1 (https://t.co/it22yjfjhd, NeurIPS 2025 spotlight).
arxiv.org
Recent large language models (LLMs) have demonstrated strong reasoning capabilities that benefit from online reinforcement learning (RL). These capabilities have primarily been demonstrated...
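The "more denoising steps, better answer" intuition can be seen in a toy masked-diffusion sampling loop that starts fully masked and greedily unmasks the highest-confidence positions at each step. The greedy confidence rule and the random toy model are assumptions for illustration, not the decoder used by d1 or Mercury.

```python
import torch
import torch.nn.functional as F

VOCAB, MASK_ID, SEQ_LEN, STEPS = 100, 0, 16, 8
model = torch.nn.Sequential(torch.nn.Embedding(VOCAB, 64), torch.nn.Linear(64, VOCAB))

@torch.no_grad()
def denoise(steps=STEPS):
    x = torch.full((1, SEQ_LEN), MASK_ID)           # start fully masked
    for step in range(steps):
        still_masked = x == MASK_ID
        if not still_masked.any():
            break
        probs = F.softmax(model(x), dim=-1)          # (1, L, V)
        conf, pred = probs.max(dim=-1)               # per-position confidence
        conf = conf.masked_fill(~still_masked, -1.0)
        # Unmask a fraction of the remaining masked positions per step,
        # choosing the positions the model is most confident about.
        k = max(1, int(still_masked.sum().item() / (steps - step)))
        top = conf.topk(k, dim=-1).indices[0]
        x[0, top] = pred[0, top]
    return x

print(denoise())
```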
@hyhieu226
Hieu Pham
9 days
Naive question, so please roast me. Why don't we have diffusion reasoning models? The way humans think looks a lot more like diffusion than autoregression.
6
17
228
@danielmisrael
Daniel Israel
27 days
"An hour of planning can save you 10 hours of doing." ✨📝 Planned Diffusion 📝 ✨ makes a plan before parallel dLLM generation. Planned Diffusion runs 1.2-1.8× faster than autoregressive and an order of magnitude faster than diffusion, while staying within 0.9–5% AR quality.
7
46
316
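A sketch of the plan-then-parallel-fill idea: first draft a short plan, then fill the planned spans independently. Here `draft_plan` and `fill_span` are hypothetical stand-ins for the autoregressive planner and the masked-dLLM inpainting call; none of this is taken from the Planned Diffusion implementation.

```python
# Toy plan-then-parallel-fill pipeline with hypothetical helper functions.
from concurrent.futures import ThreadPoolExecutor

def draft_plan(prompt: str) -> list[str]:
    # In the real system a short sequential pass would produce this plan.
    return ["restate the problem", "set up the equation", "solve and verify"]

def fill_span(prompt: str, plan_item: str) -> str:
    # Stand-in for a masked-diffusion LM filling one independent span.
    return f"[{plan_item} for: {prompt}]"

def planned_generation(prompt: str) -> str:
    plan = draft_plan(prompt)                    # short sequential stage
    with ThreadPoolExecutor() as pool:           # spans generated in parallel
        spans = list(pool.map(lambda item: fill_span(prompt, item), plan))
    return "\n".join(spans)

print(planned_generation("A train leaves at 3pm..."))
```

The speedup claim rests on the fill calls being independent once the plan fixes the overall structure, so they can be decoded in parallel rather than token by token.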
@ChenyuW64562111
Chenyu (Monica) Wang
1 month
Introducing SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models We propose a new policy gradient algorithm, SPG, for diffusion large language models. SPG improves the accuracy over the previous state-of-the-art RL methods by 3.6% in GSM8K, 2.6% in MATH500, 18.4%
3
30
130
@hbXNov
Hritik Bansal
1 month
New paper 📢 Most powerful vision-language (VL) reasoning datasets remain proprietary 🔒, hindering efforts to study their principles and develop similarly effective datasets in the open 🔓. Thus, we introduce HoneyBee, a 2.5M-example dataset created through careful data
5
39
201
@iScienceLuvr
Tanishq Mathew Abraham, Ph.D.
1 month
SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models "we propose the Sandwiched Policy Gradient (SPG) that leverages both an upper and a lower bound of the true log-likelihood." "SPG improves the accuracy over state-of-the-art RL methods for dLLMs by 3.6% in
7
22
148
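Reading the two SPG posts above together, the gist appears to be optimizing a lower bound of the completion log-likelihood on positively rewarded samples and an upper bound on negatively rewarded ones. The sketch below encodes that selection rule; the bound values are dummy inputs, and the rule itself is an inference from the posts rather than the paper's exact objective.

```python
import torch

def sandwiched_surrogate(logp_lower, logp_upper, advantages):
    """Toy sandwiched policy-gradient surrogate.

    logp_lower: per-sample lower bound on log p(completion), e.g. an ELBO
    logp_upper: per-sample upper bound on log p(completion)
    advantages: group-relative advantages from verifiable rewards
    Positive-advantage samples are pushed up through the lower bound;
    negative-advantage samples are pushed down through the upper bound.
    """
    bound = torch.where(advantages > 0, logp_lower, logp_upper)
    return -(advantages * bound).mean()

# Dummy numbers just to show the call shape.
logp_lower = torch.tensor([-12.0, -15.0, -9.0, -20.0], requires_grad=True)
logp_upper = torch.tensor([-10.5, -13.5, -7.5, -18.5], requires_grad=True)
adv = torch.tensor([1.2, -0.4, 0.8, -1.6])
loss = sandwiched_surrogate(logp_lower, logp_upper, adv)
loss.backward()
print(loss.item())
```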
@li78658171
Shufan (Jack) Li
2 months
(1/n) We are excited to announce LaViDa-O, a state-of-the-art unified diffusion LM for image understanding, generation, and editing. Building on our NeurIPS Spotlight submission LaViDa, LaViDa-O offers up to a 6.8x speedup compared with AR models with high output quality.
2
14
49
@zdhnarsil
Dinghuai Zhang 张鼎怀
2 months
Finally, we scale GFlowNets to 32B param LLMs on reasoning tasks. Kudos to @zhu_xuekai 👏 My two cents: (a) On-policy is important for RL performance (this relates to our previous FlashRL effort) (b) Length normalization in logp calculation is critical for numerical stability
@_akhaliq
AK
2 months
FlowRL Matching Reward Distributions for LLM Reasoning
2
15
122
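Point (b) is easy to see numerically: a small per-token shift between two policies compounds with sequence length, so an unnormalized sequence log-prob difference overflows, while the per-token (length-normalized) version stays bounded. The numbers below are dummies, not from FlowRL or FlashRL.

```python
import torch

# Dummy per-token log-probs for one long rollout under the new and old policy.
# A small systematic per-token shift (0.05 nats) stands in for a policy update.
L = 2048
logp_new = torch.full((L,), -2.00)
logp_old = torch.full((L,), -2.05)

# Unnormalized sequence-level ratio: the 0.05-nat gap compounds over 2048
# tokens into exp(~102), which overflows float32 to inf.
ratio_sum = torch.exp(logp_new.sum() - logp_old.sum())

# Length-normalized ratio: the exponent is a per-token average, so it stays O(1).
ratio_mean = torch.exp(logp_new.mean() - logp_old.mean())

print(ratio_sum.item(), ratio_mean.item())   # inf vs ~1.05
```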
@danielmisrael
Daniel Israel
2 months
🔦Adaptive Parallel Decoding (APD) has been accepted as a spotlight paper at @NeurIPSConf ! I thank my collaborators, reviewers, and program organizers for this honor. A thread for those interested 🧵 (1/n)
11
23
171
@adityagrover_
Aditya Grover
2 months
.@siyan_zhao's latest work highlights a fundamental inefficiency in GRPO when the group contains all-wrong rewards. With diffusion LLMs, this can be largely mitigated by using inpainted reasoning traces.
@siyan_zhao
Siyan Zhao
2 months
Thanks AK for sharing our work! Unlike autoregressive LLMs, diffusion LLMs can be conditioned on future reasoning hints during generation through inpainting 🧩, enabling guided exploration toward correct solutions. We show that applying inpainting-guided exploration in RL
1
3
32
@siyan_zhao
Siyan Zhao
2 months
Thanks AK for sharing our work! Unlike autoregressive LLMs, diffusion LLMs can be conditioned on future reasoning hints during generation through inpainting 🧩, enabling guided exploration toward correct solutions. We show that applying inpainting-guided exploration in RL
@_akhaliq
AK
2 months
Inpainting-Guided Policy Optimization for Diffusion Large Language Models
4
28
190
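A minimal sketch of the inpainting mechanic described above: hint tokens are clamped at chosen positions before sampling, so every denoising step conditions on them. The greedy-unmasking decoder and the toy model are assumptions for illustration, not IGPO's actual sampler.

```python
import torch
import torch.nn.functional as F

VOCAB, MASK_ID, SEQ_LEN = 100, 0, 16
model = torch.nn.Sequential(torch.nn.Embedding(VOCAB, 64), torch.nn.Linear(64, VOCAB))

@torch.no_grad()
def inpaint_generate(hints: dict[int, int], steps: int = 8) -> torch.Tensor:
    """Sample a sequence while clamping `hints` (position -> token id),
    e.g. a few tokens of a known-correct reasoning trace."""
    x = torch.full((1, SEQ_LEN), MASK_ID)
    for pos, tok in hints.items():
        x[0, pos] = tok                              # hint tokens are never masked
    for step in range(steps):
        still_masked = x == MASK_ID
        if not still_masked.any():
            break
        probs = F.softmax(model(x), dim=-1)
        conf, pred = probs.max(dim=-1)
        conf = conf.masked_fill(~still_masked, -1.0)
        k = max(1, int(still_masked.sum().item() / (steps - step)))
        top = conf.topk(k, dim=-1).indices[0]
        x[0, top] = pred[0, top]                     # unmask most confident positions
    return x

# Clamp a couple of "future reasoning hint" tokens mid-sequence before sampling.
print(inpaint_generate({6: 42, 7: 17}))
```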
@mihirp98
Mihir Prabhudesai
4 months
🚨 The era of infinite internet data is ending, So we ask: 👉 What’s the right generative modelling objective when data—not compute—is the bottleneck? TL;DR: ▶️Compute-constrained? Train Autoregressive models ▶️Data-constrained? Train Diffusion models Get ready for 🤿 1/n
127
197
1K
@tungnd_13
Tung Nguyen
4 months
🚀 Introducing PhysiX: One of the first large-scale foundation models for physics simulations! PhysiX is a 4.5B parameter model that unifies a wide range of physical systems, from fluid dynamics to reaction-diffusion, outperforming specialized, state-of-the-art models.
21
252
2K
@sansa19739319
Sansa Gong
5 months
🤖Can diffusion models write code competitively? Excited to share our latest 7B coding diffusion LLM!!💻 With DiffuCoder, we explore how they decode, why temperature🔥 matters, and how to improve them via coupled-GRPO that speaks diffusion!!📈 Code: https://t.co/sWsb8a49HL 🧵
5
112
584
@li78658171
Shufan (Jack) Li
5 months
(1/6) Our work Reflect-DiT was accepted to #ICCV2025! Reflect-DiT allows the model to reflect on its past generations and textual feedback to self-correct and improve, extending reasoning to text-to-image generation.
1
23
95
@adityagrover_
Aditya Grover
5 months
Since our launch earlier this year, we are thrilled to witness the growing community around dLLMs. The Mercury tech report from @InceptionAILabs is now on @arxiv with more extensive evaluations: https://t.co/DnDxFvoX0E New model updates dropping later this week!
arxiv.org
We present Mercury, a new generation of commercial-scale large language models (LLMs) based on diffusion. These models are parameterized via the Transformer architecture and trained to predict...
3
40
254
@Xinyu2ML
Xinyu Yang
5 months
🚀 Super excited to share Multiverse! 🏃 It’s been a long journey exploring the space between model design and hardware efficiency. What excites me most is realizing that, beyond optimizing existing models, we can discover better model architectures by embracing system-level
@InfiniAILab
Infini-AI-Lab
5 months
🔥 We introduce Multiverse, a new generative modeling framework for adaptive and lossless parallel generation. 🚀 Multiverse is the first open-source non-AR model to achieve AIME24 and AIME25 scores of 54% and 46% 🌐 Website: https://t.co/J9osByhWUf 🧵 1/n
3
20
72
@hbXNov
Hritik Bansal
6 months
🧑‍🍳Very excited to present LaViDa, one of the first diffusion language models for multimodal understanding! 🌟Unlike autoregressive LMs, you can control the speed-quality tradeoff, and solve constrained generation problems out of the box 📦 🌟 We also release LaViDa-Reason, a
@li78658171
Shufan (Jack) Li
6 months
📢(1/11) Diffusion LMs are fast and controllable at inference time! But why restrict such benefits to processing text data? We are excited to announce LaViDa, one of the first and fastest large diffusion LMs for vision-language understanding!!
0
30
102
@li78658171
Shufan (Jack) Li
6 months
📢(1/11) Diffusion LMs are fast and controllable at inference time! But why restrict such benefits to processing text data? We are excited to announce LaViDa, one of the first and fastest large diffusion LMs for vision-language understanding!!
3
40
168
@SonglinYang4
Songlin Yang
7 months
starting now
2
2
65
@ma_chang_nlp
chang ma
7 months
We are kicking off a series of seminars at @hkunlp2020. @siyan_zhao will be giving a talk titled "d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning" ⏰ on Friday, May 9, 11am HKT (Thursday, May 8, 8pm PDT). Link to talk: https://t.co/i9FsWYRNbZ
0
14
38