Subham Sahoo
@ssahoo_
Followers
2K
Following
883
Media
30
Statuses
432
Pioneering Diffusion LLMs. @cornell PhD. Previously: @GoogleAI; @IITKgp.
New York, USA
Joined June 2010
🚨 “The Diffusion Duality” is out! @ICML2025 ⚡️ Few-step generation in discrete diffusion language models by exploiting the underlying Gaussian diffusion. 🦾Beats AR on 3/7 zero-shot likelihood benchmarks. 📄 Paper: https://t.co/0RKsd8NJfB 💻 Code: https://t.co/oYE9hDYrGI 🧠
16
102
542
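A minimal sketch of the duality the paper exploits, as I read the abstract: Gaussian-diffusing a one-hot token vector and taking an argmax induces a uniform-state discrete diffusion over the vocabulary. Everything below (`vocab_size`, `noise_std`, the simulation itself) is illustrative, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_to_discrete(token_id, vocab_size, noise_std, n_samples=10_000):
    """Diffuse a one-hot token with Gaussian noise, then argmax back to a token.

    As noise_std grows, the argmax token drifts from the original token toward
    a uniform distribution over the vocabulary -- the discrete (uniform-state)
    diffusion hiding inside the Gaussian one.
    """
    x0 = np.zeros(vocab_size)
    x0[token_id] = 1.0
    # Gaussian forward process: x_t = x_0 + sigma * eps
    eps = rng.normal(size=(n_samples, vocab_size))
    xt = x0 + noise_std * eps
    return np.argmax(xt, axis=-1)

for sigma in [0.1, 0.5, 2.0]:
    samples = gaussian_to_discrete(token_id=3, vocab_size=8, noise_std=sigma)
    p_keep = (samples == 3).mean()
    print(f"sigma={sigma:4.1f}  P(token unchanged) ~= {p_keep:.2f}")
```

At small sigma the argmax almost always recovers the original token; at large sigma it approaches uniform over the 8-token vocabulary, which is exactly the corruption process a uniform-state discrete diffusion defines directly.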
Please fill out your availability for the reading group
As we get started with our discrete diffusion reading group, we’d like to schedule a recurring one-hour meeting time that works for everyone. Form: https://t.co/B4PiXvKbkj > Please fill out your availability in the Google form, and be sure to select your local timezone when
1
0
13
The term AGI gives me the same ick that “AI” did back in 2015. If it takes hundreds of billions of tokens just to get a respectable score on grade school math (GSM8K), that says everything about where we actually are.
0
1
15
We’re building a space that connects researchers, students, and practitioners working on discrete diffusion. Join the Discord — collaborate, learn, and share! Whether you’re 💼hiring or showcasing your work, this is the place 👇 Discord:
discord.com
The Discrete Diffusion Reading Group is growing — 400+ members strong! We’ve launched a Discord for discussions, research ideas, help, and job opportunities. Join the conversation 👇 💬 https://t.co/qw6h26OGU5 📧 https://t.co/kV9efqB43W
0
8
107
Overwhelmed by the number of Diffusion LLM papers? 🌊 Same here 😭 So I’m starting a Discrete Diffusion Reading Group (@diffusion_llms) with my favorite disciples @jdeschena and @zhihanyang_ ✨ We’ll cover everything—from theory to empirics, from language to molecules. Join
20
40
316
Drowning in the sea of Discrete Diffusion papers? 🌊 We got you. Join our Reading Group! From theory → empirics, and language → molecules — we’ll decode the chaos together 💫 Join the cult—uh, I mean community 😇 👉 Google Group: https://t.co/kV9efqBBTu (1 / 2)
1
7
23
🔥 Rethinking Reasoning (with Diffusion LLMs) This work changes how you think about reasoning in LLMs. 🤯 Turns out: you don’t need the full chain-of-thought — only a small subset of CoT tokens actually matter for the final answer. ❌ Autoregressive LLMs can’t exploit this
10
36
230
✨Masked Diffusion Language Models✨ are great for reasoning, but not just for the reasons you think! Fast parallel decoding? 🤔 Any-order decoding? 🤨 Plot twist: MDLMs offer A LOT MORE for inference and post-training! 🎢🧵
4
35
162
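To make “fast parallel decoding / any-order decoding” concrete, here is a hedged sketch of a generic confidence-based parallel sampler for a masked diffusion LM (in the MaskGIT style), not MDLM’s exact sampler; `model` and `MASK_ID` are hypothetical stand-ins.

```python
import torch

MASK_ID = 0          # hypothetical mask-token id
SEQ_LEN = 16
NUM_STEPS = 4        # decode 16 tokens in 4 parallel rounds instead of 16 AR steps

@torch.no_grad()
def parallel_masked_decode(model, vocab_size):
    """Generic confidence-based parallel decoder for a masked diffusion LM.

    Each round, the model scores every masked position at once; we commit the
    most confident predictions and keep the rest masked -- any order, several
    tokens per forward pass.
    """
    x = torch.full((1, SEQ_LEN), MASK_ID)
    per_step = SEQ_LEN // NUM_STEPS
    for _ in range(NUM_STEPS):
        logits = model(x)                          # (1, SEQ_LEN, vocab_size)
        probs = logits.softmax(-1)
        conf, pred = probs.max(-1)                 # best token + its confidence
        conf = conf.masked_fill(x != MASK_ID, -1)  # never revisit committed tokens
        commit = conf.topk(per_step, dim=-1).indices
        x.scatter_(1, commit, pred.gather(1, commit))
    return x
```

The point of the sketch: the decode order is chosen by the model’s confidence at run time rather than fixed left-to-right, which is the property the post-training and inference tricks in the thread build on.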
Impressive work by @jdeschena! They propose replacing the encoder-only denoising transformer with an encoder-decoder architecture, which leads to faster training and inference for MDLM.
📢 « Partition Generative Modeling (PGM): Masked Modeling without Masks » is out! 🚯 Masked diffusion models waste FLOPs processing countless mask tokens that carry no real information. ⚡We show how partitioning can replace masking, boosting throughput by >5.3x on text and up
1
4
52
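A back-of-the-envelope view of where PGM’s savings could come from, with made-up numbers: attention cost scales roughly with the square of the positions processed, and a masked model pays for every mask token even though masks carry no information, while a partitioned encoder-decoder only pays for tokens that exist. This counts position pairs only and is not PGM’s actual architecture.

```python
# Back-of-the-envelope position count (attention cost ~ quadratic in positions).
L = 1024            # sequence length
observed = 256      # tokens actually carrying information this step

masked_cost = L * L                              # encoder sees all L, masks included
# Hypothetical partition split: encoder reads the observed tokens,
# decoder cross-attends to them while generating the rest.
partition_cost = observed**2 + (L - observed) * observed

print(f"masked   : {masked_cost:>9,} position pairs")
print(f"partition: {partition_cost:>9,} position pairs")
print(f"speedup  : {masked_cost / partition_cost:.1f}x")
```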
Funnily enough, after we released MDLM last year, @srush_nlp came up with the exact same idea!
(1/5) Beyond Next-Token Prediction, introducing Next Semantic Scale Prediction! Our @NeurIPSConf 2025 paper HDLM is out! Check out the new language modeling paradigm: Next Semantic Scale Prediction via Hierarchical Diffusion Language Models. It largely generalizes
1
0
18
✨ Masked Generative Models (MGMs) are powerful and can generate tokens in parallel. They’ve driven impressive results across text and images and are increasingly competitive with autoregressive (AR) models. Thrilled to share our latest work to accelerate MGMs (1/12) 🧵
2
12
34
We’re dropping “The Diffusion Duality, Chapter 2” soon! So, stay tuned 🤗
In diffusion LMs, discrete methods have all but displaced continuous ones (🥲). Interesting new trend: why not both? Use continuous methods to make discrete diffusion better. Diffusion duality: https://t.co/KPO56vDygp CADD: https://t.co/CNOIWcUIMo CCDD:
0
7
81
🎓 Officially a doctor now 😊!!! As a first-gen college kid, this moment means the world to me. Grateful beyond words to all my mentors who’ve guided me along the way — from @GMartius who first introduced me to research back in 2017, to @volokuleshov who sparked my love for
83
57
2K
Happening tomorrow at 2:30 pm ET / 11:30 am PT
📢 Excited to defend my PhD thesis: "Foundations of Diffusion Language Models" 🎓✨ 📅 October 3 | 11:30 am PT / 2:30 pm ET 🔗Zoom: https://t.co/PgHvs4s5UT Topics covered: 1⃣ MDLM 2⃣The Diffusion Duality 3⃣Esoteric Language Models
2
0
21
🍷Imagine you are the boss of Google DeepMind. To train the best diffusion language model in the world within 1 year, using 800 TPU pods, which model size would you go for? 🐿️ We built Quokka to help you decide: the first-ever large-scale scaling law for DLMs. Interesting facts: 1.
6
58
287
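For anyone new to scaling laws: fitting one amounts to regressing loss against model size on a power law and extrapolating. A generic sketch on synthetic data, not Quokka’s methodology or numbers.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(N, a, b, c):
    """Classic scaling-law form: loss = a * N^(-b) + irreducible term c."""
    return a * N**(-b) + c

# Synthetic (model_size, loss) points standing in for real training runs.
N = np.array([1e7, 1e8, 1e9, 1e10])
loss = power_law(N, a=8.0, b=0.08, c=1.7) + np.random.default_rng(0).normal(0, 0.01, 4)

(a, b, c), _ = curve_fit(power_law, N, loss, p0=[5.0, 0.1, 1.0])
print(f"fit: loss ~= {a:.2f} * N^-{b:.3f} + {c:.2f}")
print(f"predicted loss at N=1e11: {power_law(1e11, a, b, c):.3f}")
```

Once fitted, the curve answers exactly the question the tweet poses: given a fixed compute budget, which model size lands lowest on the predicted-loss frontier.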
Eternally grateful to my committee members: @jwthickstun (Chair), @Jimantha, and Bart Selman
0
0
1
Esoteric Language Models: the first paper to propose KV caching for masked diffusion models without compromising parallel generation https://t.co/UKh6GsazIb
🚨 [New paper alert] Esoteric Language Models (Eso-LMs) First Diffusion LM to support KV caching w/o compromising parallel generation. 🔥 Sets new SOTA on the sampling speed–quality Pareto frontier 🔥 🚀 65× faster than MDLM ⚡ 4× faster than Block Diffusion 📜 Paper:
1
0
0
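Why KV caching is awkward for masked diffusion, and the rough shape of a fix as I understand the abstract: in a vanilla masked diffusion LM every position can change at every step, so cached keys and values go stale; but if already-decoded tokens are frozen and never attend to still-masked positions, their keys and values can be computed once and reused. A toy illustration with hypothetical names (`attn_layer.kv`, `attn_layer.q`), not Eso-LMs’ actual attention scheme.

```python
import torch

def cached_step(kv_cache, clean_tokens, masked_positions, attn_layer):
    """One denoising step with a KV cache over already-decoded tokens.

    clean_tokens never change again, so their keys/values are computed once
    and appended to kv_cache; only the masked positions pay fresh compute.
    (Toy sketch -- attn_layer and the ordering constraint are hypothetical.)
    """
    if clean_tokens.numel() > 0:
        k_new, v_new = attn_layer.kv(clean_tokens)   # compute once per token
        kv_cache.append((k_new, v_new))
    q = attn_layer.q(masked_positions)               # queries only for masks
    k = torch.cat([k for k, _ in kv_cache], dim=1)
    v = torch.cat([v for _, v in kv_cache], dim=1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5, -1)
    return attn @ v                                  # denoised mask representations
```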