Justin Deschenaux
@jdeschena
Followers: 422 · Following: 4K · Media: 35 · Statuses: 367

PhD student @EPFL advised by @caglarml. Working on diffusion language models ⚡️

Switzerland
Joined May 2013

Justin Deschenaux @jdeschena · 10 months
🌟 Excited to share our latest work on making diffusion language models (DLMs) faster than autoregressive (AR) models! ⚡ It’s been great to work on this with @caglarml 😎 Lately, DLMs are gaining traction as a promising alternative to autoregressive sequence modeling 👀 1/14 🧵

Justin Deschenaux @jdeschena · 22 days
RT @iScienceLuvr: Inverse Scaling in Test-Time Compute. "We identify five distinct failure modes when models reason for longer: 1) Claude m….

Justin Deschenaux @jdeschena · 29 days
RT @XiuyingWei966: If you’re interested in long-context efficiency, don’t miss our recent paper RAT—a joint effort with @anunay_yadav, Razv….

Justin Deschenaux @jdeschena · 29 days
RT @caglarml: Many people still talk about coming up with alternatives to self-attention, but acknowledging the strengths of both self-atte….

Justin Deschenaux @jdeschena · 1 month
RT @jdeschena: 🔥 NEW PAPER: "The Diffusion Duality". Uniform-state diffusion models for text generation emerge from an underlying continuo….

Justin Deschenaux @jdeschena · 1 month
RT @ssahoo_: Attending ICML ✈️Tues-Fri to present "The Diffusion Duality".🗓️Wed, July 16 @ 4:30pm.📍East Exhibition Hall A-B (E-3003). DM if….

Justin Deschenaux @jdeschena · 1 month
RT @SkanderMoalla: 🚀 Big time! We can finally do LLM RL fine-tuning with rewards and leverage offline/off-policy data!. ❌ You want rewards,….

Justin Deschenaux @jdeschena · 1 month
RT @XiuyingWei966: Curious about making Transformers faster on long sequences without compromising accuracy? ⚡️🧠 Meet RAT—an intermediate d….

Justin Deschenaux @jdeschena · 1 month
RT @johnowhitaker: I did another video, on the paper 'The Diffusion Duality', continuing the series of me trying to understand diffusion ap….

Justin Deschenaux @jdeschena · 2 months
RT @sedielem: This work uncovers a profound connection between continuous and discrete (non-absorbing) diffusion models, allowing transfer….

Justin Deschenaux @jdeschena · 2 months
RT @_akhaliq: The Diffusion Duality. unlock few-step generation in discrete diffusion language models via the underlying Gaussian diffusion….

Justin Deschenaux @jdeschena · 2 months
RT @SkyLi0n: Check out our recent paper on the "duality" between discrete and Gaussian diffusion. We show how you can exploit that relation….

Justin Deschenaux @jdeschena · 2 months
@DrYangSong 🔗 CODE & MODELS: 📜 Paper · 📘 Blog · 💻 Code. It was amazing to work on this with @ssahoo_, @SkyLi0n, @Guanghan__Wang, @justintchiu, @volokuleshov. Onto the next 🚀 9/9 🧵
github.com · [ICML 2025] The Diffusion Duality (repo: s-sahoo/duo)

Justin Deschenaux @jdeschena · 2 months
@DrYangSong 🎯 SUMMARY:
• Duo bridges continuous and discrete diffusion.
• Enables faster training with curriculum learning.
• Enables faster sampling by adapting consistency models to discrete spaces.
8/9

Justin Deschenaux @jdeschena · 2 months
@DrYangSong Using argmax at the final step cuts this to 8 steps, with only a slight drop in unigram entropy 🤯 7/9.

Justin Deschenaux @jdeschena · 2 months
@DrYangSong Discrete Consistency Distillation slashes sampling steps by orders of magnitude: after distillation, we can sample in just 16 steps and match the original generative perplexity—without lowering unigram entropy. 6/9

Justin Deschenaux @jdeschena · 2 months
Importantly, the Diffusion Duality lets us adapt Consistency Models (@DrYangSong) to discrete diffusion! 🔄✨ Even though there is no PF-ODE for discrete spaces, we can use the Gaussian PF-ODE to generate distillation trajectories, then transfer them to the discrete domain! 5/9
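A rough sketch of that trajectory transfer, assuming a cosine signal/noise schedule and a DDIM-style deterministic step; `teacher`, `alpha`, `sigma`, and the timestep grid are placeholders for illustration, not the paper's implementation:

```python
import math
import torch

def alpha(t: float) -> float:
    return math.cos(t * math.pi / 2)  # placeholder signal schedule

def sigma(t: float) -> float:
    return math.sin(t * math.pi / 2)  # placeholder noise schedule

@torch.no_grad()
def discrete_trajectory(teacher, x_T, timesteps):
    """teacher(x, t) -> predicted clean vector in the Gaussian space.

    Runs a deterministic (DDIM-style) probability-flow trajectory entirely
    in the Gaussian space, then argmaxes each intermediate state to obtain
    a discrete trajectory usable as distillation targets.
    """
    xs, x = [], x_T
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        x0_hat = teacher(x, t)
        eps_hat = (x - alpha(t) * x0_hat) / sigma(t)       # implied noise
        x = alpha(t_next) * x0_hat + sigma(t_next) * eps_hat  # PF-ODE step
        xs.append(x.argmax(dim=-1))  # transfer: project each state to tokens
    return xs

# e.g. discrete_trajectory(teacher, torch.randn(seq_len, vocab),
#                          [1.0, 0.75, 0.5, 0.25, 0.05])
```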

Justin Deschenaux @jdeschena · 2 months
After 1M training steps, Duo sets a new SoTA 🏅 among comparable uniform-state diffusion models, thanks to curriculum learning! 📈 Duo also outperforms AR models on 3 out of 7 zero-shot datasets 🚀 4/9

Justin Deschenaux @jdeschena · 2 months
The curriculum learning strategy halves the number of training steps by reducing variance: early in training, instead of discrete values, we feed the neural network a low-temperature softmax over the Gaussian diffusion vector 🚀 3/9
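A hedged sketch of that curriculum input (the function name, warmup length, and temperature `tau` are my assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def model_input(xt: torch.Tensor, step: int,
                warmup: int = 10_000, tau: float = 0.1) -> torch.Tensor:
    """xt: (seq_len, vocab) Gaussian-diffused one-hot vectors."""
    if step < warmup:
        # Curriculum phase: a low-temperature softmax of the noisy Gaussian
        # vector -- a relaxed version of the argmax sample, which keeps the
        # training signal lower-variance early on.
        return F.softmax(xt / tau, dim=-1)
    # After the curriculum: the usual hard discrete input, re-encoded one-hot.
    return F.one_hot(xt.argmax(dim=-1), xt.shape[-1]).float()
```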

Justin Deschenaux @jdeschena · 2 months
🔍 THE CONNECTION: When you apply argmax to vectors computed through Gaussian diffusion, you get uniform-state discrete diffusion. It is quite amazing that such a simple operation connects two seemingly disconnected paradigms 🤯 2/9
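A minimal sketch of the projection this tweet describes (illustration only, not the authors' code; the variance-preserving schedule, vocabulary size, and signal level `alpha_t` are assumed):

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 8, 4
tokens = torch.randint(vocab_size, (seq_len,))  # clean discrete sequence
x0 = F.one_hot(tokens, vocab_size).float()

alpha_t = 0.3  # signal level at time t (assumed schedule)
xt = alpha_t * x0 + (1 - alpha_t ** 2) ** 0.5 * torch.randn_like(x0)

zt = xt.argmax(dim=-1)  # project the Gaussian state back to tokens
# Small alpha_t -> zt is near-uniform over the vocabulary;
# large alpha_t -> zt recovers `tokens`.
print(tokens.tolist(), zt.tolist())
```

Sweeping `alpha_t` from 1 to 0 interpolates from the clean sequence to uniform random tokens, which is exactly a uniform-state discrete forward process.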