Justin Deschenaux
@jdeschena
Followers: 422 · Following: 4K · Media: 35 · Statuses: 367

PhD student @EPFL advised by @caglarml. Working on diffusion language models ⚡️

Switzerland
Joined May 2013

Justin Deschenaux @jdeschena · 10 months
🌟 Excited to share our latest work on making diffusion language models (DLMs) faster than autoregressive (AR) models! ⚡ It’s been great to work on this with @caglarml 😎 Lately, DLMs are gaining traction as a promising alternative to autoregressive sequence modeling 👀 1/14 🧵

Justin Deschenaux @jdeschena · 22 days
RT @iScienceLuvr: Inverse Scaling in Test-Time Compute. "We identify five distinct failure modes when models reason for longer: 1) Claude m….

Justin Deschenaux @jdeschena · 29 days
RT @XiuyingWei966: If you’re interested in long-context efficiency, don’t miss our recent paper RAT—a joint effort with @anunay_yadav, Razv….

Justin Deschenaux @jdeschena · 29 days
RT @caglarml: Many people still talk about coming up with alternatives to self-attention, but acknowledging the strengths of both self-atte….

Justin Deschenaux @jdeschena · 1 month
RT @jdeschena: 🔥 NEW PAPER: "The Diffusion Duality". Uniform-state diffusion models for text generation emerge from an underlying continuo….

Justin Deschenaux @jdeschena · 1 month
RT @ssahoo_: Attending ICML ✈️Tues-Fri to present "The Diffusion Duality".🗓️Wed, July 16 @ 4:30pm.📍East Exhibition Hall A-B (E-3003). DM if….

Justin Deschenaux @jdeschena · 1 month
RT @SkanderMoalla: 🚀 Big time! We can finally do LLM RL fine-tuning with rewards and leverage offline/off-policy data!. ❌ You want rewards,….

Justin Deschenaux @jdeschena · 1 month
RT @XiuyingWei966: Curious about making Transformers faster on long sequences without compromising accuracy? ⚡️🧠 Meet RAT—an intermediate d….

Justin Deschenaux @jdeschena · 1 month
RT @johnowhitaker: I did another video, on the paper 'The Diffusion Duality', continuing the series of me trying to understand diffusion ap….

Justin Deschenaux @jdeschena · 2 months
RT @sedielem: This work uncovers a profound connection between continuous and discrete (non-absorbing) diffusion models, allowing transfer….

Justin Deschenaux @jdeschena · 2 months
RT @_akhaliq: The Diffusion Duality. unlock few-step generation in discrete diffusion language models via the underlying Gaussian diffusion….

Justin Deschenaux @jdeschena · 2 months
RT @SkyLi0n: Check out our recent paper on the "duality" between discrete and Gaussian diffusion. We show how you can exploit that relation….

Justin Deschenaux @jdeschena · 2 months
@DrYangSong 🔗 CODE & MODELS: 📜 Paper · 📘 Blog · 💻 Code. It was amazing to work on this with @ssahoo_, @SkyLi0n, @Guanghan__Wang, @justintchiu, @volokuleshov. Onto the next 🚀 9/9 🧵
github.com · [ICML 2025] The Diffusion Duality (repo: s-sahoo/duo)

Justin Deschenaux @jdeschena · 2 months
@DrYangSong 🎯 SUMMARY:
• Duo bridges continuous and discrete diffusion.
• Enables faster training with curriculum learning.
• Enables faster sampling by adapting consistency models to discrete spaces.
8/9

Justin Deschenaux @jdeschena · 2 months
@DrYangSong Using argmax at the final step cuts this to 8 steps, with only a slight drop in unigram entropy 🤯 7/9.

Justin Deschenaux @jdeschena · 2 months
@DrYangSong Discrete Consistency Distillation slashes sampling steps by orders of magnitude: after distillation, we can sample in just 16 steps and match the original generative perplexity—without lowering unigram entropy. 6/9

Justin Deschenaux @jdeschena · 2 months
Importantly, the Diffusion Duality lets us adapt Consistency Models (@DrYangSong) to discrete diffusion! 🔄✨ Even though there is no PF-ODE for discrete spaces, we can use the Gaussian PF-ODE to generate distillation trajectories, then transfer them to the discrete domain! 5/9
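A rough sketch of that trajectory transfer, assuming a cosine signal/noise schedule and a DDIM-style deterministic step; `teacher`, `alpha`, `sigma`, and the timestep grid are placeholders for illustration, not the paper's implementation:

```python
import math
import torch

def alpha(t: float) -> float:
    return math.cos(t * math.pi / 2)  # placeholder signal schedule

def sigma(t: float) -> float:
    return math.sin(t * math.pi / 2)  # placeholder noise schedule

@torch.no_grad()
def discrete_trajectory(teacher, x_T, timesteps):
    """teacher(x, t) -> predicted clean vector in the Gaussian space.

    Runs a deterministic (DDIM-style) probability-flow trajectory entirely
    in the Gaussian space, then argmaxes each intermediate state to obtain
    a discrete trajectory usable as distillation targets.
    """
    xs, x = [], x_T
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        x0_hat = teacher(x, t)
        eps_hat = (x - alpha(t) * x0_hat) / sigma(t)       # implied noise
        x = alpha(t_next) * x0_hat + sigma(t_next) * eps_hat  # PF-ODE step
        xs.append(x.argmax(dim=-1))  # transfer: project each state to tokens
    return xs

# e.g. discrete_trajectory(teacher, torch.randn(seq_len, vocab),
#                          [1.0, 0.75, 0.5, 0.25, 0.05])
```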

Justin Deschenaux @jdeschena · 2 months
After 1M training steps, Duo sets a new SoTA 🏅 among comparable uniform-state diffusion models, thanks to curriculum learning! 📈 Duo also outperforms AR models on 3 out of 7 zero-shot datasets 🚀 4/9

Justin Deschenaux @jdeschena · 2 months
The curriculum learning strategy halves the number of training steps by reducing variance: early in training, instead of discrete values, we feed the neural network a low-temperature softmax over the Gaussian diffusion vector 🚀 3/9
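A hedged sketch of that curriculum input (the function name, warmup length, and temperature `tau` are my assumptions, not the paper's code):

```python
import torch
import torch.nn.functional as F

def model_input(xt: torch.Tensor, step: int,
                warmup: int = 10_000, tau: float = 0.1) -> torch.Tensor:
    """xt: (seq_len, vocab) Gaussian-diffused one-hot vectors."""
    if step < warmup:
        # Curriculum phase: a low-temperature softmax of the noisy Gaussian
        # vector -- a relaxed version of the argmax sample, which keeps the
        # training signal lower-variance early on.
        return F.softmax(xt / tau, dim=-1)
    # After the curriculum: the usual hard discrete input, re-encoded one-hot.
    return F.one_hot(xt.argmax(dim=-1), xt.shape[-1]).float()
```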

Justin Deschenaux @jdeschena · 2 months
🔍 THE CONNECTION: When you apply argmax to vectors computed through Gaussian diffusion, you get uniform-state discrete diffusion. It is quite amazing that such a simple operation connects two seemingly disconnected paradigms 🤯 2/9
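A minimal sketch of the projection this tweet describes (illustration only, not the authors' code; the variance-preserving schedule, vocabulary size, and signal level `alpha_t` are assumed):

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 8, 4
tokens = torch.randint(vocab_size, (seq_len,))  # clean discrete sequence
x0 = F.one_hot(tokens, vocab_size).float()

alpha_t = 0.3  # signal level at time t (assumed schedule)
xt = alpha_t * x0 + (1 - alpha_t ** 2) ** 0.5 * torch.randn_like(x0)

zt = xt.argmax(dim=-1)  # project the Gaussian state back to tokens
# Small alpha_t -> zt is near-uniform over the vocabulary;
# large alpha_t -> zt recovers `tokens`.
print(tokens.tolist(), zt.tolist())
```

Sweeping `alpha_t` from 1 to 0 interpolates from the clean sequence to uniform random tokens, which is exactly a uniform-state discrete forward process.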