Théophane Vallaeys
@webalorn
Followers 105 · Following 196 · Media 14 · Statuses 56
PhD student @MetaAI (FAIR Paris) and Sorbonne University | Graduated from ENS | Into generative image modeling
Paris, France
Joined May 2023
🎆 Can we achieve high compression rates for images in autoencoders without compromising quality or decoding speed? ⚡️ We introduce SSDD (Single-Step Diffusion Decoder), achieving improvements on both fronts and setting a new state of the art in image reconstruction. 👇 1/N
5 replies · 34 reposts · 169 likes
Today we’re excited to unveil a new generation of Segment Anything Models: 1️⃣ SAM 3 enables detecting, segmenting, and tracking objects across images and videos, now with short text phrases and exemplar prompts. 🔗 Learn more about SAM 3: https://t.co/tIwymSSD89 2️⃣ SAM 3D…
107 replies · 589 reposts · 4K likes
🚨New Paper @AIatMeta 🚨 You want to train a massively multilingual model, but languages keep interfering and you can’t boost performance? Using a dense model is suboptimal when mixing many languages, so what can you do? You can use our new architecture, Mixture of Languages (sketched below)! 🧵1/n
3 replies · 11 reposts · 22 likes
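The thread doesn't spell out the architecture, so here is only a hedged sketch of what language-conditioned expert routing can look like; the class name, the per-language FFN experts, and the shared expert are all our assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class LanguageRoutedFFN(nn.Module):
    """Toy mixture layer: tokens are routed to their language's FFN expert."""

    def __init__(self, d_model: int, d_ff: int, n_languages: int):
        super().__init__()
        # One FFN expert per language (assumption: routing by language ID).
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_languages)
        )
        # A shared expert so languages can still transfer to each other (assumption).
        self.shared = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

    def forward(self, x: torch.Tensor, lang_id: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model); lang_id: (batch,) integer language indices.
        out = torch.empty_like(x)
        for lid in lang_id.unique().tolist():
            mask = lang_id == lid
            out[mask] = self.experts[lid](x[mask])  # language-specific path
        return out + self.shared(x)                 # blended with the shared path
```

Hard routing by language ID keeps one language's gradients out of another language's expert, which is one plausible way to reduce the interference the tweet describes.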
Text-to-Image models don't need 3 training stages anymore! 🤯 Our new MIRO method integrates human alignment directly into pretraining. 19x faster convergence ⚡ 370x less compute than FLUX-dev 📉 Train once, align to many rewards. The era of multi-stage training is over!
1 reply · 14 reposts · 31 likes
This work also showcases the use of diffusion decoders to reconstruct images from semantic embeddings, as these decoders are able to "fill in" the details. This is an application of our SSDD model:
↳ Quoting the SSDD announcement thread above (1/N).
0 replies · 0 reposts · 1 like
Our new work, led by Xiangyi Chen, shows how we can extract downsampled embeddings from semantic encoders. They outperform other types of encoders for both generation and understanding (see the pooling sketch below)!
Why add REPA when you can be explicit and use the VLM representation to generate? 🤔 We found the semantic encoder already has the right priors. Train it to sample in its native latent space + lightweight pixel decoder = unified vision model. But naively using the semantic…
1 reply · 2 reposts · 6 likes
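As a reading aid only: a minimal sketch of one way to spatially downsample a semantic encoder's patch tokens. The average-pooling choice and the shapes are our assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def downsample_patch_tokens(feats: torch.Tensor, grid: int, factor: int) -> torch.Tensor:
    # feats: (batch, grid*grid, dim) patch tokens from a semantic (ViT-style) encoder.
    b, n, d = feats.shape
    assert n == grid * grid, "expects a square patch grid"
    x = feats.transpose(1, 2).reshape(b, d, grid, grid)  # back to a 2D feature map
    x = F.avg_pool2d(x, kernel_size=factor)              # (b, d, grid/f, grid/f)
    return x.flatten(2).transpose(1, 2)                  # fewer, coarser tokens
```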
Transfusion combines autoregressive modeling with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 is the first non-autoregressive model to generate text and images concurrently using a single transformer, unifying Edit Flow (text) with Flow…
7 replies · 83 reposts · 412 likes
⚙️ Last but not least, we make our training code available, enabling its use in downstream applications. Arxiv: https://t.co/98VFhYn57r Code: https://t.co/HbgGaefBKe ✔️ N/N
github.com
Official implementation for SSDD: Single-Step Diffusion Decoder for Efficient Image Tokenization. - facebookresearch/SSDD
0 replies · 1 repost · 9 likes
We highlight that diffusion decoders exhibit diversity in their reconstructions, concentrated on meaningful details of the image. Single-step distillation doesn’t hinder this diversity, keeping model behavior intact. 👇 12/N
1 reply · 0 reposts · 6 likes
High spatial downsampling matters: reducing the number of tokens improves latent generation speed. ↕️ Diffusion decoders generalize better across spatial downsampling factors, keeping similar reconstruction quality when the total compression rate is constant (worked numbers below). 👇 11/N
1 reply · 0 reposts · 7 likes
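To make the speed argument concrete (illustrative numbers, not the paper's): the token count falls quadratically with the downsampling factor f, and self-attention cost falls roughly with the square of the token count.

```python
# For a 256x256 image, each doubling of f quarters the token count.
for f in (8, 16, 32):                  # spatial downsampling factor
    tokens = (256 // f) ** 2           # 1024, 256, 64 tokens
    print(f"f={f}: {tokens} tokens, relative attention cost ~{tokens ** 2:,}")
```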
🤔 We analyze the sampling dynamics: more steps ≠ higher quality. 🔎 This behavior comes from the perceptual loss, which increases diversity until the model overshoots the distribution. 👉 We treat sampling as behavior selection, which we distill into a single-step model (see the sketch below). 👇 10/N
1 reply · 0 reposts · 6 likes
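A hedged sketch of what such single-step distillation can look like: the student matches, in one pass, what the multi-step teacher produces from the same noise and latent. The function names and the plain regression objective are our assumptions, not the paper's recipe.

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher_sample, z, noise, optimizer):
    # teacher_sample: runs the teacher's full multi-step sampling loop.
    with torch.no_grad():
        target = teacher_sample(noise, z)   # multi-step teacher decode
    pred = student(noise, z)                # single-step student decode
    loss = F.mse_loss(pred, target)         # match the teacher's selected behavior
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```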
To enable seamless switching between decoders, we show that we can train a single shared encoder per compression rate using a simple data augmentation. 👉 This creates a single shared latent space, where a single latent diffusion model can be paired with any decoder (see the sketch below). 👇 9/N
1 reply · 0 reposts · 6 likes
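What the shared latent space buys in practice, sketched with hypothetical model handles: sample once, then pick the decoder at deploy time.

```python
def decode_with_tradeoff(latent_dm, decoder_fast, decoder_best, n=4):
    # One sampler, one shared latent space; the decoder is a deploy-time choice.
    latents = latent_dm.sample(n)
    return decoder_fast(latents), decoder_best(latents)
```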
📊 We scale our model family from S (⚡️faster than KL-VAE) to H (achieving higher gains in high-compression settings), enabling downstream applications to choose their trade-off between decoding speed, quality, and compression rate. 👇 8/N
1 reply · 0 reposts · 5 likes
👉 Applying our decoder to image generation, we show improvements in image quality across all settings. 🔥 Additionally, the modeling capabilities of SSDD can be used to reach higher compression rates, preserving image quality while greatly reducing generation latency. 👇 7/N
1 reply · 0 reposts · 6 likes
❌ SSDD’s training method is GAN-free: we show that, unlike for existing deterministic or diffusion decoders, an adversarial loss does not bring any perceptible quality improvement. 👉 This enables easier scaling and even more stable training. 👇 6/N
1 reply · 0 reposts · 6 likes
🔥 We show that using a Flow-Matching loss (for latent modeling), LPIPS (for perceptual alignment), and REPA (for internal feature alignment), SSDD reconstructs high-quality images (a sketch of the combined objective follows). 👇 5/N
1 reply · 0 reposts · 7 likes
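A hedged sketch of how the three signals named above could combine into one objective, under common conventions: a rectified-flow interpolant for the Flow-Matching term, with `lpips_fn` and `repa_fn` as stand-ins for the perceptual and feature-alignment losses. The weights and the exact formulation are our assumptions.

```python
import torch
import torch.nn.functional as F

def decoder_loss(model, x, z, lpips_fn, repa_fn, w_lpips=1.0, w_repa=0.5):
    # x: target images (b, c, h, w); z: encoder latents conditioning the decoder.
    noise = torch.randn_like(x)
    t = torch.rand(x.size(0), 1, 1, 1, device=x.device)
    x_t = (1 - t) * noise + t * x            # rectified-flow interpolant
    v_pred, feats = model(x_t, t, z)         # predicted velocity + inner features
    loss_fm = F.mse_loss(v_pred, x - noise)  # flow-matching velocity target
    x_hat = x_t + (1 - t) * v_pred           # one-step estimate of the clean image
    loss_lpips = lpips_fn(x_hat, x)          # LPIPS: perceptual alignment
    loss_repa = repa_fn(feats, x)            # REPA: align features to a pretrained encoder
    return loss_fm + w_lpips * loss_lpips + w_repa * loss_repa
```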
🤔 Existing diffusion decoders are based on U-Nets, which lack modeling capacity. 👉 To bridge the gap between pixel-space and latent modeling, we adapt the U-ViT (from Simpler Diffusion), achieving superior reconstruction performance at a similar parameter count (a minimal U-ViT illustration follows). 👇 4/N
2 replies · 0 reposts · 9 likes
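For readers unfamiliar with U-ViT, its defining trait is long skip connections from shallow to deep transformer blocks; below is a generic, minimal illustration of that pattern, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TinyUViT(nn.Module):
    """Generic U-ViT-style backbone: transformer blocks with long skips."""

    def __init__(self, dim: int = 256, depth: int = 6, heads: int = 4):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            for _ in range(depth)
        )
        # One fusion layer per long skip: deep input = fuse(deep input, shallow output).
        self.skips = nn.ModuleList(nn.Linear(dim * 2, dim) for _ in range(depth // 2))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        saved = []
        half = len(self.blocks) // 2
        for blk in self.blocks[:half]:           # "down" half: save activations
            tokens = blk(tokens)
            saved.append(tokens)
        for blk, fuse in zip(self.blocks[half:], self.skips):
            tokens = fuse(torch.cat([tokens, saved.pop()], dim=-1))
            tokens = blk(tokens)                 # "up" half: fuse the long skips
        return tokens
```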
Main takeaway: 👉 We train an iterative diffusion decoder for state-of-the-art image reconstruction. ✨ We can then distill it into a ⚡️fast single-step decoder, preserving reconstruction quality and diversity. 👇 3/N
1 reply · 0 reposts · 7 likes
🔎 High compression rates are used to train fast latent diffusion models. 🤔 But the reconstruction decoder becomes the bottleneck, suppressing image details. 👉 We propose to alleviate this by explicitly modeling the distribution of the missing information (sketched below). 👇 2/N
1 reply · 0 reposts · 9 likes
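In code terms, the proposal makes decoding a sampling problem: drawing from p(image | latent) instead of computing a single deterministic output. A minimal Euler sampler over a conditional velocity field (the `velocity` network and the conventions are placeholders, not the paper's code):

```python
import torch

def sample_reconstruction(velocity, z, img_shape, steps=8):
    x = torch.randn(img_shape)                        # start from pure noise
    for i in range(steps):
        t = torch.full((img_shape[0], 1, 1, 1), i / steps)
        x = x + velocity(x, t, z) / steps             # Euler step toward the image
    return x                                          # one plausible reconstruction
```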