webalorn Profile Banner
Théophane Vallaeys Profile
Théophane Vallaeys

@webalorn

Followers
105
Following
196
Media
14
Statuses
56

PhD student @MetaAI (FAIR Paris) and Sorbonne University | Graduated from ENS | Into generative image modeling

Paris, France
Joined May 2023
@webalorn
Théophane Vallaeys
2 months
🎆 Can we achieve high compression rates for images in autoencoders without compromising quality and decoding speed? ⚡️ We introduce SSDD (Single-Step Diffusion Decoder), achieving improvements on both fronts and setting a new state-of-the-art on image reconstruction. 👇 1/N
5
34
169
@AIatMeta
AI at Meta
3 days
Today we’re excited to unveil a new generation of Segment Anything Models: 1️⃣ SAM 3 enables detecting, segmenting and tracking of objects across images and videos, now with short text phrases and exemplar prompts. 🔗 Learn more about SAM 3: https://t.co/tIwymSSD89 2️⃣ SAM 3D
107
589
4K
@JoaoMJaneiro
João Maria Janeiro
17 days
🚨New Paper @AIatMeta 🚨 You want to train a largely multilingual model, but languages keep interfering and you can’t boost performance? Using a dense model is suboptimal when mixing many languages, so what can you do? You can use our new architecture Mixture of Languages! 🧵1/n
3
11
22
@nico_dufour
Nicolas DUFOUR
22 days
Text-to-Image models don't need 3 training stages anymore! 🤯 Our new MIRO method integrates human alignment directly into pretraining. 19x faster convergence ⚡ 370x less compute than FLUX-dev 📉 Train once, align to many rewards. The era of multi-stage training is over!
1
14
31
@webalorn
Théophane Vallaeys
1 month
This work also showcases the use of diffusion decoders to reconstruct from semantic embeddings, as these decoders are able to "fill out" the details. This is an application of our SSDD model:
@webalorn
Théophane Vallaeys
2 months
🎆 Can we achieve high compression rates for images in autoencoders without compromising quality and decoding speed? ⚡️ We introduce SSDD (Single-Step Diffusion Decoder), achieving improvements on both fronts and setting a new state-of-the-art on image reconstruction. 👇 1/N
0
0
1
@webalorn
Théophane Vallaeys
1 month
This new work we published, led by Xiangyi Chen, shows how we can extract downsampled embeddings from semantic encoders. They outperform other types of encoders for both generation and understanding!
@__JohnNguyen__
John Nguyen
1 month
Why add REPA when you can be explicit and use the VLM representation to generate? 🤔 We found the semantic encoder already has the right priors. Train it to sample in its native latent space + lightweight pixel decoder = unified vision model. But naively using the semantic
1
2
6
@__JohnNguyen__
John Nguyen
2 months
Transfusion combines autoregressive with diffusion to train a single transformer, but what if we combine Flow with Flow? 🤔 🌊OneFlow🌊 the first non-autoregressive model to generate text and images concurrently using a single transformer—unifying Edit Flow (text) with Flow
7
83
412
@webalorn
Théophane Vallaeys
2 months
⚙️ Last but not least, we make our training code available, enabling use in any downstream applications. Arxiv: https://t.co/98VFhYn57r Code: https://t.co/HbgGaefBKe ✔️ N/N
github.com
Official implementation for SSDD Single-Step Diffusion Decoder for Efficient Image Tokenization. - facebookresearch/SSDD
0
1
9
@webalorn
Théophane Vallaeys
2 months
This was the first main project of my PhD at @AIatMeta. Thanks a lot to my supervisors Jakob Verbeek and Matthieu Cord! 👇 13/N
1
0
7
@webalorn
Théophane Vallaeys
2 months
We highlight that diffusion decoders exhibit diversity in reconstructions, focused around meaningful details of the image. Single-step distillation doesn’t hinder diversity, keeping model behavior intact. 👇 12/N
1
0
6
@webalorn
Théophane Vallaeys
2 months
High spatial downsampling matters: reducing the number of tokens improves latent generation speed. ↕️ Diffusion decoders generalize better across spatial downsampling factors, keeping similar reconstruction quality when the total compression rate is held constant. 👇 11/N
1
0
7
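The token-count/compression trade-off in the tweet above can be made concrete with a little arithmetic. This is an illustrative sketch only, not the paper's code: the function names and the 8-bit-per-channel latent budget are assumptions made for the example.

```python
# Toy arithmetic: how the spatial downsampling factor f trades token
# count against per-token channels at a fixed total compression rate.

def num_tokens(h, w, f):
    """Latent tokens for an h x w image with spatial downsampling factor f."""
    return (h // f) * (w // f)

def channels_for_rate(f, rate, bits_per_pixel=24):
    """Per-token channel budget that keeps the total compression rate fixed,
    assuming 8-bit latent channels (rate = input bits / latent bits)."""
    return (f * f * bits_per_pixel) // (rate * 8)

h = w = 256
for f in (8, 16, 32):
    print(f"f={f}: {num_tokens(h, w, f)} tokens, "
          f"{channels_for_rate(f, rate=48)} channels/token")
```

Doubling f quarters the token count (and hence the latent model's sequence length), while the channel budget grows to keep total compression constant; the tweet's claim is that diffusion decoders stay close in quality across these operating points.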
@webalorn
Théophane Vallaeys
2 months
🤔 We analyze the sampling dynamics: more steps ≠ higher quality. 🔎 This behavior comes from the perceptual loss, which increases diversity until the model overshoots the distribution. 👉 We view sampling as behavior selection, which we distill into a single-step model. 👇 10/N
1
0
6
@webalorn
Théophane Vallaeys
2 months
To enable switching between decoders seamlessly, we show that we can train a single shared encoder per compression rate using a simple data augmentation. 👉 This creates a single shared latent space, where a single latent diffusion model can be paired with any decoder. 👇 9/N
1
0
6
@webalorn
Théophane Vallaeys
2 months
📊 We scale our model family from S (⚡️faster than KL-VAE) to H (achieving higher gains on high-compression settings), enabling downstream applications to choose the trade-off between decoding speed, quality, and compression rate. 👇 8/N
1
0
5
@webalorn
Théophane Vallaeys
2 months
👉 Applying our decoder to image generation, we show improvements in image quality across all settings. 🔥 Additionally, the modeling capabilities of SSDD can be used to reach higher compression rates, preserving image quality while greatly reducing generation latency. 👇 7/N
1
0
6
@webalorn
Théophane Vallaeys
2 months
❌ SSDD's training method is GAN-free: we show that, unlike existing deterministic or diffusion decoders, an adversarial loss does not bring any perceptible quality improvement. 👉 This enables easier scaling and even more stable training. 👇 6/N
1
0
6
@webalorn
Théophane Vallaeys
2 months
🔥 We show that using a Flow-Matching loss (for latent modeling), LPIPS (for perceptual alignment), and REPA (for internal feature alignment), SSDD reconstructs high-quality images. 👇 5/N
1
0
7
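The three-term objective named in the tweet above can be sketched as a weighted sum. This is not the SSDD implementation: the LPIPS and REPA terms are passed in as precomputed placeholder values, and the weights are hypothetical.

```python
# Illustrative sketch of a combined training objective:
# flow matching + perceptual (LPIPS) + feature alignment (REPA).

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def flow_matching_loss(pred_velocity, x0, x1):
    # Target velocity of the linear path x_t = (1 - t) * x0 + t * x1
    # is constant: x1 - x0.
    target = [b - a for a, b in zip(x0, x1)]
    return mse(pred_velocity, target)

def total_loss(pred_v, x0, x1, lpips_term, repa_term,
               w_lpips=1.0, w_repa=0.5):
    # Weights are placeholders; the paper tunes its own coefficients.
    return (flow_matching_loss(pred_v, x0, x1)
            + w_lpips * lpips_term
            + w_repa * repa_term)
```

The key design point from the thread is that all three terms are optimized jointly in one stage, rather than bolting a GAN loss on afterwards.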
@webalorn
Théophane Vallaeys
2 months
🤔 Existing diffusion decoders are based on U-Nets, which lack modeling capacity. 👉 To bridge the gap between pixel-space and latent modeling, we adapt the U-ViT (from Simpler Diffusion), achieving superior reconstruction performance at a similar parameter count. 👇 4/N
2
0
9
@webalorn
Théophane Vallaeys
2 months
Main takeaway: 👉 We train an iterative diffusion decoder that reaches state-of-the-art image reconstruction. ✨ We then distill it into a ⚡️fast single-step decoder, preserving reconstruction quality and diversity. 👇 3/N
1
0
7
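The distill-to-one-step idea above can be illustrated with a toy ODE sampler. This is a conceptual sketch under stated assumptions, not the paper's recipe: the teacher here has a constant velocity field, chosen precisely so that a single Euler step reproduces the multi-step trajectory exactly.

```python
# Single-step distillation, conceptually: train a student so that ONE
# step from noise matches the teacher's multi-step sampling endpoint.

def euler_sample(velocity_fn, z, steps):
    """Integrate dz/dt = v(z, t) from t=0 to t=1 with Euler steps."""
    dt = 1.0 / steps
    t = 0.0
    for _ in range(steps):
        z = z + dt * velocity_fn(z, t)
        t += dt
    return z

# Toy teacher with a constant velocity field of 2.0.
teacher = lambda z, t: 2.0

target = euler_sample(teacher, z=0.0, steps=32)  # teacher, 32 steps
student = euler_sample(teacher, z=0.0, steps=1)  # one-step "student"
distill_error = abs(student - target)
print(target, distill_error)
```

With a real, nonlinear velocity field the one-step student must be trained (e.g. by regressing onto teacher endpoints) rather than matching for free; the thread's claim is that this training preserves both quality and reconstruction diversity.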
@webalorn
Théophane Vallaeys
2 months
🔎 High compression rates are used to train fast latent diffusion models. 🤔 But the reconstruction decoder becomes the bottleneck, as it suppresses image details. 👉 We propose to alleviate this by explicitly modeling the distribution of the missing information. 👇 2/N
1
0
9
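Why a deterministic decoder "suppresses details" while a stochastic one does not can be shown with a one-number toy. This is an illustrative assumption-laden example, not from the paper: the discarded detail is reduced to a value that is equally likely to be -1 or +1.

```python
import random

# A compressed latent has lost one detail that could plausibly be
# either of two sharp values.
plausible_details = [-1.0, 1.0]

# A deterministic decoder trained with MSE is optimal when it predicts
# the conditional mean -> a "blurry" average of the possibilities.
deterministic_output = sum(plausible_details) / len(plausible_details)

# A stochastic (diffusion-style) decoder instead samples one sharp
# completion from the modeled distribution of missing information.
random.seed(0)
stochastic_output = random.choice(plausible_details)

print(deterministic_output, stochastic_output)
```

The mean is 0.0, which matches neither plausible detail; sampling always lands on a sharp value. This is the sense in which modeling the distribution of missing information avoids the detail-suppression bottleneck.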