Sander Dieleman
@sedielem
Followers: 64K
Following: 12K
Media: 106
Statuses: 2K
Research Scientist at Google DeepMind (WaveNet, Imagen, Veo). I tweet about deep learning (research + software), music, generative models (personal account).
London, England
Joined December 2014
New blog post: let's talk about latents! https://t.co/Ddh7tXH642
sander.ai
Latent representations for generative models.
31
201
1K
The rehabilitation of continuous diffusion for discrete data continues! Check out CANDI by @PatrickPyn35903, @thjashin and @ruqi_zhang. Their insightful analysis explains why continuous methods have fallen behind, and why self-conditioning is so important. https://t.co/Bqn8Zd7hRz
In diffusion LMs, discrete methods have all but displaced continuous ones (🥲). Interesting new trend: why not both? Use continuous methods to make discrete diffusion better. Diffusion duality: https://t.co/KPO56vDygp CADD: https://t.co/CNOIWcUIMo CCDD:
1
11
60
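For readers unfamiliar with self-conditioning, here is a minimal sketch of the general technique as it appears in the continuous-diffusion literature: half the time, the denoiser first makes a prediction of the clean data and then receives that (detached) estimate back as an extra input. The toy denoiser, shapes, and corruption process below are placeholders, not the CANDI implementation.

```python
# Minimal sketch of self-conditioning in a continuous diffusion training step.
# Hypothetical shapes, corruption process and denoiser; not the CANDI implementation.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Input: noisy sample, previous x0 estimate (self-conditioning), and noise level.
        self.net = nn.Sequential(nn.Linear(2 * dim + 1, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x_t, x0_prev, t):
        return self.net(torch.cat([x_t, x0_prev, t], dim=-1))

def training_step(model, x0, opt):
    b, d = x0.shape
    t = torch.rand(b, 1)                      # random noise level in [0, 1]
    noise = torch.randn_like(x0)
    x_t = (1 - t) * x0 + t * noise            # simple linear interpolation corruption

    # Self-conditioning: half the time, first predict x0 with a zero placeholder,
    # then feed that (gradient-free) estimate back in as an extra input.
    x0_est = torch.zeros_like(x0)
    if torch.rand(()) < 0.5:
        with torch.no_grad():
            x0_est = model(x_t, torch.zeros_like(x0), t)

    pred = model(x_t, x0_est, t)
    loss = ((pred - x0) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

model = Denoiser(dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = training_step(model, torch.randn(8, 16), opt)
```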
Some prefer math and rigour; personally, I like intuitive explanations. This monograph has plenty of both! I love how much time is spent linking different perspectives (variational, score-based, flow-based) together. Chapter 6 in particular is really great. Amazing effort! 👏
Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
4
20
276
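As a pointer to the kind of bridge between perspectives being praised here: one standard identity linking the score-based (SDE) and flow-based views is the probability flow ODE from the score-SDE literature (Song et al., 2021). It is reproduced below from that literature as a reference point, not quoted from the monograph.

```latex
% Forward corruption process (score-based SDE view):
%   dx = f(x, t)\,dt + g(t)\,dw
% Its probability flow ODE, a deterministic flow with the same marginals p_t(x):
\frac{dx}{dt} = f(x, t) - \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x)
% Replacing the unknown score \nabla_x \log p_t(x) with a learned approximation
% s_\theta(x, t) turns sampling into solving an ODE, i.e. a flow-based sampler.
```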
Veo is getting a major upgrade. 🚀 We’re rolling out Veo 3.1, our updated video generation model, alongside improved creative controls for filmmakers, storytellers, and developers - many of them with audio. 🧵
123
427
2K
We asked the same question: how can we combine the strengths of continuous and discrete approaches? Similar to CDCD, in our work, Purrception, we extend Variational FM to model VQ latents through continuous-discrete transport for image generation :D 👉 https://t.co/KIog9mLNWb
In diffusion LMs, discrete methods have all but displaced continuous ones (🥲). Interesting new trend: why not both? Use continuous methods to make discrete diffusion better. Diffusion duality: https://t.co/KPO56vDygp CADD: https://t.co/CNOIWcUIMo CCDD:
1
12
70
In my blog post on latents for generative modelling, I pointed out that representation learning and reconstruction are two separate tasks (§6.3), which autoencoders try to solve simultaneously. Separating them makes sense. It opens up a lot of possibilities, as this work shows!
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
9
23
350
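A minimal sketch of the separation being discussed: representation learning handled by a frozen, pretrained encoder, and reconstruction handled by a decoder trained on its own. The modules and shapes below are placeholders to illustrate the split, not the actual RAE recipe.

```python
# Separating representation learning from reconstruction: a frozen, pretrained
# encoder provides features; only the decoder is trained to map them back to pixels.
# Placeholder modules and shapes, not the RAE paper's setup.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256))  # stand-in for a pretrained model
encoder.requires_grad_(False)                                        # representation learning is already done
encoder.eval()

decoder = nn.Sequential(nn.Linear(256, 3 * 32 * 32), nn.Unflatten(1, (3, 32, 32)))
opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)

x = torch.rand(8, 3, 32, 32)           # dummy image batch
with torch.no_grad():
    z = encoder(x)                     # fixed latents: no gradient flows into the encoder
x_hat = decoder(z)
loss = ((x_hat - x) ** 2).mean()       # reconstruction is the decoder's job alone
opt.zero_grad(); loss.backward(); opt.step()
```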
In diffusion LMs, discrete methods have all but displaced continuous ones (🥲). Interesting new trend: why not both? Use continuous methods to make discrete diffusion better. Diffusion duality: https://t.co/KPO56vDygp CADD: https://t.co/CNOIWcUIMo CCDD:
arxiv.org
Diffusion language models, especially masked discrete diffusion models, have achieved great success recently. While there are some theoretical and primary empirical results showing the advantages...
New survey on diffusion language models: https://t.co/SHicf69gxV (via @NicolasPerezNi1). Covers pre/post-training, inference and multimodality, with very nice illustrations. I can't help but feel a bit wistful about the apparent extinction of the continuous approach after 2023🥲
9
74
426
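For context on what "masked discrete diffusion" refers to: a toy sketch of the absorbing-state corruption and masked-token prediction loss these models typically use. The tiny model, sizes, and loss weighting below are simplifications for illustration, not any specific paper's setup.

```python
# Toy sketch of masked (absorbing-state) discrete diffusion training: replace tokens
# with [MASK] at a random rate t, then predict the originals at masked positions.
# Placeholder model and sizes; real models use transformers and reweight the loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, mask_id, seq_len = 100, 100, 16     # token ids 0..99, id 100 reserved for [MASK]
model = nn.Sequential(nn.Embedding(vocab + 1, 64), nn.Flatten(),
                      nn.Linear(64 * seq_len, vocab * seq_len))

tokens = torch.randint(0, vocab, (8, seq_len))
t = torch.rand(8, 1)                                  # per-example masking rate
masked = torch.rand(8, seq_len) < t                   # which positions get absorbed
x_t = torch.where(masked, torch.full_like(tokens, mask_id), tokens)

logits = model(x_t).view(8, seq_len, vocab)
loss_per_tok = F.cross_entropy(logits.transpose(1, 2), tokens, reduction="none")
loss = (loss_per_tok * masked).sum() / masked.sum().clamp(min=1)   # only masked positions count
loss.backward()
```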
🔥Veo 3 has emergent zero-shot learning and reasoning capabilities! This multitalented model can do a huge range of interesting tasks. It understands physical properties, can manipulate objects, and can even reason. Check out more examples in this thread!
Veo is a more general reasoner than you might think. Check out this super cool paper on "Video models are zero-shot learners and reasoners" from my colleagues at @GoogleDeepMind.
4
23
166
5 billion nano 🍌 = 5 regular sized 🍌! Also TIL: A group of bananas is called a hand.
🍌 @GeminiApp just passed 5 billion images in less than a month. What a ride, still going! Latest trend: retro selfies of you holding a baby version of you. Can't make this stuff up!
0
3
10
The effective context length of Transformers with local (sliding window) attention layers is usually much shorter than the theoretical maximum. This blog post explains why. Back in 2017 the visualisations in https://t.co/JPLa3pyaON really changed my perspective on this for CNNs!
arxiv.org
We study characteristics of receptive fields of units in deep convolutional networks. The receptive field size is a crucial issue in many visual tasks, as the output must respond to large enough...
It's a common belief that L SWA layers (size W) yield an L×W receptive field. My post shows why the effective range is limited to O(W), regardless of depth. The reasons are information dilution and the exponential barrier from residual connections:
2
34
225
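A back-of-the-envelope illustration of the dilution argument, under the simplifying assumption that each attention hop spreads a distant token's contribution over roughly W positions while nearby tokens also arrive undiluted via the residual path. The numbers are hypothetical and this is not the blog post's actual analysis.

```python
# Rough sketch (hypothetical numbers): a token d positions away needs at least
# ceil(d / W) attention hops to reach the current position, and if each hop spreads
# weight over ~W tokens, its direct contribution shrinks roughly like (1/W)**hops.
import math

W = 1024                       # sliding-window size
for d in [512, 1024, 4096, 16384, 65536]:
    hops = math.ceil(d / W)    # minimum number of attention layers the signal must traverse
    print(f"distance {d:>6}: >= {hops:>2} hops, dilution ~ {W ** -hops:.1e}")
```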
Really great deep dive on sources of nondeterminism in LLM inference. Before reading, I also believed atomicAdd was to blame for all of it, but it seems like that's mostly a red herring nowadays!
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
0
1
35
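To make the suspicion concrete: floating-point addition is not associative, so changing reduction order (which atomic adds can do) changes the result in the last bits. A minimal, self-contained demonstration of that mechanism, which per the post is mostly a red herring for LLM inference nondeterminism:

```python
# Summing the same values in a different order usually gives a (slightly) different
# float result. This is the mechanism behind the atomicAdd suspicion.
import random

vals = [random.uniform(-1, 1) * 10 ** random.randint(-8, 8) for _ in range(100_000)]

s_forward = sum(vals)
s_shuffled = sum(random.sample(vals, len(vals)))   # same numbers, different order

print(s_forward, s_shuffled, s_forward == s_shuffled)   # usually not exactly equal
```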
We’re thrilled to welcome Sander Dieleman, Research Scientist at Google DeepMind, to ML in PL Conference 2025! Sander Dieleman is a Research Scientist at Google DeepMind in London, UK, where he has worked on the development of AlphaGo, WaveNet, Imagen 4, Veo 3, and more. He
1
4
32
Our team at GDM is hiring! Consider applying if you’re excited to work on state-of-the-art media generation! https://t.co/zzPJfae49Q
job-boards.greenhouse.io
2
11
138
Does a smaller latent space lead to worse generation in latent diffusion models? Not necessarily! We show that LDMs are extremely robust to a wide range of compression rates (10-1000x) in the context of physics emulation. We got lost in latent space. Join us 👇
14
88
461
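For a sense of scale, a rough sketch of what a 10-1000x compression rate means, assuming it is measured as input elements divided by latent elements; the shapes below are hypothetical examples, not the paper's.

```python
# Compression rate as the ratio of input elements to latent elements (hypothetical shapes).
from math import prod

def compression_rate(input_shape, latent_shape):
    return prod(input_shape) / prod(latent_shape)

x_shape = (3, 256, 256)                          # e.g. a 3-channel 256x256 field
for z_shape in [(4, 64, 64), (8, 16, 16), (4, 8, 8)]:
    print(z_shape, f"{compression_rate(x_shape, z_shape):.0f}x")
# -> 12x, 96x and 768x, i.e. within the 10-1000x range mentioned above
```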
You know the team has built a good model when ... people go to a website that randomly serves it some of the time 5 million times
🚨🍌Breaking News: Gemini-2.5-Flash-Image-Preview (“nano-banana”) by @GoogleDeepMind now ranks #1 in Image Edit Arena. In just two weeks: 🟡“nano-banana” has driven over 5 million community votes in the Arena 🟡Record-breaking 2.5M+ votes cast for this model alone 🟡It has
9
14
204
🚨🍌Breaking News: Gemini-2.5-Flash-Image-Preview (“nano-banana”) by @GoogleDeepMind now ranks #1 in Image Edit Arena. In just two weeks: 🟡“nano-banana” has driven over 5 million community votes in the Arena 🟡Record-breaking 2.5M+ votes cast for this model alone 🟡It has
Image generation with Gemini just got a bananas upgrade and is the new state-of-the-art image generation and editing model. 🤯 From photorealistic masterpieces to mind-bending fantasy worlds, you can now natively produce, edit and refine visuals with new levels of reasoning,
36
158
1K
strange object spotted under the microscope over the weekend in the lab...
358
242
4K