
Sander Dieleman @ ICML 2025
@sedielem
61K Followers · 12K Following · 99 Media · 2K Statuses
Research Scientist at Google DeepMind (WaveNet, Imagen, Veo). I tweet about deep learning (research + software), music, generative models (personal account).
London, England
Joined December 2014
RT @_albertgu: Tokenization is just a special case of "chunking" - building low-level data into high-level abstractions - which is in turn…
Excellent blog post by @_albertgu about Transformers, SSMs and the role of tokenisation. Well worth a read.
I converted one of my favorite talks I've given over the past year into a blog post: "On the Tradeoffs of SSMs and Transformers" (or: tokens are bullshit). In a few days, we'll release what I believe is the next major advance for architectures.
RT @hermanhwdong: 🔥Happy to announce that the AI for Music Workshop is coming to #NeurIPS2025! We have an amazing lineup of speakers! We c…
Roll call: #ICML2025 diffusion circle 📢 Who's coming? Please tag people that might be interested! Date/time TBD, probably Thursday afternoon. (Beware though👇 joining a diffusion circle is at your own risk!🫣)
This looks like a great deep dive on neural network architectures for diffusion models. tl;dr: use a Transformer, but there's quite a bit more to it, and as always in this field, the devil is in the details!
Had the honor to present diffusion transformers at CS25, Stanford. The place is truly magical. Slides: Recording: Thanks to @stevenyfeng for making it happen!
RT @chrisdonahuey: Excited to announce 🎵Magenta RealTime, the first open weights music generation model capable of real-time audio generati…
This work uncovers a profound connection between continuous and discrete (non-absorbing) diffusion models, allowing transfer of advanced techniques such as consistency distillation to the discrete setting! Also: amazing title, no notes! 🧑🍳😙🤌
🚨 “The Diffusion Duality” is out! @ICML2025 ⚡️ Few-step generation in discrete diffusion language models by exploiting the underlying Gaussian diffusion. 🦾 Beats AR on 3/7 zero-shot likelihood benchmarks. 📄 Paper: 💻 Code: 🧠
RT @DrMachakil: Yesterday, I played with Google Veo 3, and honestly… the possibilities blew my mind. Here’s a short mockumentary I made wit…
If you've read my latest blog post on generative modelling in latent space, this one is a great follow-up about putting things into practice.
In this blog post we summarize some of our findings from training autoencoders for diffusion! We also share some null results we had with a slightly unconventional approach we tried. 1/2
Here's something you don't see every day: the authors tried it at a larger scale and updated the preprint to confirm that unfortunately, the answer is no: it doesn't hold up at scale. The intellectual honesty is refreshing!
sorry for the late update. I bring disappointing news. softpick does NOT scale to larger models. overall training loss and benchmark results are worse than softmax on our 1.8B parameter models. we have updated the preprint on arxiv:
I just had to prove my UK immigration status at the airport for the first time (heading home from SF), using the digital-only system. The people at the check-in desk and I were taken aback by how unintuitive and cumbersome this process is. Physical proof is sorely needed!
Thank you @PeteWishart for launching the Early Day Motion urging Government to launch an independent review of the digital-only immigration status & eVisas system & consider much needed alternatives! We appreciate you listening to the voices of EU citizens & all other migrants
RT @TacoCohen: Nobody wants to hear it, but working on data is more impactful than working on methods or architectures.