Nate Gillman
@GillmanLab
805 Followers · 295 Following · 26 Media · 131 Statuses
ML researcher, interning @Google, PhD-ing @BrownUniversity. I train deep generative models
Joined August 2021
Ever wish you could turn your video generator into a controllable physics simulator? We're thrilled to introduce Force Prompting! Animate any image with physical forces and get fine-grained control, without needing any physics simulator or 3D assets at inference. 🧵(1/n)
Excited to share our #NeurIPS2025 paper: PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation. We propose a novel framework to improve the controllability and physics plausibility of video models. Project Page: https://t.co/MENLbVHsjo (1/n)
A fun theorem (critical for why much of machine learning works!): higher-dimensional surfaces have relatively more saddle points than local minima, so "roll the ball downhill" gradient descent works better with *bigger* models. Surprising if you haven't thought about this! 1/3
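A quick way to convince yourself numerically: model the Hessian at a random critical point as a random symmetric (GOE) matrix and count how often every eigenvalue is positive, i.e. how often the critical point is a minimum. This is a standard toy model, not anything from the thread; a minimal sketch:

```python
# Toy demo (standard GOE Hessian model, not from the thread): in high
# dimension, almost no random critical point has an all-positive spectrum.
import numpy as np

rng = np.random.default_rng(0)

def fraction_minima(dim, trials=2000):
    count = 0
    for _ in range(trials):
        a = rng.normal(size=(dim, dim))
        hessian = (a + a.T) / 2                      # random symmetric "Hessian"
        if np.all(np.linalg.eigvalsh(hessian) > 0):  # minimum iff all eigs > 0
            count += 1
    return count / trials

for dim in [1, 2, 4, 8]:
    print(dim, fraction_minima(dim))
# The fraction collapses toward zero as dim grows: in high dimension nearly
# every critical point is a saddle, so gradient descent rarely gets trapped.
```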
We present MotionStream — real-time, long-duration video generation that you can interactively control just by dragging your mouse. All videos here are raw, real-time screen captures without any post-processing. Model runs on a single H100 at 29 FPS and 0.4s latency.
Adaptable Intelligence. Multiple possible paths to an objective.
Tired of chasing references across dozens of papers? This monograph distills it all: the principles, intuition, and math behind diffusion models. Thrilled to share!
Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! 📘 We’re excited to release 《The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
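For context, the core recipe such a monograph builds on fits in two lines; this is standard DDPM notation, not an excerpt from the book:

```latex
% Forward (noising) process and the denoising training objective,
% in standard DDPM notation (not quoted from the monograph):
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar\alpha_t}\,x_0,\ (1-\bar\alpha_t)I\right),
\qquad
\mathcal{L} = \mathbb{E}_{x_0,\,\epsilon,\,t}
\left[\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\ t\right)\right\rVert^2\right].
```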
🚀 Thrilled to introduce Seed3D 1.0, a foundation model that generates High-Fidelity, Simulation-Ready 3D Assets directly from a Single Image! ✨ Key Capabilities: 1️⃣ High-fidelity Assets: Generates assets with accurate geometry, well-aligned textures, and physically-based
Rollouts in the real world are slow and expensive. What if we could roll out trajectories entirely inside a world model (WM)? Introducing 🚀Ctrl-World🚀, a generative manipulation WM that can interact with advanced VLA policy in imagination. 🧵1/6
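Read literally, the imagined rollout alternates the VLA policy and the world model in a closed loop. A hedged sketch, where `Policy.act`, `WorldModel.predict`, and all other names are hypothetical stand-ins rather than Ctrl-World's API:

```python
# Minimal sketch of policy-in-imagination rollout; the interfaces here are
# hypothetical stand-ins, not Ctrl-World's actual code.
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    observations: list = field(default_factory=list)
    actions: list = field(default_factory=list)

def imagined_rollout(world_model, policy, first_obs, horizon=50):
    """Roll out a trajectory entirely inside the world model: the VLA policy
    picks actions from imagined observations, never touching a real robot."""
    traj = Trajectory(observations=[first_obs])
    obs = first_obs
    for _ in range(horizon):
        action = policy.act(obs)                # VLA policy chooses an action
        obs = world_model.predict(obs, action)  # world model imagines the outcome
        traj.actions.append(action)
        traj.observations.append(obs)
    return traj
```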
🤯 Think better visuals mean better world models? Think again. 💥 Surprise: Agents don’t need eye candy—they need wins. Meet World-in-World, the first open benchmark that ranks world models by closed-loop task success, not pixels. We uncover 3 shocks: 1️⃣ Visuals ≠ utility 2️⃣
We are excited to present our work “How Can Objects Help Video-Language Understanding” at ICCV 2025 in Hawaii! We boost MLLMs’ spatiotemporal understanding with object-centric computer vision models. Come and visit our poster to chat about multimodal understanding! 🕚 Time:
today we're open-sourcing Krea Realtime. this 14B autoregressive model is 10x larger than any open-source equivalent, and it can generate long-form videos at 11 fps on a single B200. weights and technical report below 👇
VERY nice paper from @SinaAlmd et al!! It's been well established that moving weights in the direction synthetic data wants them to go is bad for the model... so it's only natural that moving them in the opposite direction consistently helps. Excellent!!
Synthetic data promised to shatter data scarcity barriers, but self-generated samples trigger catastrophic model collapse. We discovered the key is thinking in reverse: degradation from self-training isn't random noise—it's a powerful signal provably anti-aligned with the
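Taking the two tweets at face value, the mechanism would amount to stepping against the gradient that synthetic data induces. A speculative sketch of that reading, not the paper's published algorithm:

```python
# Hedged sketch of the "think in reverse" idea as the tweets describe it:
# compute the step synthetic data asks for, then move the OPPOSITE way.
# This is my reading of the thread, not the paper's actual method.
import torch

def reverse_synthetic_step(model, inputs, targets, loss_fn, lr=1e-4):
    loss = loss_fn(model(inputs), targets)          # loss on self-generated data
    grads = torch.autograd.grad(loss, model.parameters())
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p += lr * g  # ascend: step AWAY from where synthetic data pulls
```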
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
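One plausible reading of the RAE recipe as the tweet describes it: a frozen pretrained representation encoder replaces the VAE, only a pixel decoder is trained, and diffusion runs in representation space. Module names and shapes below are placeholders, not the paper's code:

```python
import torch
import torch.nn as nn

class RepresentationAutoencoder(nn.Module):
    """Sketch of the RAE idea: reuse a frozen pretrained encoder (e.g. a ViT)
    instead of training a VAE, and learn only a decoder back to pixels.
    Placeholder modules, not the paper's architecture."""
    def __init__(self, pretrained_encoder, rep_dim=768, image_dim=3 * 256 * 256):
        super().__init__()
        self.encoder = pretrained_encoder.eval()   # frozen representation model
        for p in self.encoder.parameters():
            p.requires_grad = False
        self.decoder = nn.Sequential(              # trained to invert the encoder
            nn.Linear(rep_dim, 4096), nn.GELU(), nn.Linear(4096, image_dim),
        )

    def forward(self, images):
        with torch.no_grad():
            z = self.encoder(images)  # diffusion would be trained in this space
        return self.decoder(z)
```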
Excited to share Equilibrium Matching (EqM)! EqM simplifies and outperforms flow matching, enabling strong generative performance of FID 1.96 on ImageNet 256x256. EqM learns a single static EBM landscape for generation, enabling a simple gradient-based generation procedure.
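The "simple gradient-based generation procedure" presumably amounts to descending the learned static energy starting from noise. A minimal sketch, with illustrative hyperparameters rather than EqM's:

```python
import torch

def sample_from_energy(energy_fn, shape, steps=200, step_size=0.01):
    """Sketch of gradient-based generation on a static learned energy
    landscape: start from noise, follow -grad E(x) downhill. Step count and
    step size here are illustrative, not EqM's actual settings."""
    x = torch.randn(shape)
    for _ in range(steps):
        x = x.detach().requires_grad_(True)
        (grad,) = torch.autograd.grad(energy_fn(x).sum(), x)
        x = x - step_size * grad   # descend the static energy landscape
    return x.detach()
```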
Joint embeddings (JEPAs) and density estimation/generative models seem to be like oil and water. Yet we prove that a good JEPA is also a good density estimator! JEPAs achieve this without input-space reconstruction: you can get p(x) from any pretrained model! https://t.co/VX94vKHlYK
We introduce a new "rule" for understanding diffusion models: Selective Underfitting. It explains: 🚨 How diffusion models generalize beyond training data 🚨 Why popular training recipes (e.g., DiT, REPA) are effective and scale well Co-led with @kiwhansong0! (1/n)
📢 Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation Got only one or a few images and wondering if recovering the 3D environment is a reconstruction or generation problem? Why not do it with a generative reconstruction model! We show that a
How do we generate videos on the scale of minutes, without drifting or forgetting about the historical context? We introduce Mixture of Contexts. Every minute-long video below is the direct output of our model in a single pass, with no post-processing, stitching, or editing. 1/4
Some random thoughts I've been having about video world model/long video generation since working on Mixture of Contexts (whose title could also be "Learnable Sparse Attention for Long Video Generation"): 🚨Semi-long Post Alert🚨 1. Learnable sparse attention is still underrated
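For readers new to the idea, learnable sparse attention in the block-selection flavor looks roughly like this toy sketch: score context blocks, keep the top-k per query, attend only within them. Generic illustration, not Mixture of Contexts' actual routing:

```python
import torch
import torch.nn.functional as F

def topk_block_attention(q, k, v, block_size=64, k_blocks=4):
    """Toy sparse attention via block selection: summarize each context block
    by its mean key, keep the top-k blocks per query, attend within those.
    Generic sketch, not Mixture of Contexts' exact mechanism."""
    T, d = k.shape
    n_blocks = T // block_size
    block_keys = k[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    block_summary = block_keys.mean(dim=1)                      # (n_blocks, d)
    scores = q @ block_summary.T                                # (Q, n_blocks)
    top = scores.topk(min(k_blocks, n_blocks), dim=-1).indices  # (Q, k_blocks)
    out = torch.zeros(q.shape[0], d)
    for i in range(q.shape[0]):                                 # per-query gather
        idx = (top[i][:, None] * block_size + torch.arange(block_size)).flatten()
        att = F.softmax(q[i] @ k[idx].T / d**0.5, dim=-1)
        out[i] = att @ v[idx]
    return out
```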
Accepted by #NeurIPS2025 as a spotlight!
Real-time video generation is finally real — without sacrificing quality. Introducing Self-Forcing, a new paradigm for training autoregressive diffusion models. The key to high quality? Simulate the inference process during training by unrolling transformers with KV caching.
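The stated trick, simulating inference during training by unrolling the model on its own outputs with a KV cache, in skeleton form; every interface here (`init_cache`, `denoise_next`) is a hypothetical stand-in, not Self-Forcing's actual code:

```python
import torch

def self_forcing_step(model, prompt_frames, n_unroll, loss_fn, real_frames):
    """Skeleton of 'simulate inference during training': autoregressively
    unroll the model on its own generations, reusing a KV cache exactly as at
    inference time, then apply a loss to the whole rollout."""
    kv_cache = model.init_cache()
    frame = prompt_frames
    generated = []
    for _ in range(n_unroll):
        frame, kv_cache = model.denoise_next(frame, kv_cache)  # gradients flow
        generated.append(frame)
    loss = loss_fn(torch.stack(generated), real_frames)  # e.g. a distribution-
    loss.backward()                                      # matching objective
    return loss
```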