Yutong (Kelly) He
@electronickale
Followers: 2K · Following: 1K · Media: 23 · Statuses: 173
PhD student @mldcmu, I’m so delusional that doing generative modeling is my job
Pittsburgh, PA
Joined March 2021
Diffusion/Flow-based models can sample in 1-2 steps now 👍 But likelihood? Still requires 100-1000 NFEs (even for these fast models) 😭 We fix this! Introducing F2D2: simultaneous fast sampling AND fast likelihood via joint flow map distillation. https://t.co/FFfqWnLIwu 1/🧵
🤖🤖 Very excited to finally share our new work “Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control”! Everyone in robotics does action chunking, but why does it actually work? 🤔🤔 And what can theory tell us
Quite belated, but we finally uploaded "ARC-AGI Without Pretraining" to arXiv (link in reply). Very impressive project by @LiaoIsaac91893 when he was just a first-year PhD! He drove this entire project from beginning to end while I ate 🍿 at NeurIPS last week, Isaac was
ARC Prize 2025 Winners Interviews · Paper Award 3rd Place: @LiaoIsaac91893 shares the story behind CompressARC, an MDL-based, single-puzzle-trained neural code golf system that achieves ~20–34% on ARC-AGI-1 and ~4% on ARC-AGI-2 without any pretraining or external data.
This was a fun project with @KeelyAi04 (amazing undergrad applying to grad schools) @_albertgu @rsalakhu @zicokolter @nmboffi @max_simchowitz. We hope this unlocks new possibilities for flow-based models! Paper: https://t.co/FFfqWnLIwu Code:
github.com/Keely-Ai/F2D2 (Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models)
And we don't sacrifice sample quality! F2D2 variants maintain competitive FID while adding accurate likelihood evaluation. Even better: fast likelihood unlocks new tricks. With maximum likelihood self-guidance, we enable a 2-step MeanFlow to outperform 1024-step flow matching in FID 🤯
We tested F2D2 on CIFAR-10, ImageNet 64×64, and 2D synthetic data. Without F2D2, previous models fail to obtain valid likelihood estimates (negative BPD 💀) with few steps. With F2D2, we get calibrated likelihoods close to what flow matching gives with 100-1000 steps, using only 1-8 steps.
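For readers outside the area: bits per dimension (BPD) is the standard likelihood metric in these experiments. Its textbook definition (standard, not specific to this paper) is

```latex
\mathrm{BPD}(x) \;=\; -\frac{\log_2 p_\theta(x)}{D} \;=\; -\frac{\log p_\theta(x)}{D \ln 2}
```

where D is the data dimensionality. Well-trained image models sit around 3 BPD on CIFAR-10, so a negative estimate is an unambiguous failure signal.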
Best part? It's plug-and-play with any existing flow map model (Shortcut, MeanFlow, etc). Just add a divergence head to the existing model. That's it. Shared backbone, one head for sampling, one head for likelihood. Train from scratch or finetune from pre-trained models, your call!
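To make "just add a head" concrete, here is a minimal PyTorch-style sketch of the idea. This is my own illustration under assumed names (the class, `feat_dim`, and the linear heads are hypothetical), not the released F2D2 code:

```python
import torch
import torch.nn as nn

class FlowMapWithDivergenceHead(nn.Module):
    """Sketch: a shared flow-map backbone with two output heads.

    - sample_head predicts the endpoint x_t of the jump from time s to t
    - div_head predicts the accumulated divergence (log-density change)
      over the same interval, giving likelihood almost for free
    """
    def __init__(self, backbone: nn.Module, feat_dim: int, data_dim: int):
        super().__init__()
        self.backbone = backbone  # any flow-map net (Shortcut, MeanFlow, ...)
        self.sample_head = nn.Linear(feat_dim, data_dim)
        self.div_head = nn.Linear(feat_dim, 1)  # the added "divergence head"

    def forward(self, x_s: torch.Tensor, s: torch.Tensor, t: torch.Tensor):
        h = self.backbone(x_s, s, t)   # shared features for both tasks
        x_t = self.sample_head(h)      # sampling: jump straight to time t
        delta_logp = self.div_head(h)  # likelihood: divergence integral over [s, t]
        return x_t, delta_logp
```

Under these assumptions, finetuning from a pre-trained flow map would amount to keeping the backbone and sampling head while training the new divergence head.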
So we built F2D2 using flow maps, which skip slow ODE integration by learning to predict endpoints directly. F2D2 extends vanilla flow maps to the coupled system above: one model, jointly (self-)distilled to predict both sampling trajectory and cumulative divergence in parallel.
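In symbols (my notation, not necessarily the paper's): a flow map F_θ jumps from time s to time t in one network evaluation, and the coupled extension asks a second output G_θ to track the log-density side of the same jump:

```latex
F_\theta(x_s, s, t) \approx x_t,
\qquad
G_\theta(x_s, s, t) \approx -\int_s^t \nabla \cdot v(x_u, u)\, du
```

Chaining a few jumps then yields both a sample and its accumulated log-likelihood correction in parallel.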
Turns out the solution was right in front of us: when computing likelihood in CNFs, you already get a coupled system of ODEs. The sampling and log-likelihood trajectories evolve together, driven by the same velocity field. So why not just distill both together?
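Concretely, this is the standard instantaneous change-of-variables result for continuous normalizing flows: the state and its log-density evolve jointly under one velocity field,

```latex
\frac{dx(t)}{dt} = v(x(t), t),
\qquad
\frac{d \log p(x(t))}{dt} = -\nabla \cdot v(x(t), t)
```

Solving the first equation gives samples; integrating the second alongside it gives exact log-likelihoods, which is why the two can be distilled together.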
Why should you care about likelihood?
If you're doing RL finetuning (PPO/GRPO) → you need it
If you're hypothesis testing → you need it
If you're doing cool applications like drug discovery → you really need it
But right now, for diffusion/flow models it's 100x slower than sampling 🥲
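For the RL case, one line shows why (the standard PPO-style importance ratio, nothing F2D2-specific): the update weight is built from exact log-likelihoods,

```latex
r(\theta) = \frac{p_\theta(x)}{p_{\theta_{\text{old}}}(x)}
= \exp\!\big(\log p_\theta(x) - \log p_{\theta_{\text{old}}}(x)\big)
```

so every policy-gradient step pays the likelihood-evaluation cost, which is exactly the slow part for diffusion/flow models today.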
Paper version + video interview for ARC-AGI Without Pretraining are now available! 📄Paper: https://t.co/XTOQHh4fzC 🎥Video interview:
To scale this class up, we need compute resources so that all students can train their models. I'll be at #NeurIPS 12/1-12/6, and if you're interested in sponsoring compute, I'd love to connect! Please DM me or grab me there 🙏🙏🙏! Course website:
Beyond the class itself, this is also an experiment in AI-native education and an attempt to solve the "holy grail" problem. I'm documenting the entire process and will share everything we learn publicly. I hope this can contribute to a clearer roadmap for teaching in the age of AI.
This is CMU's first course dedicated entirely to diffusion & flow matching, designed for 20 students, but 139 signed up! We're scaling it to fit more people in person and open-sourcing everything: slides, homework, and lecture recordings, so anyone in the world can learn with us!
In this class, students will build complete image generation systems from scratch via cumulative homework. They can choose their specialization (fidelity/speed/controllability) and tackle it with their own creativity. No exams, and everything is open: AI tools, open-source code, etc.
I've always wanted to teach diffusion & flow matching, a math-heavy and often intimidating topic. LLMs can do math now, so traditional classes ❌ My take: "What I cannot create, I do not understand." Learning by building is robust even with AI. The key is what we ask them to build.
This idea started at a group meeting where my advisor @zicokolter posed what he called the "holy grail" of education today: how do we ensure students are actually learning when AI can do everything for them? He and @rsalakhu encouraged me to find out by teaching a class @mldcmu
I'm teaching a diffusion & flow matching class at CMU in Spring 2026 where students can use ChatGPT, Cursor, or any AI tool they want. No exams. Just build with open internet. 139 students signed up for 20 spots. Here's what's happening: 🧵 https://t.co/t74V81OGiZ
Well, really didn’t expect this to age so well within a day, but here we are
Doing ICLR and TMLR rebuttals at the same time is such a crazy experience. For ICLR, I only got 2/7 reviewers to look at my rebuttal. For TMLR, I got months-long discussions, and my AC even went out of their way to consult additional experts just to make sure my derivations are correct.