Haoxuan (Steve) Chen
@haoxuan_steve_c
Followers
1K
Following
7K
Media
12
Statuses
912
Ph.D. @Stanford; B.S. @Caltech; ML Intern @AmazonScience @NECLabsAmerica; Applied and Computational Mathematics/Machine Learning/Statistics/Scientific Computing
Stanford, CA
Joined July 2019
Thrilled to share our latest work on solving inverse problems via diffusion-based priors, without heuristic approximations of the measurement-matching score! Link: https://t.co/FAXh6HIt2F (1/6) #DiffusionModels #InverseProblems #Guidance #SequentialMonteCarlo
4
24
161
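The thread above advertises Sequential Monte Carlo guidance instead of a heuristic approximation of the measurement-matching score. A minimal, illustrative sketch of that general recipe (not the paper's actual algorithm), where `reverse_step` and `likelihood_logpdf` are placeholder functions supplied by the user:

```python
import numpy as np

def smc_diffusion_posterior(reverse_step, likelihood_logpdf, x_T, n_steps, seed=0):
    """Toy SMC sampler over a diffusion prior for an inverse problem.

    reverse_step(x, t)      -- one unconditional reverse-diffusion step (placeholder)
    likelihood_logpdf(x, t) -- log p(y | x_t) surrogate at noise level t (placeholder)
    x_T                     -- initial particles drawn from the prior, shape (n_particles, dim)
    """
    rng = np.random.default_rng(seed)
    particles = x_T.copy()
    log_w = np.zeros(len(particles))
    for t in reversed(range(1, n_steps + 1)):
        # Propagate every particle with the prior (unconditional) reverse kernel.
        particles = np.stack([reverse_step(p, t) for p in particles])
        # Reweight by how well each particle explains the measurement y.
        log_w += np.array([likelihood_logpdf(p, t) for p in particles])
        log_w -= log_w.max()
        w = np.exp(log_w)
        w /= w.sum()
        # Resample when the effective sample size collapses.
        if 1.0 / np.sum(w ** 2) < 0.5 * len(particles):
            idx = rng.choice(len(particles), size=len(particles), p=w)
            particles, log_w = particles[idx], np.zeros(len(particles))
    return particles
```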
I always get frustrated when asked what ML theory is good for and people ask for specific examples. I find this question unfair; it's really just that having a theory/mathematical perspective is sometimes super helpful. E.g. diffusion models and their relatives, I don't see how
No joke. Most people haven't yet realized how powerful machine learning theory actually is. I'm speaking from the perspective of someone directly building AGI: it stabilizes both pretraining and RL, and it provides the blueprint for scaling all the way to AGI.
12
11
339
(1) Introducing the AI for Math Initiative! Supported by @GoogleDeepMind and @GoogleOrg, five leading global institutions (@imperialcollege, @the_IAS, @Institut_IHES, @SimonsInstitute and @TIFRScience) are coming together to pioneer the use of AI in mathematical research.
7
45
396
Sharing our work at @NeurIPSConf on reasoning with EBMs! We learn an EBM over simple subproblems and combine EBMs at test time to solve complex reasoning problems (3-SAT, graph coloring, crosswords). Generalizes well to complex 3-SAT / graph coloring / N-queens problems.
6
36
325
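A toy illustration of the test-time composition idea in this announcement, not the paper's model: give each 3-SAT clause its own soft energy over relaxed variables, sum the energies, and minimize the combined landscape. The continuous relaxation and the finite-difference optimizer below are purely illustrative.

```python
import numpy as np

def clause_energy(x, clause):
    """Soft energy of one 3-SAT clause over relaxed variables x in [0, 1].

    clause is a tuple of signed literals, e.g. (1, -3, 4) means x1 OR not-x3 OR x4.
    Energy is ~0 when the clause is satisfied and grows as all literals fail.
    """
    fail = 1.0
    for lit in clause:
        v = x[abs(lit) - 1]
        fail *= (1.0 - v) if lit > 0 else v
    return fail

def combined_energy(x, clauses):
    # Test-time composition: the energy of the full problem is the sum of
    # the per-subproblem (per-clause) energies.
    return sum(clause_energy(x, c) for c in clauses)

def solve(clauses, n_vars, steps=2000, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n_vars)
    for _ in range(steps):
        # Finite-difference gradient descent on the composed energy (toy optimizer).
        g = np.zeros(n_vars)
        for i in range(n_vars):
            e = np.zeros(n_vars); e[i] = 1e-4
            g[i] = (combined_energy(x + e, clauses) - combined_energy(x - e, clauses)) / 2e-4
        x = np.clip(x - lr * g, 0.0, 1.0)
    return x > 0.5

print(solve([(1, 2, -3), (-1, 3, 2), (-2, -3, 1)], n_vars=3))
```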
Tired of going back to the original papers again and again? Our monograph is a systematic and fundamental recipe you can rely on! We're excited to release "The Principles of Diffusion Models" with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
43
429
2K
The few-step diffusion model field is wild, and there are many methods trying to train a high-quality few-step generator from scratch: Consistency Models, Shortcut Models, and MeanFlow. It turns out they can be unified in a quite elegant way, which we do in our recent work.
3
50
395
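One illustrative way to read the claimed unification (an assumption about the form, not the paper's exact formulation): a single network predicts an average velocity over an interval, so consistency models (always jump to 0), shortcut models (learned jumps of a chosen size), and MeanFlow (average-velocity parameterization) all share one sampler interface.

```python
import torch

@torch.no_grad()
def few_step_sample(u_theta, x_T, times):
    """Generic few-step sampler: u_theta(x, t, s) predicts the *average* velocity
    over [s, t], and we jump x_t -> x_s directly in one network call.

    times is a short decreasing schedule, e.g. [1.0, 0.5, 0.0].
    The u_theta signature is hypothetical, chosen for illustration only.
    """
    x = x_T
    for t, s in zip(times[:-1], times[1:]):
        x = x + (s - t) * u_theta(x, t, s)   # x_s = x_t + (s - t) * average velocity
    return x
```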
Applications change, but the principles are enduring. After a year's hard work led by @JCJesseLai, we are really excited to share this deep, systematic dive into the mathematical principles of diffusion models. This is a monograph we always wished we had.
Tired of going back to the original papers again and again? Our monograph is a systematic and fundamental recipe you can rely on! We're excited to release "The Principles of Diffusion Models" with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
7
40
436
Preprint on using ChatGPT to resolve a 42-year-old open problem (point convergence of Nesterov's accelerated gradient method) is out. The mathematical results are complete, though we still need to expand the discussion of historical context & prior work. (1/2) https://t.co/Dmd9huMjXS
arxiv.org
The Nesterov accelerated gradient method, introduced in 1983, has been a cornerstone of optimization theory and practice. Yet the question of its point convergence had remained open. In this work,...
12
68
470
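For reference, one standard form of the iteration whose point convergence is at stake (L-smooth convex f, step size s <= 1/L):

```latex
% Nesterov's accelerated gradient method (one standard form):
\begin{align*}
  x_{k}   &= y_{k-1} - s\,\nabla f(y_{k-1}),\\
  y_{k}   &= x_{k} + \frac{k-1}{k+2}\,\bigl(x_{k} - x_{k-1}\bigr).
\end{align*}
% It attains $f(x_k) - f^{\star} = O(1/k^2)$; the open question the preprint addresses
% is whether the iterates $x_k$ themselves converge to a minimizer.
```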
Tired of chasing references across dozens of papers? This monograph distills it all: the principles, intuition, and math behind diffusion models. Thrilled to share!
Tired of going back to the original papers again and again? Our monograph is a systematic and fundamental recipe you can rely on! We're excited to release "The Principles of Diffusion Models" with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
13
132
1K
Fisher meets Feynman! We use score matching and a trick from quantum field theory to make a product-of-experts family both expressive and efficient for variational inference. To appear as a spotlight @ NeurIPS 2025. #NeurIPS2025 (link below)
4
46
410
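A rough sketch of why score matching pairs well with products of experts (the general idea, not the paper's method): the score of a product is the sum of the experts' scores, so the intractable normalizer drops out of a Fisher-divergence objective. The `.score` interface and `target_score` below are hypothetical.

```python
import torch

def product_of_experts_score(x, experts):
    """Score of an (unnormalized) product of experts:
    log p(x) = sum_k log p_k(x) - log Z, so grad_x log p(x) = sum_k grad_x log p_k(x).
    The normalizer Z drops out, which is what makes score-based objectives attractive here.
    """
    return sum(e.score(x) for e in experts)

def fisher_divergence_loss(experts, target_score, x_samples):
    """Monte-Carlo Fisher divergence between the PoE and a target whose score is queryable,
    e.g. the gradient of an unnormalized log-posterior in variational inference."""
    diff = product_of_experts_score(x_samples, experts) - target_score(x_samples)
    return 0.5 * (diff ** 2).sum(dim=-1).mean()
```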
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training models for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
62
393
3K
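A schematic of on-policy distillation as the post describes it, with a hypothetical `student`/`teacher` interface: the student samples its own rollouts (on-policy, as in RL) and is trained with a dense per-token reverse KL toward the teacher (SFT-like supervision on every token).

```python
import torch
import torch.nn.functional as F

def on_policy_distillation_step(student, teacher, prompts, optimizer):
    """One on-policy distillation step (schematic; the model interface is assumed):
    1. the *student* generates rollouts (on-policy, like RL),
    2. the *teacher* scores those rollouts with per-token log-probs (dense, like SFT),
    3. the student minimizes a per-token reverse KL to the teacher on its own samples.
    """
    with torch.no_grad():
        rollouts = student.generate(prompts)                 # on-policy samples
        teacher_logits = teacher.logits(prompts, rollouts)
    student_logits = student.logits(prompts, rollouts)
    loss = F.kl_div(
        F.log_softmax(teacher_logits, dim=-1),               # input: teacher log-probs
        F.log_softmax(student_logits, dim=-1),               # target: student log-probs
        log_target=True, reduction="batchmean",              # => KL(student || teacher)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```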
What is "good" reasoning and how to evaluate it? We explore a new pipeline to model step-level reasoning, a "Goldilocks principle" that balances free-form CoT and Lean! Led by my student @yuanhezhang6, in collaboration with Ilja from DeepMind, @jasondeanlee, @CL_Theory
3
12
81
Rethinking Reasoning (with Diffusion LLMs). This work changes how you think about reasoning in LLMs. Turns out: you don't need the full chain-of-thought; only a small subset of CoT tokens actually matter for the final answer. Autoregressive LLMs can't exploit this
10
36
230
Happy to share a new paper! Designing model behavior is hard -- desirable values often pull in opposite directions. Jifan's approach systematically generates scenarios where values conflict, helping us see where specs are missing coverage and how different models balance
New research paper with Anthropic and Thinking Machines. AI companies use model specifications to define desirable behaviors during training. Are model specs clearly expressing what we want models to do? And do different frontier models have different personalities? We generated
13
47
616
Until this morning, I had never spent time on this problem because iterate convergence is less important to me than convergence rates. What's new/difficult? First, Lemma 1 and the Lyapunov function are not new and are key to prior work on this problem. The key difficulty
2
13
155
I love how Bayes/VI research "is back" these days, just with huge models as priors, e.g., KL-Regularized Reinforcement Learning is Designed to Mode Collapse https://t.co/UntSaALjlw
arxiv.org
It is commonly believed that optimizing the reverse KL divergence results in "mode seeking", while optimizing forward KL results in "mass covering", with the latter being preferred if the goal is...
10
22
243
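For context on the quoted title, the usual KL-regularized RL objective; the reverse-KL direction KL(pi || pi_ref) is the mode-seeking one that the abstract contrasts with the mass-covering forward KL:

```latex
% KL-regularized RL: maximize reward while staying close to a reference model
% pi_ref in *reverse* KL.
\[
  \max_{\pi}\;\; \mathbb{E}_{y \sim \pi}\bigl[\, r(y) \,\bigr]
  \;-\; \beta\,\mathrm{KL}\bigl(\pi \,\|\, \pi_{\mathrm{ref}}\bigr)
\]
% The reverse direction KL(pi || pi_ref) is mode-seeking; the forward direction
% KL(pi_ref || pi) is mass-covering, which is the contrast described in the abstract.
```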
I used ChatGPT to solve an open problem in convex optimization. *Part II* 1/N https://t.co/6HDAr6y8Z9
3
21
211
The coverage principle: how pre-training enables post-training. New preprint where we look at the mechanisms through which next-token prediction produces models that succeed at downstream tasks. The answer involves a metric we call the "coverage profile", not cross-entropy.
7
37
285
If you want to try training Q-functions via flow-matching, we just released code and runs: Code: https://t.co/ONSIzeRcAP Wandbs: https://t.co/bIhZLBvMI2 Also great to see so many other groups training value functions via flow-matching!
docs.google.com
New paper on core RL: a way to train value functions via flow-matching for scaling compute! No text/images, but a flow directly on a scalar Q-value. This unlocks benefits of iterative compute, test-time scaling for value prediction & SOTA results on whatever we tried.
3
37
259
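A minimal flow-matching loss on a scalar Q-value in the spirit of the thread (a sketch, not the released code; the `v_theta` signature is assumed): the "data" endpoint is a scalar return/Q target, interpolated linearly from Gaussian noise and conditioned on state and action.

```python
import torch

def scalar_flow_matching_loss(v_theta, state, action, q_target):
    """Flow-matching on a *scalar* Q-value, conditioned on (state, action).

    v_theta(q_t, t, state, action) -> predicted velocity (hypothetical signature).
    Linear interpolation path: q_t = (1 - t) * q_0 + t * q_1, with q_0 ~ N(0, 1)
    noise and q_1 the regression target; the conditional target velocity is q_1 - q_0.
    """
    q1 = q_target.unsqueeze(-1)                       # (batch, 1) scalar targets
    q0 = torch.randn_like(q1)                         # noise endpoint
    t = torch.rand(q1.shape[0], 1, device=q1.device)  # random time in [0, 1]
    qt = (1 - t) * q0 + t * q1
    target_velocity = q1 - q0
    pred_velocity = v_theta(qt, t, state, action)
    return ((pred_velocity - target_velocity) ** 2).mean()
```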
We often talk about big leaps in AI for mathematics, but I think the small steps are equally impressive. The future of mathematics is now. I was working on a particular task: finding a case-free proof of the representability of the local Néron function correction using a
22
76
549
Excited to announce a new direction in accelerating Generative AI: pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation https://t.co/6ro55E1XGP Distill 20B flow models using just an L2 loss via imitation learning for SOTA diversity and teacher-aligned quality.
3
28
152
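A heavily simplified sketch of distilling a flow teacher into a few-step student with a plain L2 imitation loss, as the announcement describes at a high level; `teacher_ode_solve` and `student` are placeholders, and pi-Flow's actual policy-based objective is richer than this endpoint-matching toy.

```python
import torch

def l2_imitation_distillation_loss(student, teacher_ode_solve, x_T, n_student_steps=4):
    """Distill a flow teacher into a few-step student with a plain L2 loss (toy version).

    teacher_ode_solve(x_T) -- runs the teacher's full probability-flow ODE (placeholder)
    student(x, t)          -- velocity of the few-step generator being trained (placeholder)
    The student imitates the teacher's endpoint from the same noise; treat this only as
    an illustration of the L2 idea, not the paper's method.
    """
    with torch.no_grad():
        x_teacher = teacher_ode_solve(x_T)            # teacher's high-quality sample
    x = x_T
    ts = torch.linspace(1.0, 0.0, n_student_steps + 1)
    for t, s in zip(ts[:-1], ts[1:]):
        x = x + (s - t) * student(x, t)               # a few Euler-style student steps
    return ((x - x_teacher) ** 2).mean()
```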