Haoxuan (Steve) Chen

@haoxuan_steve_c

Followers: 1K · Following: 7K · Media: 12 · Statuses: 912

Ph.D. @Stanford; B.S. @Caltech; ML Intern @AmazonScience @NECLabsAmerica; Applied and Computational Mathematics/Machine Learning/Statistics/Scientific Computing

Stanford, CA
Joined July 2019
@haoxuan_steve_c
Haoxuan (Steve) Chen
5 months
šŸŽ‰ Thrilled to share our latest work on solving inverse problems via diffusion-based priors — without heuristic approximations of the measurement matching score! šŸ“„ Link: https://t.co/FAXh6HIt2F (1/6) #DiffusionModels #InverseProblems #Guidance #SequentialMonteCarlo
4
24
161
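For readers unfamiliar with the Sequential Monte Carlo machinery the thread alludes to, here is a minimal, generic sketch of likelihood-tempered SMC with systematic resampling on a 1-D toy inverse problem. It is not the paper's method (which runs particles along the diffusion prior's reverse process); the toy prior, measurement model, and tempering schedule are illustrative assumptions.

```python
# Generic SMC skeleton: likelihood tempering + systematic resampling on a toy
# 1-D inverse problem. Illustrative only; not the paper's diffusion-based
# sampler. Prior, measurement model, and schedule are assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: prior x ~ N(0, 1); measurement y = x + noise, noise std sigma_y.
y_obs, sigma_y = 1.5, 0.5
def log_likelihood(x):
    return -0.5 * ((y_obs - x) / sigma_y) ** 2

n_particles, n_steps = 5000, 10
particles = rng.standard_normal(n_particles)   # draws from the prior
betas = np.linspace(0.0, 1.0, n_steps + 1)     # likelihood-tempering schedule

for beta_prev, beta in zip(betas[:-1], betas[1:]):
    # Reweight by the newly introduced fraction of the likelihood.
    logw = (beta - beta_prev) * log_likelihood(particles)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # Systematic resampling to fight weight degeneracy.
    u = (rng.random() + np.arange(n_particles)) / n_particles
    idx = np.minimum(np.searchsorted(np.cumsum(w), u), n_particles - 1)
    particles = particles[idx]
    # (A full SMC sampler would also apply an MCMC move step here.)

# For this toy model the posterior is N(y/(1 + sigma_y^2), sigma_y^2/(1 + sigma_y^2)).
print("SMC mean:", particles.mean(), "analytic mean:", y_obs / (1 + sigma_y**2))
```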
@jasondeanlee
Jason Lee
9 days
I always get frustrated when asked what ML theory is good for and people ask for specific examples. I find this question unfair; I think it's really that having a theory/mathematical perspective is sometimes super helpful. E.g., diffusion models and their relatives, I don't see how
@QuanquanGu
Quanquan Gu
10 days
No joke. Most people haven’t yet realized how powerful machine learning theory actually is. I’m speaking from the perspective of someone directly building AGI: it stabilizes both pretraining and RL, and it provides the blueprint for scaling all the way to AGI.
12
11
339
@pushmeet
Pushmeet Kohli
10 days
(1) Introducing the AI for Math Initiative! Supported by @GoogleDeepMind and @GoogleOrg, five leading global institutions (@imperialcollege, @the_IAS, @Institut_IHES, @SimonsInstitute and @TIFRScience) are coming together to pioneer the use of AI in mathematical research.
7
45
396
@du_yilun
Yilun Du
11 days
Sharing our work at @NeurIPSConf on reasoning with EBMs! We learn an EBM over simple subproblems and combine EBMs at test time to solve complex reasoning problems (3-SAT, graph coloring, crosswords). Generalizes well to complex 3-SAT / graph coloring / N-queens problems.
6
36
325
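For intuition about the test-time composition idea, a toy sketch, assuming nothing about the paper's models: each constraint is written as an energy function, the energies are summed, and a candidate solution is found by gradient descent on the combined energy. A continuous 2-D problem stands in for the learned EBMs over discrete reasoning subproblems.

```python
# Toy illustration of composing energy functions at test time by summing them
# and minimizing the total energy. Continuous 2-D stand-in for the paper's
# learned EBMs over discrete subproblems; the energies here are made up.
import numpy as np

def energy_near(target):
    """Low energy when x is close to `target`; returns (energy, gradient)."""
    return (lambda x: float(np.sum((x - target) ** 2)),
            lambda x: 2.0 * (x - target))

def energy_on_circle(radius):
    """Low energy when ||x|| is close to `radius`."""
    def e(x):
        return float((np.linalg.norm(x) - radius) ** 2)
    def grad(x):
        n = np.linalg.norm(x) + 1e-12
        return 2.0 * (n - radius) * x / n
    return e, grad

# Two "subproblem" constraints, combined by summing their energies.
constraints = [energy_near(np.array([3.0, 0.0])), energy_on_circle(1.0)]

x = np.array([0.5, 0.5])
for _ in range(2000):                      # gradient descent on the summed energy
    x -= 0.01 * sum(g(x) for _, g in constraints)

print("composed solution:", x)             # ends near (2, 0), trading off both terms
print("total energy:", sum(e(x) for e, _ in constraints))
```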
@JCJesseLai
Chieh-Hsin (Jesse) Lai
11 days
Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! šŸ“˜ We’re excited to release 怊The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
43
429
2K
@huijiezh
Huijie Zhang
12 days
The few-step diffusion model field is wild, and there are many methods trying to train a high-quality few-step generator from scratch: Consistency Models, Shortcut Models, and MeanFlow. It turns out they can be unified in a quite elegant way, which we do in our recent work.
3
50
395
@DrYangSong
Yang Song
10 days
Applications change, but the principles are enduring. After a year's hard work led by @JCJesseLai, we are really excited to share this deep, systematic dive into the mathematical principles of diffusion models. This is a monograph we always wished we had.
@JCJesseLai
Chieh-Hsin (Jesse) Lai
11 days
Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! šŸ“˜ We’re excited to release 怊The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
7
40
436
@ErnestRyu
Ernest Ryu
11 days
Preprint on using ChatGPT to resolve a 42-year-old open problem (point convergence of Nesterov’s accelerated gradient method) is out. The mathematical results are complete, though we still need to expand the discussion of historical context & prior work. (1/2) https://t.co/Dmd9huMjXS
arxiv.org
The Nesterov accelerated gradient method, introduced in 1983, has been a cornerstone of optimization theory and practice. Yet the question of its point convergence had remained open. In this work,...
12
68
470
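For context on what "point convergence" refers to here: the question is whether the sequence of iterates produced by Nesterov's 1983 scheme converges to a minimizer, not merely whether the function values converge at the accelerated rate. A minimal sketch of the classical iteration on a toy quadratic, with an illustrative step size and the common simplified momentum schedule:

```python
# Classical Nesterov accelerated gradient on a toy convex quadratic, only to
# show the iteration whose point convergence the preprint resolves. The
# objective, step size, and momentum schedule below are illustrative choices.
import numpy as np

def nesterov_agm(grad_f, x0, step, n_iters):
    """Return the iterates x_0, x_1, ..., x_n of Nesterov's accelerated method."""
    x_prev, y = x0.copy(), x0.copy()
    iterates = [x0.copy()]
    for k in range(1, n_iters + 1):
        x = y - step * grad_f(y)                  # gradient step at the extrapolated point
        y = x + (k - 1) / (k + 2) * (x - x_prev)  # momentum / extrapolation
        x_prev = x
        iterates.append(x.copy())
    return np.array(iterates)

# Toy problem: f(x) = 0.5 * x^T A x with a poorly conditioned A (minimizer at 0).
A = np.diag([1.0, 100.0])
traj = nesterov_agm(lambda x: A @ x, x0=np.array([1.0, 1.0]), step=1.0 / 100.0, n_iters=500)
print("last iterate:", traj[-1])  # the open question concerns convergence of these iterates
```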
@StefanoErmon
Stefano Ermon
11 days
Tired of chasing references across dozens of papers? This monograph distills it all: the principles, intuition, and math behind diffusion models. Thrilled to share!
@JCJesseLai
Chieh-Hsin (Jesse) Lai
11 days
Tired of going back to the original papers again and again? Our monograph: a systematic and fundamental recipe you can rely on! šŸ“˜ We’re excited to release 怊The Principles of Diffusion Models》— with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
13
132
1K
@dianarycai
Diana Cai
12 days
Fisher meets Feynman! šŸ¤ We use score matching and a trick from quantum field theory to make a product-of-experts family both expressive and efficient for variational inference. To appear as a spotlight @ NeurIPS 2025. #NeurIPS2025 (link below)
4
46
410
@thinkymachines
Thinking Machines
12 days
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When using it to train models for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
62
393
3K
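As a rough sketch of the general on-policy distillation recipe (not necessarily the exact setup in the post): the student samples its own continuation, and the teacher supplies dense per-token supervision on those student-chosen tokens via a reverse-KL loss. `student` and `teacher` below are assumed to be Hugging Face-style causal LMs; all names are placeholders.

```python
# Minimal on-policy distillation step, assuming Hugging Face-style causal LMs.
# Illustrative sketch of the general recipe, not the post's training code.
import torch
import torch.nn.functional as F

def on_policy_distill_loss(student, teacher, prompt_ids, max_new_tokens=64):
    # 1) On-policy rollout: the *student* samples its own continuation.
    with torch.no_grad():
        rollout = student.generate(prompt_ids, do_sample=True, max_new_tokens=max_new_tokens)

    # 2) Score the rollout with both models (logits at position i predict token i+1).
    student_logits = student(rollout).logits[:, :-1]
    with torch.no_grad():
        teacher_logits = teacher(rollout).logits[:, :-1]

    # 3) Dense per-token reverse KL, KL(student || teacher), on the generated positions.
    gen = slice(prompt_ids.shape[1] - 1, rollout.shape[1] - 1)
    s_logp = F.log_softmax(student_logits[:, gen], dim=-1)
    t_logp = F.log_softmax(teacher_logits[:, gen], dim=-1)
    # F.kl_div(input, target) computes KL(target || input), so this is KL(student || teacher).
    return F.kl_div(t_logp, s_logp, log_target=True, reduction="batchmean")
```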
@Fanghui_SgrA
Fanghui Liu
12 days
What is "good" reasoning and how do we evaluate it? šŸš€ We explore a new pipeline to model step-level reasoning, a ā€œGoldilocks principleā€ that balances free-form CoT and LEAN! Led by my student @yuanhezhang6, in collaboration with Ilja from DeepMind, @jasondeanlee, @CL_Theory
3
12
81
@ssahoo_
Subham Sahoo
12 days
šŸ”„ Rethinking Reasoning (with Diffusion LLMs) This work changes how you think about reasoning in LLMs. 🤯 Turns out: you don’t need the full chain-of-thought — only a small subset of CoT tokens actually matter for the final answer. āŒ Autoregressive LLMs can’t exploit this
10
36
230
@johnschulman2
John Schulman
13 days
Happy to share a new paper! Designing model behavior is hard -- desirable values often pull in opposite directions. Jifan's approach systematically generates scenarios where values conflict, helping us see where specs are missing coverage and how different models balance
@jifan_zhang
Jifan Zhang
15 days
New research paper with Anthropic and Thinking Machines! AI companies use model specifications to define desirable behaviors during training. Are model specs clearly expressing what we want models to do? And do different frontier models have different personalities? We generated
13
47
616
@damekdavis
Damek
13 days
Until this morning, I'd never spent time on this problem, because iterate convergence is less important to me than convergence rates. What's new/difficult? First, Lemma 1 and the Lyapunov function are not new and are key to prior work on this problem. The key difficulty
@ErnestRyu
Ernest Ryu
18 days
The proof, cleaned up and typed up by me: (3/N)
2
13
155
@marikgoldstein
Mark Goldstein
14 days
I love how Bayes/VI research ā€œis backā€ these days, just with huge models as priors, e.g., ā€œKL-Regularized Reinforcement Learning is Designed to Mode Collapseā€ https://t.co/UntSaALjlw
arxiv.org
It is commonly believed that optimizing the reverse KL divergence results in "mode seeking", while optimizing forward KL results in "mass covering", with the latter being preferred if the goal is...
10
22
243
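The abstract's mode-seeking vs mass-covering contrast is easy to see numerically. A small stand-alone illustration (not from the paper): fit a single Gaussian q to a bimodal target p on a grid, once by minimizing the reverse KL, KL(q||p), and once by minimizing the forward KL, KL(p||q).

```python
# Fit a single Gaussian to a bimodal target by minimizing reverse vs forward KL
# on a grid. Stand-alone illustration of mode-seeking vs mass-covering; the
# target and optimizer are arbitrary choices, unrelated to the paper's setup.
import numpy as np
from scipy.optimize import minimize

xs = np.linspace(-10.0, 10.0, 2001)
dx = xs[1] - xs[0]

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Bimodal target: two well-separated modes.
p = 0.5 * gaussian(xs, -4.0, 1.0) + 0.5 * gaussian(xs, 4.0, 1.0)

def kl(a, b):
    eps = 1e-12
    return float(np.sum(a * (np.log(a + eps) - np.log(b + eps))) * dx)

def fit(direction):
    def loss(params):
        mu, log_sigma = params
        q = gaussian(xs, mu, np.exp(log_sigma))
        return kl(q, p) if direction == "reverse" else kl(p, q)
    return minimize(loss, x0=np.array([1.0, 0.0]), method="Nelder-Mead").x

mu_r, ls_r = fit("reverse")   # mode-seeking: lands on one mode, stays narrow
mu_f, ls_f = fit("forward")   # mass-covering: sits between modes, spreads out
print(f"reverse KL fit: mu={mu_r:.2f}, sigma={np.exp(ls_r):.2f}")
print(f"forward KL fit: mu={mu_f:.2f}, sigma={np.exp(ls_f):.2f}")
```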
@ErnestRyu
Ernest Ryu
15 days
I used ChatGPT to solve an open problem in convex optimization. *Part II* 1/N https://t.co/6HDAr6y8Z9
@ErnestRyu
Ernest Ryu
18 days
I used ChatGPT to solve an open problem in convex optimization. *Part I* (1/N)
3
21
211
@canondetortugas
Dylan Foster 🐢
16 days
The coverage principle: How pre-training enables post-training New preprint where we look at the mechanisms through which next-token prediction produces models that succeed at downstream tasks. The answer involves a metric we call the "coverage profile", not cross-entropy.
7
37
285
@aviral_kumar2
Aviral Kumar
16 days
If you want to try training Q-functions via flow-matching, we just released code and runs: Code: https://t.co/ONSIzeRcAP Wandbs: https://t.co/bIhZLBvMI2 Also great to see so many other groups training value functions via flow-matching!
docs.google.com
@aviral_kumar2
Aviral Kumar
2 months
🚨🚨New paper on core RL: a way to train value-functions via flow-matching for scaling compute! No text/images, but a flow directly on a scalar Q-value. This unlocks benefits of iterative compute, test-time scaling for value prediction & SOTA results on whatever we tried. šŸ§µā¬‡ļø
3
37
259
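For intuition about "a flow directly on a scalar Q-value", here is a minimal conditional flow-matching sketch on a scalar return target; the network, interpolation path, and batch below are placeholder assumptions, not the released code.

```python
# Minimal conditional flow-matching loss on a scalar value target. Placeholder
# network and data; a sketch of the idea, not the paper's released code.
import torch
import torch.nn as nn

class ScalarFlowQ(nn.Module):
    """Velocity field v(q_t, t | s, a) for a flow over a scalar Q-value."""
    def __init__(self, obs_dim, act_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 2, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, q_t, t, obs, act):
        return self.net(torch.cat([q_t, t, obs, act], dim=-1))

def flow_matching_loss(model, obs, act, returns):
    q1 = returns.unsqueeze(-1)          # data endpoint: observed return
    q0 = torch.randn_like(q1)           # noise endpoint
    t = torch.rand_like(q1)             # random time in [0, 1]
    q_t = (1 - t) * q0 + t * q1         # point on the straight-line path
    target_v = q1 - q0                  # constant velocity of that path
    return ((model(q_t, t, obs, act) - target_v) ** 2).mean()

# Placeholder batch, just to show the shapes involved.
model = ScalarFlowQ(obs_dim=8, act_dim=2)
obs, act, returns = torch.randn(32, 8), torch.randn(32, 2), torch.randn(32)
flow_matching_loss(model, obs, act, returns).backward()
```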
@nasqret
Bartosz Naskręcki
16 days
We often talk about big leaps in AI for mathematics, but I think the small steps are equally impressive. The future of mathematics is now. I was working on a particular task: finding a case-free proof of the representability of the local NƩron function correction using a
22
76
549
@HanshengCh
Hansheng Chen
23 days
Excited to announce a new track for accelerating generative AI: pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation https://t.co/6ro55E1XGP You can now distill 20B flow models using just an L2 loss via imitation learning, for SOTA diversity and teacher-aligned quality.
3
28
152