Haoxuan (Steve) Chen
@haoxuan_steve_c
Followers
1K
Following
7K
Media
12
Statuses
912
Ph.D. @Stanford; B.S. @Caltech; ML Intern @AmazonScience @NECLabsAmerica; Applied and Computational Mathematics/Machine Learning/Statistics/Scientific Computing
Stanford, CA
Joined July 2019
Thrilled to share our latest work on solving inverse problems via diffusion-based priors, without heuristic approximations of the measurement-matching score! Link: https://t.co/FAXh6HIt2F (1/6) #DiffusionModels #InverseProblems #Guidance #SequentialMonteCarlo
4
24
161
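The thread above advertises Sequential Monte Carlo guidance instead of a heuristic approximation of the measurement-matching score. A minimal, illustrative sketch of that general recipe (not the paper's actual algorithm), where `reverse_step` and `likelihood_logpdf` are placeholder functions supplied by the user:

```python
import numpy as np

def smc_diffusion_posterior(reverse_step, likelihood_logpdf, x_T, n_steps, seed=0):
    """Toy SMC sampler over a diffusion prior for an inverse problem.

    reverse_step(x, t)      -- one unconditional reverse-diffusion step (placeholder)
    likelihood_logpdf(x, t) -- log p(y | x_t) surrogate at noise level t (placeholder)
    x_T                     -- initial particles drawn from the prior, shape (n_particles, dim)
    """
    rng = np.random.default_rng(seed)
    particles = x_T.copy()
    log_w = np.zeros(len(particles))
    for t in reversed(range(1, n_steps + 1)):
        # Propagate every particle with the prior (unconditional) reverse kernel.
        particles = np.stack([reverse_step(p, t) for p in particles])
        # Reweight by how well each particle explains the measurement y.
        log_w += np.array([likelihood_logpdf(p, t) for p in particles])
        log_w -= log_w.max()
        w = np.exp(log_w)
        w /= w.sum()
        # Resample when the effective sample size collapses.
        if 1.0 / np.sum(w ** 2) < 0.5 * len(particles):
            idx = rng.choice(len(particles), size=len(particles), p=w)
            particles, log_w = particles[idx], np.zeros(len(particles))
    return particles
```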
I always get frustrated when asked what ML theory is good for and people ask for specific examples. I find this question unfair; it's really just that having a theory/mathematical perspective is sometimes super helpful. E.g. diffusion models and their relatives, I don't see how
No joke. Most people haven't yet realized how powerful machine learning theory actually is. I'm speaking from the perspective of someone directly building AGI: it stabilizes both pretraining and RL, and it provides the blueprint for scaling all the way to AGI.
12
11
339
(1) Introducing the AI for Math Initiative! Supported by @GoogleDeepMind and @GoogleOrg, five leading global institutions (@imperialcollege, @the_IAS, @Institut_IHES, @SimonsInstitute and @TIFRScience) are coming together to pioneer the use of AI in mathematical research.
7
45
396
Sharing our work at @NeurIPSConf on reasoning with EBMs! We learn an EBM over simple subproblems and combine EBMs at test time to solve complex reasoning problems (3-SAT, graph coloring, crosswords). Generalizes well to complex 3-SAT / graph coloring / N-queens problems.
6
36
325
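A toy illustration of the test-time composition idea in this announcement, not the paper's model: give each 3-SAT clause its own soft energy over relaxed variables, sum the energies, and minimize the combined landscape. The continuous relaxation and the finite-difference optimizer below are purely illustrative.

```python
import numpy as np

def clause_energy(x, clause):
    """Soft energy of one 3-SAT clause over relaxed variables x in [0, 1].

    clause is a tuple of signed literals, e.g. (1, -3, 4) means x1 OR not-x3 OR x4.
    Energy is ~0 when the clause is satisfied and grows as all literals fail.
    """
    fail = 1.0
    for lit in clause:
        v = x[abs(lit) - 1]
        fail *= (1.0 - v) if lit > 0 else v
    return fail

def combined_energy(x, clauses):
    # Test-time composition: the energy of the full problem is the sum of
    # the per-subproblem (per-clause) energies.
    return sum(clause_energy(x, c) for c in clauses)

def solve(clauses, n_vars, steps=2000, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=n_vars)
    for _ in range(steps):
        # Finite-difference gradient descent on the composed energy (toy optimizer).
        g = np.zeros(n_vars)
        for i in range(n_vars):
            e = np.zeros(n_vars); e[i] = 1e-4
            g[i] = (combined_energy(x + e, clauses) - combined_energy(x - e, clauses)) / 2e-4
        x = np.clip(x - lr * g, 0.0, 1.0)
    return x > 0.5

print(solve([(1, 2, -3), (-1, 3, 2), (-2, -3, 1)], n_vars=3))
```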
Tired of going back to the original papers again and again? Our monograph is a systematic and fundamental recipe you can rely on! We're excited to release "The Principles of Diffusion Models" with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
43
429
2K
The few-step diffusion model field is wild, and there are many methods trying to train a high-quality few-step generator from scratch: Consistency Models, Shortcut Models, and MeanFlow. It turns out they can be unified in a quite elegant way, which we do in our recent work.
3
50
395
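One illustrative way to read the claimed unification (an assumption about the form, not the paper's exact formulation): a single network predicts an average velocity over an interval, so consistency models (always jump to 0), shortcut models (learned jumps of a chosen size), and MeanFlow (average-velocity parameterization) all share one sampler interface.

```python
import torch

@torch.no_grad()
def few_step_sample(u_theta, x_T, times):
    """Generic few-step sampler: u_theta(x, t, s) predicts the *average* velocity
    over [s, t], and we jump x_t -> x_s directly in one network call.

    times is a short decreasing schedule, e.g. [1.0, 0.5, 0.0].
    The u_theta signature is hypothetical, chosen for illustration only.
    """
    x = x_T
    for t, s in zip(times[:-1], times[1:]):
        x = x + (s - t) * u_theta(x, t, s)   # x_s = x_t + (s - t) * average velocity
    return x
```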
Applications change, but the principles are enduring. After a year's hard work led by @JCJesseLai, we are really excited to share this deep, systematic dive into the mathematical principles of diffusion models. This is a monograph we always wished we had.
Tired of going back to the original papers again and again? Our monograph is a systematic and fundamental recipe you can rely on! We're excited to release "The Principles of Diffusion Models" with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
7
40
436
Preprint on using ChatGPT to resolve a 42-year-old open problem (point convergence of Nesterov's accelerated gradient method) is out. The mathematical results are complete, though we still need to expand the discussion of historical context & prior work. (1/2) https://t.co/Dmd9huMjXS
arxiv.org
The Nesterov accelerated gradient method, introduced in 1983, has been a cornerstone of optimization theory and practice. Yet the question of its point convergence had remained open. In this work,...
12
68
470
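For reference, one standard form of the iteration whose point convergence is at stake (L-smooth convex f, step size s <= 1/L):

```latex
% Nesterov's accelerated gradient method (one standard form):
\begin{align*}
  x_{k}   &= y_{k-1} - s\,\nabla f(y_{k-1}),\\
  y_{k}   &= x_{k} + \frac{k-1}{k+2}\,\bigl(x_{k} - x_{k-1}\bigr).
\end{align*}
% It attains $f(x_k) - f^{\star} = O(1/k^2)$; the open question the preprint addresses
% is whether the iterates $x_k$ themselves converge to a minimizer.
```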
Tired of chasing references across dozens of papers? This monograph distills it all: the principles, intuition, and math behind diffusion models. Thrilled to share!
Tired of going back to the original papers again and again? Our monograph is a systematic and fundamental recipe you can rely on! We're excited to release "The Principles of Diffusion Models" with @DrYangSong, @gimdong58085414, @mittu1204, and @StefanoErmon. It traces the core
13
132
1K
Fisher meets Feynman! We use score matching and a trick from quantum field theory to make a product-of-experts family both expressive and efficient for variational inference. To appear as a spotlight @ NeurIPS 2025. #NeurIPS2025 (link below)
4
46
410
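A rough sketch of why score matching pairs well with products of experts (the general idea, not the paper's method): the score of a product is the sum of the experts' scores, so the intractable normalizer drops out of a Fisher-divergence objective. The `.score` interface and `target_score` below are hypothetical.

```python
import torch

def product_of_experts_score(x, experts):
    """Score of an (unnormalized) product of experts:
    log p(x) = sum_k log p_k(x) - log Z, so grad_x log p(x) = sum_k grad_x log p_k(x).
    The normalizer Z drops out, which is what makes score-based objectives attractive here.
    """
    return sum(e.score(x) for e in experts)

def fisher_divergence_loss(experts, target_score, x_samples):
    """Monte-Carlo Fisher divergence between the PoE and a target whose score is queryable,
    e.g. the gradient of an unnormalized log-posterior in variational inference."""
    diff = product_of_experts_score(x_samples, experts) - target_score(x_samples)
    return 0.5 * (diff ** 2).sum(dim=-1).mean()
```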
Our latest post explores on-policy distillation, a training approach that unites the error-correcting relevance of RL with the reward density of SFT. When training models for math reasoning and as an internal chat assistant, we find that on-policy distillation can outperform other
62
393
3K
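A schematic of on-policy distillation as the post describes it, with a hypothetical `student`/`teacher` interface: the student samples its own rollouts (on-policy, as in RL) and is trained with a dense per-token reverse KL toward the teacher (SFT-like supervision on every token).

```python
import torch
import torch.nn.functional as F

def on_policy_distillation_step(student, teacher, prompts, optimizer):
    """One on-policy distillation step (schematic; the model interface is assumed):
    1. the *student* generates rollouts (on-policy, like RL),
    2. the *teacher* scores those rollouts with per-token log-probs (dense, like SFT),
    3. the student minimizes a per-token reverse KL to the teacher on its own samples.
    """
    with torch.no_grad():
        rollouts = student.generate(prompts)                 # on-policy samples
        teacher_logits = teacher.logits(prompts, rollouts)
    student_logits = student.logits(prompts, rollouts)
    loss = F.kl_div(
        F.log_softmax(teacher_logits, dim=-1),               # input: teacher log-probs
        F.log_softmax(student_logits, dim=-1),               # target: student log-probs
        log_target=True, reduction="batchmean",              # => KL(student || teacher)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```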
What is "good" reasoning and how to evaluate it? We explore a new pipeline to model step-level reasoning, a "Goldilocks principle" that balances free-form CoT and Lean! Led by my student @yuanhezhang6, in collaboration with Ilja from DeepMind, @jasondeanlee, @CL_Theory
3
12
81
Rethinking Reasoning (with Diffusion LLMs). This work changes how you think about reasoning in LLMs. Turns out: you don't need the full chain-of-thought; only a small subset of CoT tokens actually matter for the final answer. Autoregressive LLMs can't exploit this
10
36
230
Happy to share a new paper! Designing model behavior is hard -- desirable values often pull in opposite directions. Jifan's approach systematically generates scenarios where values conflict, helping us see where specs are missing coverage and how different models balance
New research paper with Anthropic and Thinking Machines. AI companies use model specifications to define desirable behaviors during training. Are model specs clearly expressing what we want models to do? And do different frontier models have different personalities? We generated
13
47
616
Until this morning, I had never spent time on this problem because iterate convergence is less important to me than convergence rates. What's new/difficult? First, Lemma 1 and the Lyapunov function are not new and are key to prior work on this problem. The key difficulty
2
13
155
I love how Bayes/VI research "is back" these days, just with huge models as priors, e.g., KL-Regularized Reinforcement Learning is Designed to Mode Collapse https://t.co/UntSaALjlw
arxiv.org
It is commonly believed that optimizing the reverse KL divergence results in "mode seeking", while optimizing forward KL results in "mass covering", with the latter being preferred if the goal is...
10
22
243
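For context on the quoted title, the usual KL-regularized RL objective; the reverse-KL direction KL(pi || pi_ref) is the mode-seeking one that the abstract contrasts with the mass-covering forward KL:

```latex
% KL-regularized RL: maximize reward while staying close to a reference model
% pi_ref in *reverse* KL.
\[
  \max_{\pi}\;\; \mathbb{E}_{y \sim \pi}\bigl[\, r(y) \,\bigr]
  \;-\; \beta\,\mathrm{KL}\bigl(\pi \,\|\, \pi_{\mathrm{ref}}\bigr)
\]
% The reverse direction KL(pi || pi_ref) is mode-seeking; the forward direction
% KL(pi_ref || pi) is mass-covering, which is the contrast described in the abstract.
```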
I used ChatGPT to solve an open problem in convex optimization. *Part II* 1/N https://t.co/6HDAr6y8Z9
3
21
211
The coverage principle: how pre-training enables post-training. New preprint where we look at the mechanisms through which next-token prediction produces models that succeed at downstream tasks. The answer involves a metric we call the "coverage profile", not cross-entropy.
7
37
285
If you want to try training Q-functions via flow-matching, we just released code and runs: Code: https://t.co/ONSIzeRcAP Wandbs: https://t.co/bIhZLBvMI2 Also great to see so many other groups training value functions via flow-matching!
docs.google.com
New paper on core RL: a way to train value functions via flow-matching for scaling compute! No text/images, but a flow directly on a scalar Q-value. This unlocks benefits of iterative compute, test-time scaling for value prediction & SOTA results on whatever we tried.
3
37
259
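A minimal flow-matching loss on a scalar Q-value in the spirit of the thread (a sketch, not the released code; the `v_theta` signature is assumed): the "data" endpoint is a scalar return/Q target, interpolated linearly from Gaussian noise and conditioned on state and action.

```python
import torch

def scalar_flow_matching_loss(v_theta, state, action, q_target):
    """Flow-matching on a *scalar* Q-value, conditioned on (state, action).

    v_theta(q_t, t, state, action) -> predicted velocity (hypothetical signature).
    Linear interpolation path: q_t = (1 - t) * q_0 + t * q_1, with q_0 ~ N(0, 1)
    noise and q_1 the regression target; the conditional target velocity is q_1 - q_0.
    """
    q1 = q_target.unsqueeze(-1)                       # (batch, 1) scalar targets
    q0 = torch.randn_like(q1)                         # noise endpoint
    t = torch.rand(q1.shape[0], 1, device=q1.device)  # random time in [0, 1]
    qt = (1 - t) * q0 + t * q1
    target_velocity = q1 - q0
    pred_velocity = v_theta(qt, t, state, action)
    return ((pred_velocity - target_velocity) ** 2).mean()
```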
We often talk about big leaps in AI for mathematics, but I think the small steps are equally impressive. The future of mathematics is now. I was working on a particular task: finding a case-free proof of the representability of the local Néron function correction using a
22
76
549
Excited to announce a new direction in accelerating Generative AI: pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation https://t.co/6ro55E1XGP Distill 20B flow models using just an L2 loss via imitation learning for SOTA diversity and teacher-aligned quality.
3
28
152
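A heavily simplified sketch of distilling a flow teacher into a few-step student with a plain L2 imitation loss, as the announcement describes at a high level; `teacher_ode_solve` and `student` are placeholders, and pi-Flow's actual policy-based objective is richer than this endpoint-matching toy.

```python
import torch

def l2_imitation_distillation_loss(student, teacher_ode_solve, x_T, n_student_steps=4):
    """Distill a flow teacher into a few-step student with a plain L2 loss (toy version).

    teacher_ode_solve(x_T) -- runs the teacher's full probability-flow ODE (placeholder)
    student(x, t)          -- velocity of the few-step generator being trained (placeholder)
    The student imitates the teacher's endpoint from the same noise; treat this only as
    an illustration of the L2 idea, not the paper's method.
    """
    with torch.no_grad():
        x_teacher = teacher_ode_solve(x_T)            # teacher's high-quality sample
    x = x_T
    ts = torch.linspace(1.0, 0.0, n_student_steps + 1)
    for t, s in zip(ts[:-1], ts[1:]):
        x = x + (s - t) * student(x, t)               # a few Euler-style student steps
    return ((x - x_teacher) ** 2).mean()
```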