Jerry Liu
@jerrywliu
Followers 392 · Following 223 · Media 14 · Statuses 76
ML & numerics | ICME PhD at Stanford, @doecsgf fellow | prev @duolingo @berkeleylab @livermore_lab
Stanford, CA
Joined May 2022
Manifold Muon stabilizes large model training, but it's expensive 💰 -- requiring an inner loop solve for each update. 💡 But you can significantly accelerate it, leading to 2.3x speedup on @thinkymachines's experiment with no performance loss! Blog and 🧵below…
5
24
268
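For context on where the cost comes from: plain Muon already runs a short inner iteration on every update (Newton-Schulz orthogonalization of the update matrix), and manifold Muon replaces it with a full constrained solve via dual ascent, which is far more expensive. The sketch below is the standard Newton-Schulz loop from the Muon release, shown only to illustrate the shape of a per-update inner loop; it is not the manifold Muon solver or the acceleration described in the blog.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5, eps: float = 1e-7):
    """Quintic Newton-Schulz iteration used by Muon to approximately
    orthogonalize an update matrix (coefficients from the Muon release).
    Illustrates a per-update inner loop; manifold Muon's dual-ascent
    solve is a heavier loop of the same flavor."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)          # scale so singular values are <= 1
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):            # the inner loop run on every optimizer step
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

# Example: orthogonalize a random gradient-shaped matrix
W_update = newton_schulz_orthogonalize(torch.randn(512, 256))
```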
We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Today we're introducing Sonic-3 - the state-of-the-art model for realtime conversation. What makes Sonic-3 great:
- Breakthrough naturalness - laughter and full emotional range
- Lightning fast -
1K
1K
8K
Sliding window attention (SWA) is powering frontier hybrid models for efficiency. Is there something better? Introducing Phalanx, a faster, higher-quality drop-in replacement for SWA. Phalanx is a new family of hardware- and numerics-aware windowed
12
50
203
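For readers who haven't seen the baseline: sliding window attention lets each token attend only to the previous w tokens, so per-token attention cost stays constant as context grows. A minimal reference implementation of the masking is below; this is a generic sketch of SWA itself, not the Phalanx kernel.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    """Causal sliding-window attention: token i attends only to tokens
    max(0, i - window + 1) .. i. Plain reference implementation of SWA,
    not the Phalanx kernel."""
    T, d = q.shape[-2], q.shape[-1]
    i = torch.arange(T).unsqueeze(-1)          # query positions
    j = torch.arange(T).unsqueeze(0)           # key positions
    mask = (j <= i) & (j > i - window)         # causal + window constraint
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example: 2 heads, 16 tokens, head dim 8, window of 4
q = k = v = torch.randn(2, 16, 8)
out = sliding_window_attention(q, k, v, window=4)
```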
Introducing RND1, the most powerful base diffusion language model (DLM) to date. RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) and a sparse MoE architecture. We are making it open source, releasing weights, training details, and code to
104
257
1K
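The "30B params (3B active)" phrasing is standard sparse-MoE accounting: a router sends each token to a few experts, so only a small slice of the total parameters runs per token. A generic top-k routing sketch follows; the layer sizes and routing details are placeholders, not RND1's actual architecture.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Generic top-k mixture-of-experts layer: each token is routed to k of
    n_experts feed-forward blocks, so only a fraction of total parameters is
    active per token. Placeholder sizes; not RND1's actual architecture."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                           # x: (n_tokens, d_model)
        gates = self.router(x).softmax(dim=-1)      # routing probabilities
        topv, topi = gates.topk(self.k, dim=-1)     # keep top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                sel = topi[:, slot] == e            # tokens whose slot-th choice is expert e
                if sel.any():
                    out[sel] += topv[sel, slot].unsqueeze(-1) * expert(x[sel])
        return out

moe = SparseMoE()
y = moe(torch.randn(10, 64))                        # only 2 of 8 experts run per token
```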
@StanfordHAI just ran this story on self-study and cartridges -- it's a really nice overview for those curious about our work
1
19
45
(1/8) We’re releasing an 8-GPU Llama-70B inference engine megakernel! Our megakernel supports arbitrary batch sizes, mixed prefill+decode, a paged KV cache, instruction pipelining, dynamic scheduling, interleaved communication, and more! On ShareGPT it’s 22% faster than SGLang.
7
50
326
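Of the features listed, the paged KV cache is the easiest to show in miniature: keys and values live in fixed-size blocks drawn from a shared pool, and a per-sequence block table maps token positions to blocks, so memory is allocated on demand rather than as one contiguous buffer per request. A toy sketch with made-up sizes is below; it illustrates the bookkeeping only, not the megakernel's actual data structures.

```python
import torch

class PagedKVCache:
    """Toy paged KV cache: keys/values live in fixed-size blocks from a shared
    pool, and a per-sequence block table maps positions to blocks. Sizes and
    names are made up; this shows the bookkeeping, not the megakernel's layout."""
    def __init__(self, n_blocks=256, block_size=16, n_heads=8, head_dim=64):
        self.block_size = block_size
        self.k_pool = torch.zeros(n_blocks, block_size, n_heads, head_dim)
        self.v_pool = torch.zeros(n_blocks, block_size, n_heads, head_dim)
        self.free = list(range(n_blocks))     # physical blocks not yet assigned
        self.block_tables = {}                # seq_id -> list of physical block ids
        self.lengths = {}                     # seq_id -> number of cached tokens

    def append(self, seq_id, k, v):           # k, v: (n_heads, head_dim) for one token
        table = self.block_tables.setdefault(seq_id, [])
        pos = self.lengths.get(seq_id, 0)
        if pos % self.block_size == 0:        # current block is full: grab a fresh one
            table.append(self.free.pop())
        blk, off = table[pos // self.block_size], pos % self.block_size
        self.k_pool[blk, off] = k
        self.v_pool[blk, off] = v
        self.lengths[seq_id] = pos + 1

    def gather(self, seq_id):
        """Materialize one sequence's cached K/V as contiguous tensors."""
        table, n = self.block_tables[seq_id], self.lengths[seq_id]
        k = self.k_pool[table].flatten(0, 1)[:n]
        v = self.v_pool[table].flatten(0, 1)[:n]
        return k, v

cache = PagedKVCache()
for _ in range(20):                            # 20 tokens span two 16-token blocks
    cache.append("req0", torch.randn(8, 64), torch.randn(8, 64))
k, v = cache.gather("req0")                    # each (20, 8, 64)
```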
(1/6) We’re happy to share that ThunderKittens now supports writing multi-GPU kernels, with the same programming model and full compatibility with PyTorch + torchrun. We’re also releasing collective ops and fused multi-GPU GEMM kernels, up to 2.6x faster than PyTorch + NCCL.
5
42
364
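For reference, the "PyTorch + NCCL" baseline mentioned here is what you get from torch.distributed collectives launched under torchrun; a minimal all-reduce example is below (the script name is hypothetical). This is the stock PyTorch API shown for context, not ThunderKittens' multi-GPU kernel interface.

```python
# Launch with: torchrun --nproc_per_node=8 allreduce_baseline.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")              # one process per GPU
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())
    x = torch.randn(4096, 4096, device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)             # NCCL collective across ranks
    if rank == 0:
        print("all-reduce done:", x.sum().item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```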
Life update: I started Radical Numerics with Stefano Massaroli, Armin Thomas, Eric Nguyen, and a fantastic team of engineers and researchers. We are building the engine for recursive self‑improvement (RSI): AI that designs and refines AI, accelerating discovery across science and
8
23
232
✨ Excited to share a few life updates! 🎤 My TED Talk is now live! I shared the origin story of Evo, titled: "How AI could generate new life forms" TED talk: https://t.co/dh7iWcPaBu ✍️ I wrote a blog post about what it’s *really* like to deliver a TED talk blog:
ted.com
If DNA is just a string of letters, could AI learn to read it … or even write it? Bioengineering researcher Eric Nguyen reveals how AI has upended the rules of biology, potentially creating a future...
17
28
174
‼️🕚New paper alert with @ushabhalla_: Leveraging the Sequential Nature of Language for Interpretability ( https://t.co/VCNjWY6gtK)! 1/n
1
8
17
Was extremely fun to work on this paper with @jerrywliu, finally fulfilling our 7-year plan from year one of undergrad to write a paper together! One of many, I hope!
1/10 ML can solve PDEs – but precision🔬is still a challenge. Towards high-precision methods for scientific problems, we introduce BWLer 🎳, a new architecture for physics-informed learning achieving (near-)machine-precision (up to 10⁻¹² RMSE) on benchmark PDEs. 🧵How it works:
2
7
40
@BaigYasa @rajat_vd @HazyResearch 11/10 BWLer was just presented at the Theory of AI for Scientific Computing (TASC) workshop at COLT 2025, where it received Best Paper 🏆 Huge thanks to the organizers (@nmboffi, @khodakmoments, Jianfeng Lu, @__tm__157, @risteski_a) for a fantastic event!
0
6
31
10/10 BWLer is just the beginning – we're excited to build precise, generalizable ML models for PDEs & physics! 📄 Paper: https://t.co/IHJDMbNjzo 🧠 Blog: https://t.co/5xpUwZi7cZ 💻 Code: https://t.co/lDWyJm8p74 w/ @BaigYasa, Denise Lee, @rajat_vd, Atri Rudra, @HazyResearch
github.com
Official repo for BWLer: Barycentric Weight Layer. Contribute to HazyResearch/bwler development by creating an account on GitHub.
1
1
35
9/10 Of course, there’s no free lunch. Like spectral methods, BWLer struggles with discontinuities or irregular domains – sometimes taking hours to match the RMSE of PINNs that train in minutes. We view BWLer as a proof-of-concept toward high-precision scientific ML! 🔬
1
0
18
8/10 Explicit BWLer can go even further 🚀 With a second-order optimizer, it reaches 10⁻¹² RMSE – near float64 machine precision! – and up to 10 billion× lower error than standard MLPs. See comparison across benchmark PDEs ⬇️
1
0
20
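A toy version of the "second-order optimizer + float64" recipe: solve a smooth least-squares fit with L-BFGS in double precision and the error drops far below what float32 training can reach. The basis, problem, and hyperparameters here are arbitrary stand-ins, not the BWLer training setup.

```python
import torch

torch.set_default_dtype(torch.float64)   # double precision matters at this error scale

# Fit a degree-20 Chebyshev-basis expansion to sin(4x) on [-1, 1] with L-BFGS.
x = torch.linspace(-1, 1, 400)
y = torch.sin(4 * x)
basis = torch.stack([torch.cos(k * torch.arccos(x)) for k in range(21)], dim=1)
coeffs = torch.zeros(21, requires_grad=True)

opt = torch.optim.LBFGS([coeffs], max_iter=500, tolerance_grad=1e-16,
                        tolerance_change=1e-16, line_search_fn="strong_wolfe")

def closure():
    opt.zero_grad()
    loss = ((basis @ coeffs - y) ** 2).mean()
    loss.backward()
    return loss

opt.step(closure)
rmse = ((basis @ coeffs - y) ** 2).mean().sqrt()
print(f"RMSE: {rmse.item():.2e}")   # well below float32-level error in this toy setup
```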
7/10 Adding BWLer-hats 🎩 to standard MLPs improves RMSE by up to 1800× across benchmark PDEs (convection, reaction, wave). Why? BWLer’s global derivatives encourage smoother, more coherent solutions. Example below: standard MLP vs BWLer-hatted on the convection equation ⬇️
1
1
17
6/10 BWLer comes in two modes:
– BWLer-hat 🎩: adds an interpolation layer atop an NN
– Explicit BWLer 🎳: replaces the NN, learns function values directly
Both versions let us explicitly tune expressivity and conditioning – yielding big precision gains on benchmark PDEs 📈
1
1
19
5/10 💡If polynomials work so well, why not use them for PINNs? We introduce BWLer 🎳, a drop-in module for physics-informed learning. Built on barycentric polynomials, BWLer wraps or replaces MLPs with a numerically stable interpolant rooted in classical spectral methods.
1
2
29
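The barycentric form referenced here is the numerically stable way to evaluate a polynomial interpolant through values at Chebyshev points: for Chebyshev points of the second kind the weights are simply alternating signs, halved at the two endpoints (Berrut & Trefethen). A minimal sketch of that classical building block follows; it is not the full BWLer module.

```python
import math
import torch

def chebyshev_points(n):
    """n+1 Chebyshev points of the second kind on [-1, 1]."""
    return torch.cos(math.pi * torch.arange(n + 1, dtype=torch.float64) / n)

def barycentric_eval(x_nodes, f_nodes, x_eval):
    """Barycentric Lagrange interpolation (Berrut & Trefethen, 2004). For
    Chebyshev points of the second kind the weights are (-1)^j, halved at
    the two endpoints; this form stays stable even at high degree."""
    n = len(x_nodes) - 1
    w = torch.ones(n + 1, dtype=x_nodes.dtype)
    w[1::2] = -1.0
    w[0] *= 0.5
    w[-1] *= 0.5
    diff = x_eval[:, None] - x_nodes[None, :]            # (n_eval, n_nodes)
    exact = diff == 0                                    # evaluation point equals a node
    diff = torch.where(exact, torch.ones_like(diff), diff)
    terms = w / diff
    p = (terms @ f_nodes) / terms.sum(dim=1)
    hit = exact.any(dim=1)                               # return node values exactly there
    p[hit] = f_nodes[exact.double().argmax(dim=1)[hit]]
    return p

# Interpolate sin(4x) from 33 Chebyshev nodes; max error lands near float64 precision
nodes = chebyshev_points(32)
vals = torch.sin(4 * nodes)
xs = torch.linspace(-1, 1, 1001, dtype=torch.float64)
print((barycentric_eval(nodes, vals, xs) - torch.sin(4 * xs)).abs().max())
```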
4/10 Turns out, MLPs struggle 😬– even on simple sinusoids. Despite 1000× more parameters, they plateau far above machine precision, with RMSE up to 10,000× worse than basic polynomial interpolants!
1
1
18
3/10 We strip away the PDE constraints and ask a simpler question: how well can MLPs perform basic interpolation? 🧩 E.g. can MLPs recover the black curve just from the red training points? (Pictured: f(x) = sin(4x).)
1
1
17
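A minimal version of that experiment, with arbitrary hyperparameters rather than the paper's exact setup: fit a small MLP to samples of sin(4x) and measure RMSE on a dense grid. Per the results in the thread, such fits plateau far above the precision a Chebyshev interpolant reaches on the same data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Training data: samples of f(x) = sin(4x) on [-1, 1]
x_train = torch.linspace(-1, 1, 40).unsqueeze(1)
y_train = torch.sin(4 * x_train)

mlp = nn.Sequential(nn.Linear(1, 128), nn.Tanh(),
                    nn.Linear(128, 128), nn.Tanh(),
                    nn.Linear(128, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

for step in range(20_000):
    opt.zero_grad()
    loss = ((mlp(x_train) - y_train) ** 2).mean()
    loss.backward()
    opt.step()

# Dense-grid RMSE: plain MLP fits typically stall many orders of magnitude above 1e-12
x_test = torch.linspace(-1, 1, 1000).unsqueeze(1)
rmse = ((mlp(x_test) - torch.sin(4 * x_test)) ** 2).mean().sqrt()
print(f"MLP interpolation RMSE: {rmse.item():.2e}")
```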