Jerry Liu Profile
Jerry Liu

@jerrywliu

Followers 392 · Following 223 · Media 14 · Statuses 76

ML & numerics | ICME PhD at Stanford, @doecsgf fellow | prev @duolingo @berkeleylab @livermore_lab

Stanford, CA
Joined May 2022
@_sdbuchanan
Sam Buchanan
11 days
Manifold Muon stabilizes large model training, but it's expensive 💰 -- requiring an inner loop solve for each update. 💡 But you can significantly accelerate it, leading to 2.3x speedup on @thinkymachines's experiment with no performance loss! Blog and 🧵below…
5
24
268
@krandiash
Karan Goel
14 days
We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Today we're introducing Sonic-3 - the state-of-the-art model for realtime conversation. What makes Sonic-3 great:
- Breakthrough naturalness - laughter and full emotional range
- Lightning fast - …
1K
1K
8K
@RadicalNumerics
Radical Numerics
28 days
Sliding window attention (SWA) is powering frontier hybrid models for efficiency. Is there something better? Introducing Phalanx, a faster, better-quality drop-in replacement for SWA. Phalanx is a new family of hardware- and numerics-aware windowed …
12
50
203
@RadicalNumerics
Radical Numerics
1 month
Introducing RND1, the most powerful base diffusion language model (DLM) to date. RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) and a sparse MoE architecture. We are making it open source, releasing weights, training details, and code to …
104
257
1K
@EyubogluSabri
Sabri Eyuboglu
1 month
@StanfordHAI just ran this story on self-study and cartridges -- it's a really nice overview for those curious about our work
1
19
45
@bfspector
Benjamin F Spector
1 month
(1/8) We’re releasing an 8-GPU Llama-70B inference engine megakernel! Our megakernel supports arbitrary batch sizes, mixed prefill+decode, a paged KV cache, instruction pipelining, dynamic scheduling, interleaved communication, and more! On ShareGPT it’s 22% faster than SGLang.
7
50
326
@stuart_sul
Stuart Sul
2 months
(1/6) We’re happy to share that ThunderKittens now supports writing multi-GPU kernels, with the same programming model and full compatibility with PyTorch + torchrun. We’re also releasing collective ops and fused multi-GPU GEMM kernels, up to 2.6x faster than PyTorch + NCCL.
5
42
364
@MichaelPoli6
Michael Poli
3 months
Life update: I started Radical Numerics with Stefano Massaroli, Armin Thomas, Eric Nguyen, and a fantastic team of engineers and researchers. We are building the engine for recursive self-improvement (RSI): AI that designs and refines AI, accelerating discovery across science and …
8
23
232
@exnx
Eric Nguyen
3 months
✨ Excited to share a few life updates!
🎤 My TED Talk is now live! I shared the origin story of Evo, titled: "How AI could generate new life forms"
TED talk: https://t.co/dh7iWcPaBu
✍️ I wrote a blog post about what it's *really* like to deliver a TED talk
blog:
ted.com
If DNA is just a string of letters, could AI learn to read it … or even write it? Bioengineering researcher Eric Nguyen reveals how AI has upended the rules of biology, potentially creating a future...
17
28
174
@alex_oesterling
Alex Oesterling
4 months
‼️🕚 New paper alert with @ushabhalla_: Leveraging the Sequential Nature of Language for Interpretability (https://t.co/VCNjWY6gtK)! 1/n
1
8
17
@BaigYasa
Yasa Baig
4 months
Was extremely fun to work on this paper with @jerrywliu, finally fulfilling our 7-year plan from year one of undergrad to write a paper together! One of many, I hope!
@jerrywliu
Jerry Liu
4 months
1/10 ML can solve PDEs – but precision 🔬 is still a challenge. Towards high-precision methods for scientific problems, we introduce BWLer 🎳, a new architecture for physics-informed learning achieving (near-)machine-precision (up to 10⁻¹² RMSE) on benchmark PDEs. 🧵 How it works:
2
7
40
@jerrywliu
Jerry Liu
4 months
@BaigYasa @rajat_vd @HazyResearch 11/10 BWLer was just presented at the Theory of AI for Scientific Computing (TASC) workshop at COLT 2025, where it received Best Paper 🏆 Huge thanks to the organizers (@nmboffi, @khodakmoments, Jianfeng Lu, @__tm__157, @risteski_a) for a fantastic event!
0
6
31
@jerrywliu
Jerry Liu
4 months
10/10 BWLer is just the beginning – we're excited to build precise, generalizable ML models for PDEs & physics! 📄 Paper: https://t.co/IHJDMbNjzo 🧠 Blog: https://t.co/5xpUwZi7cZ 💻 Code: https://t.co/lDWyJm8p74 w/ @BaigYasa, Denise Lee, @rajat_vd, Atri Rudra, @HazyResearch
github.com
Official repo for BWLer: Barycentric Weight Layer (HazyResearch/bwler).
1
1
35
@jerrywliu
Jerry Liu
4 months
9/10 Of course, there’s no free lunch. Like spectral methods, BWLer struggles with discontinuities or irregular domains – sometimes taking hours to match the RMSE of PINNs that train in minutes. We view BWLer as a proof-of-concept toward high-precision scientific ML! 🔬
1
0
18
@jerrywliu
Jerry Liu
4 months
8/10 Explicit BWLer can go even further 🚀 With a second-order optimizer, it reaches 10⁻¹² RMSE – near float64 machine precision! – and up to 10 billion× lower error than standard MLPs. See comparison across benchmark PDEs ⬇️
1
0
20
@jerrywliu
Jerry Liu
4 months
7/10 Adding BWLer-hats 🎩 to standard MLPs improves RMSE by up to 1800× across benchmark PDEs (convection, reaction, wave). Why? BWLer’s global derivatives encourage smoother, more coherent solutions. Example below: standard MLP vs BWLer-hatted on the convection equation ⬇️
1
1
17
@jerrywliu
Jerry Liu
4 months
6/10 BWLer comes in two modes:
– BWLer-hat 🎩: adds an interpolation layer atop an NN
– Explicit BWLer 🎳: replaces the NN, learns function values directly
Both versions let us explicitly tune expressivity and conditioning – yielding big precision gains on benchmark PDEs 📈
1
1
19
@jerrywliu
Jerry Liu
4 months
5/10 💡If polynomials work so well, why not use them for PINNs? We introduce BWLer 🎳, a drop-in module for physics-informed learning. Built on barycentric polynomials, BWLer wraps or replaces MLPs with a numerically stable interpolant rooted in classical spectral methods.
1
2
29
@jerrywliu
Jerry Liu
4 months
4/10 Turns out, MLPs struggle 😬 – even on simple sinusoids. Despite 1000× more parameters, they plateau far above machine precision, with RMSE up to 10,000× worse than basic polynomial interpolants!
1
1
18
@jerrywliu
Jerry Liu
4 months
3/10 We strip away the PDE constraints and ask a simpler question: how well can MLPs perform basic interpolation? 🧩 E.g. can MLPs recover the black curve just from the red training points? (Pictured: f(x) = sin(4x).)
1
1
17