Jerry Liu
@jerrywliu
Followers 392 · Following 223 · Media 14 · Statuses 76
ML & numerics | ICME PhD at Stanford, @doecsgf fellow | prev @duolingo @berkeleylab @livermore_lab
Stanford, CA
Joined May 2022
Manifold Muon stabilizes large model training, but it's expensive 💰 -- requiring an inner loop solve for each update. 💡 But you can significantly accelerate it, leading to 2.3x speedup on @thinkymachines's experiment with no performance loss! Blog and 🧵below…
5
24
268
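For context on where the cost comes from: plain Muon already runs a short inner iteration on every update (Newton-Schulz orthogonalization of the update matrix), and manifold Muon replaces it with a full constrained solve via dual ascent, which is far more expensive. The sketch below is the standard Newton-Schulz loop from the Muon release, shown only to illustrate the shape of a per-update inner loop; it is not the manifold Muon solver or the acceleration described in the blog.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5, eps: float = 1e-7):
    """Quintic Newton-Schulz iteration used by Muon to approximately
    orthogonalize an update matrix (coefficients from the Muon release).
    Illustrates a per-update inner loop; manifold Muon's dual-ascent
    solve is a heavier loop of the same flavor."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)          # scale so singular values are <= 1
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T
    for _ in range(steps):            # the inner loop run on every optimizer step
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

# Example: orthogonalize a random gradient-shaped matrix
W_update = newton_schulz_orthogonalize(torch.randn(512, 256))
```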
We've raised $100M from Kleiner Perkins, Index Ventures, Lightspeed, and NVIDIA. Today we're introducing Sonic-3 - the state-of-the-art model for realtime conversation. What makes Sonic-3 great:
- Breakthrough naturalness - laughter and full emotional range
- Lightning fast -
1K
1K
8K
Sliding window attention (SWA) is powering frontier hybrid models for efficiency. Is there something better? Introducing Phalanx, a faster, higher-quality drop-in replacement for SWA. Phalanx is a new family of hardware- and numerics-aware windowed
12
50
203
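For readers who haven't seen the baseline: sliding window attention lets each token attend only to the previous w tokens, so per-token attention cost stays constant as context grows. A minimal reference implementation of the masking is below; this is a generic sketch of SWA itself, not the Phalanx kernel.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(q, k, v, window: int):
    """Causal sliding-window attention: token i attends only to tokens
    max(0, i - window + 1) .. i. Plain reference implementation of SWA,
    not the Phalanx kernel."""
    T, d = q.shape[-2], q.shape[-1]
    i = torch.arange(T).unsqueeze(-1)          # query positions
    j = torch.arange(T).unsqueeze(0)           # key positions
    mask = (j <= i) & (j > i - window)         # causal + window constraint
    scores = (q @ k.transpose(-2, -1)) / d ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example: 2 heads, 16 tokens, head dim 8, window of 4
q = k = v = torch.randn(2, 16, 8)
out = sliding_window_attention(q, k, v, window=4)
```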
Introducing RND1, the most powerful base diffusion language model (DLM) to date. RND1 (Radical Numerics Diffusion) is an experimental DLM with 30B params (3B active) and a sparse MoE architecture. We are making it open source, releasing weights, training details, and code to
104
257
1K
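The "30B params (3B active)" phrasing is standard sparse-MoE accounting: a router sends each token to a few experts, so only a small slice of the total parameters runs per token. A generic top-k routing sketch follows; the layer sizes and routing details are placeholders, not RND1's actual architecture.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Generic top-k mixture-of-experts layer: each token is routed to k of
    n_experts feed-forward blocks, so only a fraction of total parameters is
    active per token. Placeholder sizes; not RND1's actual architecture."""
    def __init__(self, d_model=64, d_ff=256, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                           # x: (n_tokens, d_model)
        gates = self.router(x).softmax(dim=-1)      # routing probabilities
        topv, topi = gates.topk(self.k, dim=-1)     # keep top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                sel = topi[:, slot] == e            # tokens whose slot-th choice is expert e
                if sel.any():
                    out[sel] += topv[sel, slot].unsqueeze(-1) * expert(x[sel])
        return out

moe = SparseMoE()
y = moe(torch.randn(10, 64))                        # only 2 of 8 experts run per token
```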
@StanfordHAI just ran this story on self-study and cartridges -- it's a really nice overview for those curious about our work
1
19
45
(1/8) We’re releasing an 8-GPU Llama-70B inference engine megakernel! Our megakernel supports arbitrary batch sizes, mixed prefill+decode, a paged KV cache, instruction pipelining, dynamic scheduling, interleaved communication, and more! On ShareGPT it’s 22% faster than SGLang.
7
50
326
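Of the features listed, the paged KV cache is the easiest to show in miniature: keys and values live in fixed-size blocks drawn from a shared pool, and a per-sequence block table maps token positions to blocks, so memory is allocated on demand rather than as one contiguous buffer per request. A toy sketch with made-up sizes is below; it illustrates the bookkeeping only, not the megakernel's actual data structures.

```python
import torch

class PagedKVCache:
    """Toy paged KV cache: keys/values live in fixed-size blocks from a shared
    pool, and a per-sequence block table maps positions to blocks. Sizes and
    names are made up; this shows the bookkeeping, not the megakernel's layout."""
    def __init__(self, n_blocks=256, block_size=16, n_heads=8, head_dim=64):
        self.block_size = block_size
        self.k_pool = torch.zeros(n_blocks, block_size, n_heads, head_dim)
        self.v_pool = torch.zeros(n_blocks, block_size, n_heads, head_dim)
        self.free = list(range(n_blocks))     # physical blocks not yet assigned
        self.block_tables = {}                # seq_id -> list of physical block ids
        self.lengths = {}                     # seq_id -> number of cached tokens

    def append(self, seq_id, k, v):           # k, v: (n_heads, head_dim) for one token
        table = self.block_tables.setdefault(seq_id, [])
        pos = self.lengths.get(seq_id, 0)
        if pos % self.block_size == 0:        # current block is full: grab a fresh one
            table.append(self.free.pop())
        blk, off = table[pos // self.block_size], pos % self.block_size
        self.k_pool[blk, off] = k
        self.v_pool[blk, off] = v
        self.lengths[seq_id] = pos + 1

    def gather(self, seq_id):
        """Materialize one sequence's cached K/V as contiguous tensors."""
        table, n = self.block_tables[seq_id], self.lengths[seq_id]
        k = self.k_pool[table].flatten(0, 1)[:n]
        v = self.v_pool[table].flatten(0, 1)[:n]
        return k, v

cache = PagedKVCache()
for _ in range(20):                            # 20 tokens span two 16-token blocks
    cache.append("req0", torch.randn(8, 64), torch.randn(8, 64))
k, v = cache.gather("req0")                    # each (20, 8, 64)
```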
(1/6) We’re happy to share that ThunderKittens now supports writing multi-GPU kernels, with the same programming model and full compatibility with PyTorch + torchrun. We’re also releasing collective ops and fused multi-GPU GEMM kernels, up to 2.6x faster than PyTorch + NCCL.
5
42
364
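For reference, the "PyTorch + NCCL" baseline mentioned here is what you get from torch.distributed collectives launched under torchrun; a minimal all-reduce example is below (the script name is hypothetical). This is the stock PyTorch API shown for context, not ThunderKittens' multi-GPU kernel interface.

```python
# Launch with: torchrun --nproc_per_node=8 allreduce_baseline.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")              # one process per GPU
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())
    x = torch.randn(4096, 4096, device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)             # NCCL collective across ranks
    if rank == 0:
        print("all-reduce done:", x.sum().item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```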
Life update: I started Radical Numerics with Stefano Massaroli, Armin Thomas, Eric Nguyen, and a fantastic team of engineers and researchers. We are building the engine for recursive self‑improvement (RSI): AI that designs and refines AI, accelerating discovery across science and
8
23
232
✨ Excited to share a few life updates! 🎤 My TED Talk is now live! I shared the origin story of Evo, titled: "How AI could generate new life forms" TED talk: https://t.co/dh7iWcPaBu ✍️ I wrote a blog post about what it’s *really* like to deliver a TED talk blog:
ted.com
If DNA is just a string of letters, could AI learn to read it … or even write it? Bioengineering researcher Eric Nguyen reveals how AI has upended the rules of biology, potentially creating a future...
17
28
174
‼️🕚New paper alert with @ushabhalla_: Leveraging the Sequential Nature of Language for Interpretability ( https://t.co/VCNjWY6gtK)! 1/n
1
8
17
Was extremely fun to work on this paper with @jerrywliu, finally fulfilling our 7-year plan from year one of undergrad to write a paper together! One of many, I hope!
1/10 ML can solve PDEs – but precision🔬is still a challenge. Towards high-precision methods for scientific problems, we introduce BWLer 🎳, a new architecture for physics-informed learning achieving (near-)machine-precision (up to 10⁻¹² RMSE) on benchmark PDEs. 🧵How it works:
2
7
40
@BaigYasa @rajat_vd @HazyResearch 11/10 BWLer was just presented at the Theory of AI for Scientific Computing (TASC) workshop at COLT 2025, where it received Best Paper 🏆 Huge thanks to the organizers (@nmboffi, @khodakmoments, Jianfeng Lu, @__tm__157, @risteski_a) for a fantastic event!
0
6
31
10/10 BWLer is just the beginning – we're excited to build precise, generalizable ML models for PDEs & physics! 📄 Paper: https://t.co/IHJDMbNjzo 🧠 Blog: https://t.co/5xpUwZi7cZ 💻 Code: https://t.co/lDWyJm8p74 w/ @BaigYasa, Denise Lee, @rajat_vd, Atri Rudra, @HazyResearch
github.com
Official repo for BWLer: Barycentric Weight Layer. Contribute to HazyResearch/bwler development by creating an account on GitHub.
1
1
35
9/10 Of course, there’s no free lunch. Like spectral methods, BWLer struggles with discontinuities or irregular domains – sometimes taking hours to match the RMSE of PINNs that train in minutes. We view BWLer as a proof-of-concept toward high-precision scientific ML! 🔬
1
0
18
8/10 Explicit BWLer can go even further 🚀 With a second-order optimizer, it reaches 10⁻¹² RMSE – near float64 machine precision! – and up to 10 billion× lower error than standard MLPs. See comparison across benchmark PDEs ⬇️
1
0
20
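A toy version of the "second-order optimizer + float64" recipe: solve a smooth least-squares fit with L-BFGS in double precision and the error drops far below what float32 training can reach. The basis, problem, and hyperparameters here are arbitrary stand-ins, not the BWLer training setup.

```python
import torch

torch.set_default_dtype(torch.float64)   # double precision matters at this error scale

# Fit a degree-20 Chebyshev-basis expansion to sin(4x) on [-1, 1] with L-BFGS.
x = torch.linspace(-1, 1, 400)
y = torch.sin(4 * x)
basis = torch.stack([torch.cos(k * torch.arccos(x)) for k in range(21)], dim=1)
coeffs = torch.zeros(21, requires_grad=True)

opt = torch.optim.LBFGS([coeffs], max_iter=500, tolerance_grad=1e-16,
                        tolerance_change=1e-16, line_search_fn="strong_wolfe")

def closure():
    opt.zero_grad()
    loss = ((basis @ coeffs - y) ** 2).mean()
    loss.backward()
    return loss

opt.step(closure)
rmse = ((basis @ coeffs - y) ** 2).mean().sqrt()
print(f"RMSE: {rmse.item():.2e}")   # well below float32-level error in this toy setup
```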
7/10 Adding BWLer-hats 🎩 to standard MLPs improves RMSE by up to 1800× across benchmark PDEs (convection, reaction, wave). Why? BWLer’s global derivatives encourage smoother, more coherent solutions. Example below: standard MLP vs BWLer-hatted on the convection equation ⬇️
1
1
17
6/10 BWLer comes in two modes:
– BWLer-hat 🎩: adds an interpolation layer atop an NN
– Explicit BWLer 🎳: replaces the NN, learns function values directly
Both versions let us explicitly tune expressivity and conditioning – yielding big precision gains on benchmark PDEs 📈
1
1
19
5/10 💡If polynomials work so well, why not use them for PINNs? We introduce BWLer 🎳, a drop-in module for physics-informed learning. Built on barycentric polynomials, BWLer wraps or replaces MLPs with a numerically stable interpolant rooted in classical spectral methods.
1
2
29
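The barycentric form referenced here is the numerically stable way to evaluate a polynomial interpolant through values at Chebyshev points: for Chebyshev points of the second kind the weights are simply alternating signs, halved at the two endpoints (Berrut & Trefethen). A minimal sketch of that classical building block follows; it is not the full BWLer module.

```python
import math
import torch

def chebyshev_points(n):
    """n+1 Chebyshev points of the second kind on [-1, 1]."""
    return torch.cos(math.pi * torch.arange(n + 1, dtype=torch.float64) / n)

def barycentric_eval(x_nodes, f_nodes, x_eval):
    """Barycentric Lagrange interpolation (Berrut & Trefethen, 2004). For
    Chebyshev points of the second kind the weights are (-1)^j, halved at
    the two endpoints; this form stays stable even at high degree."""
    n = len(x_nodes) - 1
    w = torch.ones(n + 1, dtype=x_nodes.dtype)
    w[1::2] = -1.0
    w[0] *= 0.5
    w[-1] *= 0.5
    diff = x_eval[:, None] - x_nodes[None, :]            # (n_eval, n_nodes)
    exact = diff == 0                                    # evaluation point equals a node
    diff = torch.where(exact, torch.ones_like(diff), diff)
    terms = w / diff
    p = (terms @ f_nodes) / terms.sum(dim=1)
    hit = exact.any(dim=1)                               # return node values exactly there
    p[hit] = f_nodes[exact.double().argmax(dim=1)[hit]]
    return p

# Interpolate sin(4x) from 33 Chebyshev nodes; max error lands near float64 precision
nodes = chebyshev_points(32)
vals = torch.sin(4 * nodes)
xs = torch.linspace(-1, 1, 1001, dtype=torch.float64)
print((barycentric_eval(nodes, vals, xs) - torch.sin(4 * xs)).abs().max())
```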
4/10 Turns out, MLPs struggle 😬– even on simple sinusoids. Despite 1000× more parameters, they plateau far above machine precision, with RMSE up to 10,000× worse than basic polynomial interpolants!
1
1
18
3/10 We strip away the PDE constraints and ask a simpler question: how well can MLPs perform basic interpolation? 🧩 E.g. can MLPs recover the black curve just from the red training points? (Pictured: f(x) = sin(4x).)
1
1
17
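A minimal version of that experiment, with arbitrary hyperparameters rather than the paper's exact setup: fit a small MLP to samples of sin(4x) and measure RMSE on a dense grid. Per the results in the thread, such fits plateau far above the precision a Chebyshev interpolant reaches on the same data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Training data: samples of f(x) = sin(4x) on [-1, 1]
x_train = torch.linspace(-1, 1, 40).unsqueeze(1)
y_train = torch.sin(4 * x_train)

mlp = nn.Sequential(nn.Linear(1, 128), nn.Tanh(),
                    nn.Linear(128, 128), nn.Tanh(),
                    nn.Linear(128, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

for step in range(20_000):
    opt.zero_grad()
    loss = ((mlp(x_train) - y_train) ** 2).mean()
    loss.backward()
    opt.step()

# Dense-grid RMSE: plain MLP fits typically stall many orders of magnitude above 1e-12
x_test = torch.linspace(-1, 1, 1000).unsqueeze(1)
rmse = ((mlp(x_test) - torch.sin(4 * x_test)) ** 2).mean().sqrt()
print(f"MLP interpolation RMSE: {rmse.item():.2e}")
```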