Matthew Johnson
@SingularMattrix
Followers: 13K · Following: 92K · Media: 16 · Statuses: 2K
Researcher at Google Brain. I work on JAX (https://t.co/UGa5tGfinF).
Joined July 2010
We trained frictionlessly on AMD GPUs and TPUs with a unified JAX framework, and goodput for our flagship runs exceeded 90%. @YashVanjani @mjcOhio @alokpathy @pcmonk painstakingly removed obstacles to maximize experimental velocity.
2 replies · 12 reposts · 172 likes
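For context, a minimal sketch of what "a unified JAX framework" means in practice: one jitted train step that runs unchanged on CPU, NVIDIA/AMD GPU, or TPU. The linear model and loss below are hypothetical stand-ins, not the actual training code.

```python
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Hypothetical linear model standing in for a real network.
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit  # One compiled step; JAX dispatches to whichever backend is installed.
def train_step(params, x, y, lr=1e-2):
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

params = {"w": jnp.zeros((4, 1)), "b": jnp.zeros((1,))}
x, y = jnp.ones((8, 4)), jnp.ones((8, 1))
params = train_step(params, x, y)
print(jax.devices())  # e.g. TPU cores, ROCm GPUs, or CPU, with no code changes
```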
Nano Banana Pro: "Generate a diagram of a two-layer neural network in the style of Stephen Biesty"
24 replies · 68 reposts · 753 likes
Unbelievable: the famed Berkeley Math Circle is being forced to shut down due to a bureaucratic requirement that a guest lecturer giving an hour-long lesson must be officially fingerprinted. How is fingerprinting even still a thing in the 21st century? Chancellor Lyons
dailycal.org
After 27 years, Berkeley Math Circle has shut down its flagship program, BMC-Upper, due to “stringent” new campus background check requirements, according to a statement on BMC’s website.
34 replies · 79 reposts · 772 likes
SGLang now has a pure JAX backend, and it runs natively on TPU!
SGLang now runs natively on TPU with a new pure JAX backend! SGLang-Jax leverages SGLang's high-performance server architecture and uses JAX to compile the model's forward pass. By combining SGLang and JAX, it delivers fast, native TPU inference while maintaining support for…
2 replies · 5 reposts · 159 likes
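The mechanism described above, using JAX to compile the model's forward pass, looks roughly like this. A toy sketch, not SGLang-Jax's actual code; `params` and the model body are hypothetical:

```python
import jax
import jax.numpy as jnp

@jax.jit  # XLA compiles the forward pass once; then it serves natively on TPU.
def forward(params, token_ids):
    h = params["embed"][token_ids]   # [batch, seq, d_model]
    # ... transformer layers would go here ...
    return h @ params["unembed"]     # [batch, seq, vocab] logits

def greedy_step(params, token_ids):
    # A serving stack like SGLang wraps compiled steps like this
    # in its high-performance scheduler and batcher.
    logits = forward(params, token_ids)
    return jnp.argmax(logits[:, -1, :], axis=-1)
```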
⛵Marin 32B Base (mantis) is done training! It is the best open-source base model (beating OLMo 2 32B Base) and it’s even close to the best comparably-sized open-weight base models, Gemma 3 27B PT and Qwen 2.5 32B Base. Ranking across 19 benchmarks:
20 replies · 89 reposts · 598 likes
Want to improve GPU compute/comms overlap? We just published a new short tutorial for you! A few small changes to the Pallas:MGPU matmul kernel are all it takes to turn it into an all-gather collective matmul that overlaps NVLink comms with local compute:
8 replies · 46 reposts · 300 likes
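For readers new to the pattern: an all-gather collective matmul shards the left operand across devices, gathers the shards, and multiplies locally. Below is a high-level `shard_map` sketch of the idea (assumed names; the tutorial's point is doing the gather inside the Pallas kernel so the NVLink transfers explicitly overlap the compute, rather than relying on XLA):

```python
import jax
import jax.numpy as jnp
from jax.experimental.shard_map import shard_map
from jax.sharding import Mesh, PartitionSpec as P

mesh = Mesh(jax.devices(), axis_names=("x",))

def ag_matmul(lhs_shard, rhs):
    # Gather the row-shards of lhs from all devices, then matmul locally.
    lhs = jax.lax.all_gather(lhs_shard, axis_name="x", tiled=True)
    return lhs @ rhs

f = jax.jit(shard_map(ag_matmul, mesh=mesh,
                      in_specs=(P("x", None), P(None, None)),
                      out_specs=P(None, None)))
```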
Curious how to write SOTA-performance Blackwell matmul kernels using MGPU? We just published a short step-by-step tutorial: https://t.co/XRVX34juEz At each step, we show exactly what (small) changes are needed to refine the kernel, and the final kernel is just under 150 lines.
4 replies · 67 reposts · 418 likes
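As a baseline for what those refinements start from, here is a minimal single-block Pallas matmul (plain Pallas, not the tutorial's MGPU kernel):

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def matmul_kernel(x_ref, y_ref, o_ref):
    # Refs point at on-chip blocks; with no grid, one block covers each operand.
    o_ref[...] = x_ref[...] @ y_ref[...]

def matmul(x, y):
    return pl.pallas_call(
        matmul_kernel,
        out_shape=jax.ShapeDtypeStruct((x.shape[0], y.shape[1]), x.dtype),
    )(x, y)

x = jnp.ones((128, 128), jnp.float32)
print(matmul(x, x)[0, 0])  # 128.0
```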
@jeremyphoward Luckily we have alternatives :) https://t.co/KNy1bqdqxD Just 100 lines, without leaving Python, and SOTA performance
github.com
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more - jax-ml/jax
1 reply · 1 repost · 36 likes
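The repo description composes literally: differentiate, vectorize, and JIT are stackable function transformations. A tiny illustration with a hypothetical `predict`:

```python
import jax
import jax.numpy as jnp

def predict(w, x):
    return jnp.tanh(x @ w)  # plain Python+NumPy-style code

# Per-example gradients, vectorized over the batch, JIT-compiled for GPU/TPU.
per_example_grads = jax.jit(jax.vmap(jax.grad(predict), in_axes=(None, 0)))

w, xs = jnp.ones(3), jnp.ones((5, 3))
print(per_example_grads(w, xs).shape)  # (5, 3)
```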
Kudos to Terry Tao for this: https://t.co/NN818aZCXy
newsletter.ofthebrave.org
The “Mozart of Math” tried to stay out of politics. Then it came for his research.
23 replies · 180 reposts · 1K likes
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
38 replies · 521 reposts · 3K likes
So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public and private, that the spikes were bad. (This is a thread about how we fixed the spikes. Bear with me.)
23 replies · 104 reposts · 1K likes
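The thread explains what actually fixed Marin's spikes; purely as background, the usual first-line mitigation is global gradient-norm clipping, sketched here generically (not necessarily the fix the thread describes):

```python
import jax
import jax.numpy as jnp

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale the whole gradient pytree when its global L2 norm exceeds
    # max_norm, so a single bad batch can't blow up the update.
    leaves = jax.tree_util.tree_leaves(grads)
    gnorm = jnp.sqrt(sum(jnp.sum(g * g) for g in leaves))
    scale = jnp.minimum(1.0, max_norm / (gnorm + 1e-6))
    return jax.tree_util.tree_map(lambda g: g * scale, grads)
```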
Strong recommend for this book and the JAX/TPU docs, even if you are using Torch / GPUs. Clean notation and mental model for some challenging ideas. https://t.co/ypg3jjbWRx
https://t.co/HKBSctquKq
https://t.co/k7XXc9Eesg
9 replies · 159 reposts · 1K likes
For a rare look into how LLMs are really built, check out @dlwh's retrospective on how we trained the Marin 8B model from scratch (and outperformed Llama 3.1 8B base). It’s an honest account with all the revelations and mistakes we made along our journey. Papers are forced to…
2 replies · 71 reposts · 505 likes
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
59 replies · 222 reposts · 1K likes
Excited to introduce PyRoki ("Python Robot Kinematics"): easier IK, trajectory optimization, motion retargeting... with an open-source toolkit on both CPU and GPU
24 replies · 163 reposts · 1K likes
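The reason autodiff toolkits make IK "easier": with gradients, inverse kinematics reduces to descent on end-effector error. A toy planar 2-link example of that idea (not PyRoki's API):

```python
import jax
import jax.numpy as jnp

def fk(angles, lengths=jnp.array([1.0, 1.0])):
    # Forward kinematics of a planar 2-link arm.
    a = jnp.cumsum(angles)
    return jnp.array([jnp.sum(lengths * jnp.cos(a)),
                      jnp.sum(lengths * jnp.sin(a))])

def ik(target, steps=200, lr=0.1):
    # Gradient-descent IK: minimize squared end-effector error.
    loss = lambda th: jnp.sum((fk(th) - target) ** 2)
    step = jax.jit(lambda th: th - lr * jax.grad(loss)(th))
    angles = jnp.array([0.1, 0.1])
    for _ in range(steps):
        angles = step(angles)
    return angles

print(fk(ik(jnp.array([1.0, 1.0]))))  # ~[1. 1.]
```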
Pretty awesome result from the new version of Gemini 2.5. I changed one line of War and Peace, inserting a sentence into Book 14, Chapter 10 (halfway through), where Princess Mary "spoke to Crab Man the superhero." Gemini 2.5 consistently found this reference among 860,000 tokens.
23 replies · 72 reposts · 1K likes
A thread of thoughts on radiance fields, from my keynote at 3DV: Radiance fields have had 3 distinct generations. First was NeRF: just posenc and a tiny MLP. This was slow to train but worked really well, and it was unusually compressed: the NeRF was smaller than the images.
10 replies · 80 reposts · 641 likes
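The "posenc" above is NeRF's positional encoding, which is what lets a tiny MLP represent high-frequency detail; a minimal version:

```python
import jax.numpy as jnp

def posenc(x, num_freqs=10):
    # Map each coordinate to sin/cos at geometrically spaced frequencies:
    # [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..num_freqs-1.
    freqs = (2.0 ** jnp.arange(num_freqs)) * jnp.pi
    scaled = x[..., None] * freqs                        # [..., D, F]
    enc = jnp.concatenate([jnp.sin(scaled), jnp.cos(scaled)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)                # [..., 2*D*F]

print(posenc(jnp.zeros((4, 3))).shape)  # (4, 60)
```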
Llama 4 inference in pure JAX! Expert/tensor parallelism with int8 quantization. Contributions welcome!
2 replies · 15 reposts · 134 likes
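Int8 quantization in this setting typically means int8 weights plus per-channel scales, dequantized in the matmul. A generic sketch (not necessarily the repo's exact scheme):

```python
import jax.numpy as jnp

def quantize_int8(w):
    # Symmetric per-output-channel quantization: int8 weights + float scales.
    scale = jnp.maximum(jnp.max(jnp.abs(w), axis=0, keepdims=True) / 127.0, 1e-8)
    q = jnp.clip(jnp.round(w / scale), -127, 127).astype(jnp.int8)
    return q, scale

def int8_matmul(x, q, scale):
    # Dequantize on the fly; real kernels fuse this into the matmul.
    return (x @ q.astype(x.dtype)) * scale

q, s = quantize_int8(jnp.ones((512, 256)))
print(int8_matmul(jnp.ones((2, 512)), q, s).shape)  # (2, 256)
```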
A nice and concise R1 inference JAX/TPU port by @rdyro128523. Good for both reading and running. Watch the repo for more.
DeepSeek R1 inference in pure JAX! Currently on TPU, with GPU and distilled models in progress. Features MLA-style attention, expert/tensor parallelism & int8 quantization. Contributions welcome!
0 replies · 7 reposts · 37 likes
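Of the features listed, tensor parallelism is the most JAX-native to show: shard a weight matrix across devices with a NamedSharding and let `jit` partition the matmul. A generic sketch, not the repo's code:

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(jax.devices(), axis_names=("tp",))
w = jax.device_put(jnp.ones((1024, 4096)),
                   NamedSharding(mesh, P(None, "tp")))  # column-sharded weight

@jax.jit
def forward(x, w):
    return x @ w  # each device computes its column shard in parallel

print(forward(jnp.ones((8, 1024)), w).shape)  # (8, 4096)
```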