Matthew Johnson Profile
Matthew Johnson

@SingularMattrix

Followers: 13K
Following: 92K
Media: 16
Statuses: 2K

Researcher at Google Brain. I work on JAX (https://t.co/UGa5tGfinF).

Joined July 2010
@ashVaswani
Ashish Vaswani
11 days
We frictionlessly trained on AMD GPUs and TPUs with a unified JAX framework. Our goodput for flagship runs went past 90%. @YashVanjani @mjcOhio @alokpathy @pcmonk painstakingly removed obstacles to maximize experimental velocity.
2
12
172
@jon_barron
Jon Barron
27 days
Nano Banana Pro: "Generate a diagram of a two-layer neural network in the style of Stephen Biesty"
24
68
753
@doristsao
Doris Tsao
1 month
Unbelievable: the famed Berkeley Math Circle is being forced to shut down due to a bureaucratic requirement where a guest lecturer giving an hour-long lesson needs to be officially fingerprinted. How is fingerprinting even still a thing in the 21st century? Chancellor Lyons
dailycal.org
After 27 years, Berkeley Math Circle has shut down its flagship program, BMC-Upper, due to “stringent” new campus background check requirements, according to a statement on BMC’s website.
34
79
772
@lm_zheng
Lianmin Zheng
2 months
SGLang now has a pure Jax backend, and it runs natively on TPU!
@lmsysorg
LMSYS Org
2 months
SGLang now runs natively on TPU with a new pure Jax backend! SGLang-Jax leverages SGLang's high-performance server architecture and uses Jax to compile the model's forward pass. By combining SGLang and Jax, it delivers fast, native TPU inference while maintaining support for…
2
5
159
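For illustration, here is a minimal sketch of the pattern the quoted tweet describes, compiling a model's forward pass with jax.jit; the toy model and parameter names are hypothetical and are not SGLang-Jax's actual code.

import jax
import jax.numpy as jnp

# Hypothetical toy "model": an embedding table plus one dense layer standing in
# for real transformer blocks. jax.jit traces the function once, XLA compiles it,
# and later calls run the compiled program natively on TPU (or GPU/CPU).
def forward(params, token_ids):
    x = params["embed"][token_ids]        # (seq, d_model) embedding lookup
    x = jnp.tanh(x @ params["w"])         # stand-in for the transformer stack
    return x @ params["embed"].T          # logits over the vocabulary

forward_jit = jax.jit(forward)

key = jax.random.PRNGKey(0)
params = {
    "embed": jax.random.normal(key, (1000, 64)),
    "w": jax.random.normal(key, (64, 64)),
}
logits = forward_jit(params, jnp.arange(8))   # compiled on the first call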
@percyliang
Percy Liang
2 months
⛵Marin 32B Base (mantis) is done training! It is the best open-source base model (beating OLMo 2 32B Base) and it’s even close to the best comparably-sized open-weight base models, Gemma 3 27B PT and Qwen 2.5 32B Base. Ranking across 19 benchmarks:
20
89
598
@sharadvikram
Sharad Vikram
2 months
TPU-style collective matmuls on GPU!
@apaszke
Adam Paszke
2 months
Want to improve GPU compute/comms overlap? We just published a new short tutorial for you! A few small changes to the Pallas:MGPU matmul kernel is all it takes to turn it into an all-gather collective matmul that overlaps NVLINK comms with local compute:
0
8
27
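As context for the quoted tutorial, here is an unfused sketch of what an all-gather collective matmul computes, written with jax shard_map and hypothetical shapes and names; the Pallas:MGPU kernel described above additionally overlaps the NVLink all-gather with the local matmul, which this sketch does not attempt.

import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental import mesh_utils
from jax.experimental.shard_map import shard_map

# Reference (unfused) all-gather collective matmul: activations are sharded
# across devices, each device all-gathers them, then multiplies by its local
# column shard of the weights. Here communication and compute run back to back;
# the tutorial's kernel is about overlapping the two.
mesh = Mesh(mesh_utils.create_device_mesh((jax.device_count(),)), ("x",))

def ag_matmul(lhs_shard, rhs_shard):
    lhs = jax.lax.all_gather(lhs_shard, "x", axis=0, tiled=True)  # communication
    return lhs @ rhs_shard                                        # local compute

collective_matmul = shard_map(
    ag_matmul, mesh=mesh,
    in_specs=(P("x", None), P(None, "x")), out_specs=P(None, "x"))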
@apaszke
Adam Paszke
3 months
Curious how to write SOTA performance Blackwell matmul kernels using MGPU? We just published a short step-by-step tutorial: https://t.co/XRVX34juEz. At each step, we show exactly what (small) changes are necessary to refine the kernel, and the final kernel is just under 150 lines.
4
67
418
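For readers new to Pallas, a minimal single-device blocked matmul shows the programming model the tutorial builds on; this is a generic sketch, not the Blackwell/MGPU kernel from the tutorial.

import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

# Minimal blocked matmul in Pallas: the grid tiles the output, the BlockSpecs
# map each (i, j) program instance to the operand blocks it needs, and the
# kernel body is a plain jnp.dot on those blocks.
def matmul_kernel(x_ref, y_ref, o_ref):
    o_ref[...] = jnp.dot(x_ref[...], y_ref[...])

def matmul(x, y, *, block=128):
    m, k = x.shape
    _, n = y.shape
    return pl.pallas_call(
        matmul_kernel,
        out_shape=jax.ShapeDtypeStruct((m, n), x.dtype),
        grid=(m // block, n // block),
        in_specs=[
            pl.BlockSpec((block, k), lambda i, j: (i, 0)),
            pl.BlockSpec((k, block), lambda i, j: (0, j)),
        ],
        out_specs=pl.BlockSpec((block, block), lambda i, j: (i, j)),
    )(x, y)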
@jacobaustin132
Jacob Austin
4 months
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
38
521
3K
@dlwh
David Hall
6 months
So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public and private, that the spikes were bad. (This is a thread about how we fixed the spikes. Bear with me.)
@percyliang
Percy Liang
7 months
Marin 32B training crossed 1.5 trillion tokens today...
23
104
1K
@srush_nlp
Sasha Rush
7 months
Strong recommend for this book and the JAX/TPU docs, even if you are using Torch / GPUs. Clean notation and mental model for some challenging ideas. https://t.co/ypg3jjbWRx https://t.co/HKBSctquKq https://t.co/k7XXc9Eesg
9
159
1K
@percyliang
Percy Liang
7 months
For a rare look into how LLMs are really built, check out @dlwh's retrospective on how we trained the Marin 8B model from scratch (and outperformed Llama 3.1 8B base). It’s an honest account with all the revelations and mistakes we made along our journey. Papers are forced to…
2
71
505
@percyliang
Percy Liang
7 months
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
59
222
1K
@ChungMinKim
Chung Min Kim
7 months
Excited to introduce PyRoki ("Python Robot Kinematics"): easier IK, trajectory optimization, motion retargeting... with an open-source toolkit on both CPU and GPU
24
163
1K
@emollick
Ethan Mollick
7 months
Pretty awesome result from the new version of Gemini 2.5. I changed one line of War and Peace, inserting a sentence into Book 14, Chapter 10 (halfway through), where Princess Mary "spoke to Crab Man the superhero". Gemini 2.5 consistently found this reference among 860,000 tokens.
23
72
1K
@jon_barron
Jon Barron
8 months
A thread of thoughts on radiance fields, from my keynote at 3DV: Radiance fields have had 3 distinct generations. First was NeRF: just posenc and a tiny MLP. This was slow to train but worked really well, and it was unusually compressed: the NeRF was smaller than the images.
10
80
641
@rdyro128523
rdyro
8 months
Llama 4 inference in pure JAX! Expert/tensor parallelism with int8 quantization. Contributions welcome!
2
15
134
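The weight-only int8 scheme that pure-JAX inference ports like this typically use can be sketched as follows; this is a hypothetical illustration, not the repository's actual code.

import jax.numpy as jnp

# Per-column symmetric int8 quantization: store int8 weights plus one float
# scale per output column, and dequantize on the fly inside the matmul.
def quantize_int8(w):
    scale = jnp.maximum(jnp.max(jnp.abs(w), axis=0, keepdims=True) / 127.0, 1e-8)
    q = jnp.clip(jnp.round(w / scale), -127, 127).astype(jnp.int8)
    return q, scale

def int8_matmul(x, q, scale):
    return (x @ q.astype(x.dtype)) * scale   # dequantize during the matmul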
@froystig
Roy Frostig
10 months
A nice and concise R1 inference jax:tpu port by @rdyro128523. Good for both reading and running. Watch the repo for more.
@rdyro128523
rdyro
10 months
Deepseek R1 inference in pure JAX! Currently on TPU, with GPU and distilled models in progress. Features MLA-style attention, expert/tensor parallelism & int8 quantization. Contributions welcome!
0
7
37
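As a rough illustration of the tensor-parallelism idea mentioned in these posts (again, not the repository's actual code), a weight matrix can be sharded column-wise over a device mesh and the compiler left to insert the collectives the sharded matmul needs.

import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Hypothetical layer: split the weight's columns across a "model" mesh axis and
# replicate the activations; jit + XLA handle the cross-device communication.
mesh = Mesh(mesh_utils.create_device_mesh((jax.device_count(),)), ("model",))

w = jax.device_put(
    jnp.ones((1024, 4096), dtype=jnp.bfloat16),
    NamedSharding(mesh, P(None, "model")))   # columns sharded across devices
x = jax.device_put(
    jnp.ones((8, 1024), dtype=jnp.bfloat16),
    NamedSharding(mesh, P()))                # activations replicated

@jax.jit
def layer(x, w):
    return x @ w   # output comes back sharded as P(None, "model")

y = layer(x, w)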