Matthew Johnson Profile
Matthew Johnson

@SingularMattrix

Followers: 13K
Following: 92K
Media: 16
Statuses: 2K

Researcher at Google Brain. I work on JAX (https://t.co/UGa5tGfinF).

Joined July 2010
@ashVaswani
Ashish Vaswani
11 days
We frictionlessly trained on AMD GPUs and TPUs with a unified JAX framework. Our goodput for flagship runs went past 90%. @YashVanjani @mjcOhio @alokpathy @pcmonk painstakingly removed obstacles to maximize experimental velocity.
2
12
172
@jon_barron
Jon Barron
27 days
Nano Banana Pro: "Generate a diagram of a two-layer neural network in the style of Stephen Biesty"
24
68
753
@doristsao
Doris Tsao
1 month
Unbelievable: the famed Berkeley Math Circle is being forced to shut down due to a bureaucratic requirement where a guest lecturer giving an hour-long lesson needs to be officially fingerprinted. How is fingerprinting even still a thing in the 21st century? Chancellor Lyons
dailycal.org
After 27 years, Berkeley Math Circle has shut down its flagship program, BMC-Upper, due to “stringent” new campus background check requirements, according to a statement on BMC’s website.
34
79
772
@lm_zheng
Lianmin Zheng
2 months
SGLang now has a pure Jax backend, and it runs natively on TPU!
@lmsysorg
LMSYS Org
2 months
SGLang now runs natively on TPU with a new pure Jax backend! SGLang-Jax leverages SGLang's high-performance server architecture and uses Jax to compile the model's forward pass. By combining SGLang and Jax, it delivers fast, native TPU inference while maintaining support for…
2
5
159
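For illustration, here is a minimal sketch of the pattern the quoted tweet describes, compiling a model's forward pass with jax.jit; the toy model and parameter names are hypothetical and are not SGLang-Jax's actual code.

import jax
import jax.numpy as jnp

# Hypothetical toy "model": an embedding table plus one dense layer standing in
# for real transformer blocks. jax.jit traces the function once, XLA compiles it,
# and later calls run the compiled program natively on TPU (or GPU/CPU).
def forward(params, token_ids):
    x = params["embed"][token_ids]        # (seq, d_model) embedding lookup
    x = jnp.tanh(x @ params["w"])         # stand-in for the transformer stack
    return x @ params["embed"].T          # logits over the vocabulary

forward_jit = jax.jit(forward)

key = jax.random.PRNGKey(0)
params = {
    "embed": jax.random.normal(key, (1000, 64)),
    "w": jax.random.normal(key, (64, 64)),
}
logits = forward_jit(params, jnp.arange(8))   # compiled on the first call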
@percyliang
Percy Liang
2 months
⛵Marin 32B Base (mantis) is done training! It is the best open-source base model (beating OLMo 2 32B Base) and it’s even close to the best comparably-sized open-weight base models, Gemma 3 27B PT and Qwen 2.5 32B Base. Ranking across 19 benchmarks:
20
89
598
@sharadvikram
Sharad Vikram
2 months
TPU-style collective matmuls on GPU!
@apaszke
Adam Paszke
2 months
Want to improve GPU compute/comms overlap? We just published a new short tutorial for you! A few small changes to the Pallas:MGPU matmul kernel is all it takes to turn it into an all-gather collective matmul that overlaps NVLINK comms with local compute:
0
8
27
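As context for the quoted tutorial, here is an unfused sketch of what an all-gather collective matmul computes, written with jax shard_map and hypothetical shapes and names; the Pallas:MGPU kernel described above additionally overlaps the NVLink all-gather with the local matmul, which this sketch does not attempt.

import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental import mesh_utils
from jax.experimental.shard_map import shard_map

# Reference (unfused) all-gather collective matmul: activations are sharded
# across devices, each device all-gathers them, then multiplies by its local
# column shard of the weights. Here communication and compute run back to back;
# the tutorial's kernel is about overlapping the two.
mesh = Mesh(mesh_utils.create_device_mesh((jax.device_count(),)), ("x",))

def ag_matmul(lhs_shard, rhs_shard):
    lhs = jax.lax.all_gather(lhs_shard, "x", axis=0, tiled=True)  # communication
    return lhs @ rhs_shard                                        # local compute

collective_matmul = shard_map(
    ag_matmul, mesh=mesh,
    in_specs=(P("x", None), P(None, "x")), out_specs=P(None, "x"))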
@apaszke
Adam Paszke
3 months
Curious how to write SOTA performance Blackwell matmul kernels using MGPU? We just published a short step-by-step tutorial: https://t.co/XRVX34juEz. At each step, we show exactly what (small) changes are necessary to refine the kernel, and the final kernel is just under 150 lines.
4
67
418
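For readers new to Pallas, a minimal single-device blocked matmul shows the programming model the tutorial builds on; this is a generic sketch, not the Blackwell/MGPU kernel from the tutorial.

import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

# Minimal blocked matmul in Pallas: the grid tiles the output, the BlockSpecs
# map each (i, j) program instance to the operand blocks it needs, and the
# kernel body is a plain jnp.dot on those blocks.
def matmul_kernel(x_ref, y_ref, o_ref):
    o_ref[...] = jnp.dot(x_ref[...], y_ref[...])

def matmul(x, y, *, block=128):
    m, k = x.shape
    _, n = y.shape
    return pl.pallas_call(
        matmul_kernel,
        out_shape=jax.ShapeDtypeStruct((m, n), x.dtype),
        grid=(m // block, n // block),
        in_specs=[
            pl.BlockSpec((block, k), lambda i, j: (i, 0)),
            pl.BlockSpec((k, block), lambda i, j: (0, j)),
        ],
        out_specs=pl.BlockSpec((block, block), lambda i, j: (i, j)),
    )(x, y)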
@jacobaustin132
Jacob Austin
4 months
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
38
521
3K
@dlwh
David Hall
6 months
So about a month ago, Percy posted a version of this plot of our Marin 32B pretraining run. We got a lot of feedback, both public and private, that the spikes were bad. (This is a thread about how we fixed the spikes. Bear with me.)
@percyliang
Percy Liang
7 months
Marin 32B training crossed 1.5 trillion tokens today...
23
104
1K
@srush_nlp
Sasha Rush
7 months
Strong recommend for this book and the JAX/TPU docs, even if you are using Torch / GPUs. Clean notation and mental model for some challenging ideas. https://t.co/ypg3jjbWRx https://t.co/HKBSctquKq https://t.co/k7XXc9Eesg
9
159
1K
@percyliang
Percy Liang
7 months
For a rare look into how LLMs are really built, check out @dlwh's retrospective on how we trained the Marin 8B model from scratch (and outperformed Llama 3.1 8B base). It’s an honest account with all the revelations and mistakes we made along our journey. Papers are forced to…
2
71
505
@percyliang
Percy Liang
7 months
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
59
222
1K
@ChungMinKim
Chung Min Kim
7 months
Excited to introduce PyRoki ("Python Robot Kinematics"): easier IK, trajectory optimization, motion retargeting... with an open-source toolkit on both CPU and GPU
24
163
1K
@emollick
Ethan Mollick
7 months
Pretty awesome result from the new version of Gemini 2.5. I changed one line of War and Peace, inserting a sentence into Book 14, Chapter 10 (halfway through), where Princess Mary "spoke to Crab Man the superhero". Gemini 2.5 consistently found this reference among 860,000 tokens.
23
72
1K
@jon_barron
Jon Barron
8 months
A thread of thoughts on radiance fields, from my keynote at 3DV: Radiance fields have had 3 distinct generations. First was NeRF: just posenc and a tiny MLP. This was slow to train but worked really well, and it was unusually compressed: the NeRF was smaller than the images.
10
80
641
@rdyro128523
rdyro
8 months
Llama 4 inference in pure JAX! Expert/tensor parallelism with int8 quantization. Contributions welcome!
2
15
134
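The weight-only int8 scheme that pure-JAX inference ports like this typically use can be sketched as follows; this is a hypothetical illustration, not the repository's actual code.

import jax.numpy as jnp

# Per-column symmetric int8 quantization: store int8 weights plus one float
# scale per output column, and dequantize on the fly inside the matmul.
def quantize_int8(w):
    scale = jnp.maximum(jnp.max(jnp.abs(w), axis=0, keepdims=True) / 127.0, 1e-8)
    q = jnp.clip(jnp.round(w / scale), -127, 127).astype(jnp.int8)
    return q, scale

def int8_matmul(x, q, scale):
    return (x @ q.astype(x.dtype)) * scale   # dequantize during the matmul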
@froystig
Roy Frostig
10 months
A nice and concise R1 inference jax:tpu port by @rdyro128523. Good for both reading and running. Watch the repo for more.
@rdyro128523
rdyro
10 months
Deepseek R1 inference in pure JAX! Currently on TPU, with GPU and distilled models in progress. Features MLA-style attention, expert/tensor parallelism & int8 quantization. Contributions welcome!
0
7
37
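As a rough illustration of the tensor-parallelism idea mentioned in these posts (again, not the repository's actual code), a weight matrix can be sharded column-wise over a device mesh and the compiler left to insert the collectives the sharded matmul needs.

import jax
import jax.numpy as jnp
from jax.experimental import mesh_utils
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Hypothetical layer: split the weight's columns across a "model" mesh axis and
# replicate the activations; jit + XLA handle the cross-device communication.
mesh = Mesh(mesh_utils.create_device_mesh((jax.device_count(),)), ("model",))

w = jax.device_put(
    jnp.ones((1024, 4096), dtype=jnp.bfloat16),
    NamedSharding(mesh, P(None, "model")))   # columns sharded across devices
x = jax.device_put(
    jnp.ones((8, 1024), dtype=jnp.bfloat16),
    NamedSharding(mesh, P()))                # activations replicated

@jax.jit
def layer(x, w):
    return x @ w   # output comes back sharded as P(None, "model")

y = layer(x, w)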