Roy Frostig

@froystig

Followers: 1K · Following: 361 · Media: 3 · Statuses: 123

research scientist at @googledeepmind. co-author of JAX (https://t.co/sS9COjJPsx)

sfba
Joined April 2008
@froystig
Roy Frostig
3 months
RT @rdyro128523: Llama 4 inference in pure JAX! Expert/tensor parallelism with int8 quantization. Contributions welcome!
0
14
0
@froystig
Roy Frostig
5 months
A nice and concise R1 inference jax:tpu port by @rdyro128523. Good for both reading and running. Watch the repo for more.
@rdyro128523
rdyro
5 months
Deepseek R1 inference in pure JAX! Currently on TPU, with GPU and distilled models in-progress. Features MLA-style attention, expert/tensor parallelism & int8 quantization. Contributions welcome!
0
5
36
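For a rough sense of the int8 quantization mentioned above, here is a minimal sketch of per-channel absmax weight quantization around a matmul in JAX. It is an illustration only; the repo's actual scheme (scaling axes, fused kernels, activation handling) may differ.

```python
import jax
import jax.numpy as jnp

def quantize_int8(w):
    # Per-output-channel absmax scaling: map each column of w into [-127, 127].
    scale = jnp.max(jnp.abs(w), axis=0, keepdims=True) / 127.0
    q = jnp.round(w / scale).astype(jnp.int8)
    return q, scale

def int8_matmul(x, q, scale):
    # Dequantize on the fly: accumulate in float32, then undo the per-column scale.
    return jnp.dot(x, q.astype(jnp.float32)) * scale

w = jax.random.normal(jax.random.PRNGKey(0), (512, 256))
x = jax.random.normal(jax.random.PRNGKey(1), (8, 512))
q, s = quantize_int8(w)
err = jnp.max(jnp.abs(x @ w - int8_matmul(x, q, s)))   # small quantization error
```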
@froystig
Roy Frostig
6 months
RT @JeffDean: Training our most capable Gemini models relies heavily on our JAX software stack + Google's TPU hardware platforms. If you…
0
167
0
@froystig
Roy Frostig
6 months
Our online book on systems principles of LLM scaling is live. We hope that it helps you make the most of your computing resources. Enjoy!
@jacobaustin132
Jacob Austin
6 months
Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
0
10
73
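In the "scaling is math" spirit, here is a back-of-the-envelope roofline check for a single matmul, along the lines of what the book works through. The hardware numbers below are assumed example values, not any particular chip's spec.

```python
# Arithmetic intensity of a bf16 matmul (B, D) @ (D, F).
B, D, F = 1024, 8192, 32768                  # assumed example sizes
flops = 2 * B * D * F                         # one multiply and one add per (b, f, d) triple
bytes_moved = 2 * (B * D + D * F + B * F)     # bf16 = 2 bytes/element: read inputs, write output
intensity = flops / bytes_moved               # FLOPs per byte of memory traffic

peak_flops = 2.0e14                           # assumed accelerator peak, FLOP/s
mem_bw = 8.0e11                               # assumed memory bandwidth, bytes/s
critical_intensity = peak_flops / mem_bw      # intensity needed to be compute-bound
print(intensity > critical_intensity)         # True here: this matmul is compute-bound
```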
@froystig
Roy Frostig
11 months
RT @sharadvikram: We now have a guide to writing distributed communication on TPU using Pallas, written by @JustinFu769512!
0
45
0
@froystig
Roy Frostig
1 year
RT @jxbz: Modula x JAX = Modulax. @gallabytes is cracked and ported Modula into JAX in a few days. I haven't had a chance to test yet, but…
github.com/GallagherCommaJack/modulax
0
4
0
@froystig
Roy Frostig
1 year
RT @exoplaneteer: I've finally landed my first proper JAX feature since joining the team: a supported "foreign function interface", which m…
0
14
0
@froystig
Roy Frostig
1 year
RT @sharadvikram: Finally got around to writing a guide for matrix multiplication on TPUs using Pallas. Check it out!
0
25
0
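For flavor (a generic sketch, not the guide's code), a blocked Pallas matmul that tiles only the output over a grid and keeps the full contraction dimension per block; a real TPU kernel would also block K and accumulate in scratch memory. The BlockSpec argument order follows the current Pallas API and may differ in older JAX versions.

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def matmul_kernel(x_ref, y_ref, o_ref):
    # Each grid step (i, j) sees a (bm, K) tile of x and a (K, bn) tile of y.
    o_ref[...] = jnp.dot(x_ref[...], y_ref[...])

def matmul(x, y, bm=256, bn=256):
    m, k = x.shape
    _, n = y.shape
    return pl.pallas_call(
        matmul_kernel,
        grid=(m // bm, n // bn),
        in_specs=[
            pl.BlockSpec((bm, k), lambda i, j: (i, 0)),   # row tile of x
            pl.BlockSpec((k, bn), lambda i, j: (0, j)),   # column tile of y
        ],
        out_specs=pl.BlockSpec((bm, bn), lambda i, j: (i, j)),
        out_shape=jax.ShapeDtypeStruct((m, n), x.dtype),
    )(x, y)

x = jnp.ones((1024, 512), jnp.float32)
y = jnp.ones((512, 1024), jnp.float32)
print(jnp.allclose(matmul(x, y), x @ y))
```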
@froystig
Roy Frostig
1 year
RT @apaszke: Many of you are excited about H100 attention, so it’s a good time to show you Mosaic GPU: a Python DSL for H100s. The attenti…
github.com/jax-ml/jax (Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more)
0
110
0
@froystig
Roy Frostig
1 year
RT @_ddjohnson: Excited to share Penzai, a JAX research toolkit from @GoogleDeepMind for building, editing, and visualizing neural networks…
0
402
0
@froystig
Roy Frostig
2 years
RT @sharadvikram: Built with JAX!
0
25
0
@froystig
Roy Frostig
2 years
RT @dlwh: Today, I’m excited to announce the release of Levanter 1.0, our new JAX-based framework for training foundation models, which we’…
0
86
0
@froystig
Roy Frostig
3 years
There's a longer history in PL of thinking about linearity in programs, what it means, and what we can do with it (cf. linear types/logic). Hopefully distilling a big piece of AD down to linearizing stuff makes it easier to think about what programs we can differentiate, and how.
1
1
12
@froystig
Roy Frostig
3 years
To get there, we need to identify linearity *in programs*. For functions expressed in code, we want a notion of linearity that implies mathematical linearity, but that also allows a compiler to delineate and transpose things automatically.
1
0
6
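JAX exposes exactly this as a user-level transformation: a program built only from structurally linear operations can be transposed directly, with no differentiation involved. A small sketch (the function and values are made up for illustration):

```python
import jax
import jax.numpy as jnp

# A structurally linear program in x: indexing, constant scaling, addition, stacking.
def lin(x):
    return jnp.stack([3.0 * x[0] + x[1], x[1] - x[2]])

x_example = jnp.zeros(3)            # example input; only its shape/dtype are used
lin_t = jax.linear_transpose(lin, x_example)

w = jnp.array([1.0, 2.0])           # a cotangent for the length-2 output
(ct,) = lin_t(w)                    # acts like A.T @ w for the matrix A that lin represents
print(ct)                           # [3., 3., -2.]
```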
@froystig
Roy Frostig
3 years
Composing the two, we get reverse-mode AD as you know it today. The implementation is simpler, disentangling perturbation from reversal.
1
0
7
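In terms of JAX's public APIs, that composition can be spelled out directly and checked against jax.grad (a small sketch with a made-up loss):

```python
import jax
import jax.numpy as jnp

def loss(x):
    return jnp.sum(jnp.tanh(x) ** 2)

x = jnp.arange(1.0, 4.0)

# Reverse mode as linearize-then-transpose.
y, f_lin = jax.linearize(loss, x)            # forward pass plus the extracted linear map
f_lin_t = jax.linear_transpose(f_lin, x)     # that linear map, mechanically reversed
(grad_via_transpose,) = f_lin_t(jnp.ones_like(y))

print(jnp.allclose(grad_via_transpose, jax.grad(loss)(x)))   # True
```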
@froystig
Roy Frostig
3 years
We turn these algebraic facts into algorithms: "linearization" amounts to extracting the linear computation from forward-mode. "Transposition" roughly means reversing that extracted program. These steps can be written separately, like compiler passes.
1
0
5
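Both passes exist as separate user-level transformations in JAX, so each step can be inspected on its own (the example function is made up for illustration):

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x

x = jnp.arange(1.0, 4.0)
v = jnp.ones_like(x)

# "Linearization": run forward-mode once at x, keeping only the linear tangent program.
y, f_lin = jax.linearize(f, x)
print(jax.make_jaxpr(f_lin)(v))              # the extracted linear program, as a jaxpr

# "Transposition": reverse that linear program.
f_lin_t = jax.linear_transpose(f_lin, v)     # v supplies the input shape/dtype
(ct,) = f_lin_t(jnp.ones_like(y))            # maps output cotangents back to input cotangents
```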
@froystig
Roy Frostig
3 years
A key concept is *linearity*. Differentiation forms a linear map—the Jacobian. Forward-mode AD computes that map (aka the "JVP", for Jacobian-vector product). Reverse-mode computes its transpose ("VJP").
1
0
7
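In code, with a toy function: jax.jvp applies the Jacobian to a tangent and jax.vjp applies its transpose to a cotangent, which a dense Jacobian confirms.

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x

x = jnp.arange(1.0, 4.0)    # primal point
v = jnp.ones_like(x)        # tangent vector
w = jnp.ones_like(x)        # cotangent vector

y, jvp_out = jax.jvp(f, (x,), (v,))      # forward mode: Jacobian-vector product
y2, f_vjp = jax.vjp(f, x)                # reverse mode: vector-Jacobian product
(vjp_out,) = f_vjp(w)

J = jax.jacfwd(f)(x)                     # dense Jacobian, for the comparison only
print(jnp.allclose(jvp_out, J @ v), jnp.allclose(vjp_out, J.T @ w))   # True True
```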
@froystig
Roy Frostig
3 years
We've always done AD this way, and we wrote about it briefly before. The new paper tries to go into more detail by working over a minimal programming language.
1
0
11
@froystig
Roy Frostig
3 years
In JAX and Dex, we do automatic differentiation (AD) in a distinctive way: by "linearizing" and then "transposing" programs. We wrote up what this looks like in a model language, with Alexey Radul, @apaszke, @SingularMattrix, @DougalMaclaurin.
arxiv.org: Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two "modes" -- forward and reverse -- which are typically presented (and implemented)...
1
65
335
@froystig
Roy Frostig
4 years
RT @mblondel_ml: After 6 months of hard work, happy to share JAXopt: hardware accelerated, batchable and differentiable optimizers in JAX h…
0
104
0