
Roy Frostig (@froystig)
1K Followers · 361 Following · 3 Media · 123 Statuses
research scientist at @googledeepmind. co-author of JAX (https://t.co/sS9COjJPsx)
sfba · Joined April 2008
RT @rdyro128523: Llama 4 inference in pure JAX! Expert/tensor parallelism with int8 quantization. Contributions welcome!
A nice and concise R1 inference jax:tpu port by @rdyro128523. Good for both reading and running. Watch the repo for more.
DeepSeek R1 inference in pure JAX! Currently on TPU, with GPU and distilled models in progress. Features MLA-style attention, expert/tensor parallelism & int8 quantization. Contributions welcome!
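(For a rough sense of the int8 piece, here is a minimal per-channel weight-quantization sketch in plain jax.numpy; it is illustrative only and not the repo's implementation.)

```python
import jax.numpy as jnp

def quantize_int8(w):
    # Per-output-channel absmax scaling: int8 weights plus a float scale.
    scale = jnp.max(jnp.abs(w), axis=0, keepdims=True) / 127.0
    w_q = jnp.round(w / scale).astype(jnp.int8)
    return w_q, scale

def int8_matmul(x, w_q, scale):
    # Dequantize on the fly; real kernels fuse this into the matmul.
    return x @ (w_q.astype(jnp.float32) * scale)

w = jnp.ones((16, 8)) * 0.03
x = jnp.ones((4, 16))
w_q, scale = quantize_int8(w)
print(int8_matmul(x, w_q, scale).shape)  # (4, 8)
```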
Our online book on systems principles of LLM scaling is live. We hope that it helps you make the most of your computing resources. Enjoy!
Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
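(As a taste of the "it's math" part, a back-of-the-envelope arithmetic-intensity estimate for a matmul; the shapes and dtype below are assumptions, not numbers from the book.)

```python
# FLOPs vs. bytes moved for C = A @ B with A: (m, k), B: (k, n), in bf16.
m = k = n = 8192
bytes_per_elem = 2  # bf16

flops = 2 * m * k * n                                   # multiply-adds
bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, B; write C

intensity = flops / bytes_moved  # FLOPs per byte
print(f"{flops:.3e} FLOPs, {bytes_moved:.3e} bytes, intensity ~{intensity:.0f}")
# A chip is roughly compute-bound here when its peak FLOPs / HBM bandwidth
# is below this intensity, and memory-bound otherwise.
```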
RT @sharadvikram: We now have a guide to writing distributed communication on TPU using Pallas, written by @JustinFu769512!
RT @jxbz: Modula x JAX = Modulax. @gallabytes is cracked and ported Modula into JAX in a few days. I haven't had a chance to test yet, but…
github.com/GallagherCommaJack/modulax
RT @exoplaneteer: I've finally landed my first proper JAX feature since joining the team: a supported "foreign function interface", which m…
RT @sharadvikram: Finally got around to writing a guide for matrix multiplication on TPUs using Pallas. Check it out!
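(A minimal blocked-matmul kernel in the spirit of that guide; this is a sketch, not the guide's code, and BlockSpec argument details vary across Pallas versions.)

```python
import jax
import jax.numpy as jnp
from jax.experimental import pallas as pl

def matmul_kernel(x_ref, y_ref, o_ref):
    # Each program instance computes one (block, block) output tile.
    o_ref[...] = jnp.dot(x_ref[...], y_ref[...])

def matmul(x, y, block=128):
    m, k = x.shape
    _, n = y.shape
    return pl.pallas_call(
        matmul_kernel,
        out_shape=jax.ShapeDtypeStruct((m, n), x.dtype),
        grid=(m // block, n // block),
        in_specs=[
            pl.BlockSpec((block, k), lambda i, j: (i, 0)),  # row block of x
            pl.BlockSpec((k, block), lambda i, j: (0, j)),  # column block of y
        ],
        out_specs=pl.BlockSpec((block, block), lambda i, j: (i, j)),
    )(x, y)

x = jnp.ones((512, 512), jnp.float32)
y = jnp.ones((512, 512), jnp.float32)
print(matmul(x, y).shape)  # (512, 512)
```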
RT @apaszke: Many of you are excited about H100 attention, so it’s a good time to show you Mosaic GPU: a Python DSL for H100s. The attenti…
github.com/jax-ml/jax — Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
RT @_ddjohnson: Excited to share Penzai, a JAX research toolkit from @GoogleDeepMind for building, editing, and visualizing neural networks…
In JAX and Dex, we do automatic differentiation (AD) in a distinctive way: by "linearizing" and then "transposing" programs. We wrote up what this looks like in a model language, with Alexey Radul, @apaszke, @SingularMattrix, @DougalMaclaurin.
arxiv.org — Automatic differentiation (AD) is conventionally understood as a family of distinct algorithms, rooted in two "modes" -- forward and reverse -- which are typically presented (and implemented)...
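(A tiny illustration of the linearize-then-transpose decomposition using JAX's public jax.linearize and jax.linear_transpose; the function f below is just an example.)

```python
import jax
import jax.numpy as jnp

def f(x):
    return jnp.sin(x) * x

x = jnp.float32(1.5)

# Linearize: evaluate f at x and get the linear map v -> J_f(x) @ v (a JVP).
y, f_lin = jax.linearize(f, x)

# Transpose the linear map to get w -> J_f(x)^T @ w (a VJP).
f_t = jax.linear_transpose(f_lin, x)

# For scalar f, applying the transpose to cotangent 1.0 recovers the gradient.
print(f_t(jnp.float32(1.0)))  # (df/dx at x,)
print(jax.grad(f)(x))         # same value
```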
RT @mblondel_ml: After 6 months of hard work, happy to share JAXopt: hardware accelerated, batchable and differentiable optimizers in JAX h…
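(A hedged usage sketch of a JAXopt solver; the least-squares problem below is made up for illustration.)

```python
import jax.numpy as jnp
import jaxopt

def loss(params, data):
    # Simple least-squares objective.
    x, y = data
    return jnp.mean((x @ params - y) ** 2)

x = jnp.ones((8, 3))
y = jnp.zeros(8)

# Solvers take the objective and expose a run(init_params, ...) method.
solver = jaxopt.GradientDescent(fun=loss, maxiter=100)
params, state = solver.run(jnp.zeros(3), data=(x, y))
print(params)
```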