
Jacob Austin
@jacobaustin132
Followers: 5K · Following: 3K · Media: 36 · Statuses: 511
Research at @GoogleDeepMind. Currently making LLMs go fast. I also play piano and climb. NYC. Opinions my own
New York, NY
Joined March 2017
Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
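To give a flavor of the math involved, here is a minimal back-of-envelope sketch; the model size, hardware peak, and utilization below are illustrative assumptions, not numbers from the book:

```python
# Napkin math for one transformer forward pass, in the "scaling is math" spirit.
# Every number here is an illustrative assumption, not a figure from the book.
params = 70e9                    # assumed model size: 70B parameters
seq_len = 2048                   # assumed tokens processed in one pass
flops_per_token = 2 * params     # ~2 FLOPs per parameter per token (forward only)
total_flops = flops_per_token * seq_len

peak_flops = 9.9e14              # assumed accelerator peak: ~990 TFLOP/s in bf16
mfu = 0.4                        # assumed model FLOPs utilization

seconds = total_flops / (peak_flops * mfu)
print(f"~{total_flops:.2e} FLOPs, ~{seconds * 1e3:.0f} ms per forward pass at {mfu:.0%} MFU")
```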
I just stumbled across this awesome book, which covers a lot of the nitty-gritty details of GPU hardware, SLURM, cloud providers, and LLM training/serving. Probably the most practical guide to the infrastructure of LLM scaling I've seen.
Got a chance to measure Maximum Achievable Matmul TFLOPS on NVIDIA B200. With each new NVIDIA generation the efficiency keeps on dropping:
A100: 86.9%
H100: 80.3%
B200: 77.6%
The updated table is here:
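For readers curious how a number like this is produced, here is a minimal sketch of a matmul throughput probe in JAX; the matrix size, dtype, iteration count, and peak-TFLOPS figure are placeholder assumptions, not the methodology behind the quoted A100/H100/B200 results:

```python
import time
import jax
import jax.numpy as jnp

# Minimal matmul throughput probe. Shape, dtype, and the peak figure are
# illustrative assumptions, not the setup behind the quoted numbers.
n = 8192
a = jnp.ones((n, n), dtype=jnp.bfloat16)
b = jnp.ones((n, n), dtype=jnp.bfloat16)

matmul = jax.jit(lambda x, y: x @ y)
matmul(a, b).block_until_ready()      # warm up / compile

iters = 50
start = time.perf_counter()
for _ in range(iters):
    out = matmul(a, b)
out.block_until_ready()
elapsed = time.perf_counter() - start

flops = 2 * n**3 * iters              # 2*n^3 FLOPs per n x n matmul
achieved_tflops = flops / elapsed / 1e12
peak_tflops = 1979.0                  # assumed spec-sheet peak for the chip under test
print(f"{achieved_tflops:.1f} TFLOP/s achieved, "
      f"{achieved_tflops / peak_tflops:.1%} of assumed peak")
```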
RT @koraykv: Advanced version of Gemini Deep Think (announced at #GoogleIO) using parallel inference time computation achieved gold-medal p…
deepmind.google
Our advanced model officially achieved a gold-medal level performance on problems from the International Mathematical Olympiad (IMO), the world’s most prestigious competition for young...
These talks are awesome and all on YouTube!
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:
RT @nytimes: Breaking News: A Columbia student activist, a legal permanent resident, was arrested by ICE at a meeting he thought was a step…
nytimes.com
Mohsen Mahdawi, a legal permanent resident, has lived in the United States for 10 years and was arrested in Vermont. He has not been charged with a crime.
RT @TransluceAI: To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken…
RT @lmarena_ai: BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆. T…
RT @AmandaAskell: It's bizarre when relatively techno-utopian people are asked about how to solve declining fertility and instead of talkin…
RT @rdyro128523: Deepseek R1 inference in pure JAX! Currently on TPU, with GPU and distilled models in-progress. Features MLA-style attenti…
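For context, "MLA-style attention" refers to DeepSeek's multi-head latent attention, where keys and values are reconstructed from a small shared latent rather than projected (and cached) at full size. A heavily simplified toy sketch of that idea in JAX follows; the dimensions are arbitrary, and this omits RoPE, caching, and everything else the real implementation handles:

```python
import jax
import jax.numpy as jnp

# Toy sketch of the latent-KV idea behind MLA. Dimensions and initialization
# are arbitrary assumptions; this is not the repository's implementation.
d_model, d_latent, n_heads, d_head, seq = 256, 64, 4, 32, 16
k1, k2, k3, k4, k5 = jax.random.split(jax.random.PRNGKey(0), 5)

w_q   = jax.random.normal(k1, (d_model, n_heads * d_head)) * 0.02
w_dkv = jax.random.normal(k2, (d_model, d_latent)) * 0.02           # down-projection to latent
w_uk  = jax.random.normal(k3, (d_latent, n_heads * d_head)) * 0.02  # latent -> keys
w_uv  = jax.random.normal(k4, (d_latent, n_heads * d_head)) * 0.02  # latent -> values
x = jax.random.normal(k5, (seq, d_model))

q = (x @ w_q).reshape(seq, n_heads, d_head)
c_kv = x @ w_dkv                                  # the only tensor a KV cache would need to store
k = (c_kv @ w_uk).reshape(seq, n_heads, d_head)
v = (c_kv @ w_uv).reshape(seq, n_heads, d_head)

scores = jnp.einsum("qhd,khd->hqk", q, k) / jnp.sqrt(d_head)
attn = jax.nn.softmax(scores, axis=-1)
out = jnp.einsum("hqk,khd->qhd", attn, v).reshape(seq, n_heads * d_head)
print(out.shape)  # (16, 128)
```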
RT @WhiteHouse: "CONGESTION PRICING IS DEAD. Manhattan, and all of New York, is SAVED. LONG LIVE THE KING!" –President Donald J. Trump htt…
Some awesome stuff here about LLM scaling (esp. on GPUs). Their Llama sharding/memory diagram is great. Glad to see it becoming easier to understand scaling in the open.
The Ultra-Scale Playbook: Training LLMs on GPU Clusters. Learn how to train your own DeepSeek-V3 model using 5D parallelism, ZeRO, fast kernels, compute/comm overlap and bottlenecks with theory, interactive plots and 4000+ scaling experiments and audio!
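As a tiny taste of what the parallelism here means concretely, a toy data-/model-parallel sharding sketch in JAX; the mesh shape, array sizes, and device count are assumptions, while the playbook itself works through the full 5D picture on GPU clusters:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Toy 2D sharding: a "data" axis and a "model" axis, the simplest ancestor of
# the 5D layouts the playbook covers. Assumes 8 visible devices; adjust the
# mesh shape to your hardware.
mesh = Mesh(np.array(jax.devices()).reshape(4, 2), axis_names=("data", "model"))

activations = jnp.ones((32, 1024))   # batch dim will be sharded over "data"
weights = jnp.ones((1024, 4096))     # output dim will be sharded over "model"

activations = jax.device_put(activations, NamedSharding(mesh, P("data", None)))
weights = jax.device_put(weights, NamedSharding(mesh, P(None, "model")))

@jax.jit
def layer(x, w):
    # The compiler propagates shardings and inserts any collectives needed.
    return x @ w

out = layer(activations, weights)
jax.debug.visualize_array_sharding(out)
```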
A hot take is that LLMs are bad at writing because many of the people writing SFT data are bad at writing. Tech has never cared about writing skills.
a controversial opinion i hold deeply is that AI is not superhuman at writing (and isn't close). there are 10x and 100x human writers. here's a random excerpt from David Foster Wallace, widely agreed to be one of the greatest modern writers. if you sincerely think anything