Jacob Austin

@jacobaustin132

Followers
5K
Following
3K
Media
36
Statuses
511

Research at @GoogleDeepMind. Currently making LLMs go fast. I also play piano and climb. NYC. Opinions my own

New York, NY
Joined March 2017
@jacobaustin132
Jacob Austin
6 months
Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the “systems view” of LLMs and wrote a little textbook called “How To Scale Your Model” which we’re releasing today. 1/n
Tweet media one
25
378
2K
@jacobaustin132
Jacob Austin
4 days
I just stumbled across this awesome book, which covers a lot of the nitty gritty details of GPU hardware, SLURM, cloud providers, and LLM training/serving. Probably the most practical guide to the infrastructure of LLM scaling I've seen.
@StasBekman
Stas Bekman
25 days
Got a chance to measure Maximum Achievable Matmul TFLOPS on NVIDIA B200. With each new NVIDIA generation the efficiency keeps on dropping: A100: 86.9%, H100: 80.3%, B200: 77.6%. The updated table is here:
Tweet media one
2
0
39
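A measurement like Stas's can be sketched as a simple benchmark: time a large matmul, count 2·m·n·k floating-point ops per call, and divide achieved TFLOP/s by the spec-sheet peak. The sketch below uses NumPy on CPU purely for illustration; the shapes, iteration count, and peak number are my own assumptions, not his actual harness.

```python
import time
import numpy as np

def matmul_efficiency(m, n, k, peak_tflops, dtype=np.float32, iters=5):
    """Time an (m, k) @ (k, n) matmul and report achieved/peak TFLOP/s.

    Each matmul performs 2*m*n*k floating-point ops
    (one multiply and one add per output element per k-step).
    """
    a = np.random.rand(m, k).astype(dtype)
    b = np.random.rand(k, n).astype(dtype)
    a @ b  # warm-up: trigger BLAS init, caches, etc.
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    elapsed = (time.perf_counter() - start) / iters
    achieved_tflops = 2 * m * n * k / elapsed / 1e12
    return achieved_tflops / peak_tflops  # efficiency as a fraction of peak

# On a real accelerator you'd pass its spec-sheet dense peak, e.g. (hypothetically):
# matmul_efficiency(8192, 8192, 8192, peak_tflops=989)
```

In practice the measured efficiency depends heavily on matrix shape and dtype, which is why a table across sizes (as in the linked post) is more informative than a single number.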
@jacobaustin132
Jacob Austin
10 days
RT @khoomeik: it’s so over
Tweet media one
0
59
0
@jacobaustin132
Jacob Austin
1 month
These talks are awesome and all on YouTube!
@percyliang
Percy Liang
1 month
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team @tatsu_hashimoto @marcelroed @neilbband @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:
0
3
6
@jacobaustin132
Jacob Austin
2 months
An interesting phenomenon: because LLM API providers are generally not able to log or view customer traffic, jailbreaks can in theory exist undetected unless the API customer has sufficient monitoring in place.
1
0
11
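The "sufficient monitoring" this tweet alludes to can be as simple as the API customer scanning their own prompt/completion traffic before acting on it. A minimal, hypothetical sketch (the pattern list and `flag_response` helper are my own illustration, not any provider's API; real monitoring would use a trained classifier rather than regexes):

```python
import re

# Hypothetical deny-list a customer might scan traffic against.
SUSPECT_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"developer mode", re.IGNORECASE),
]

def flag_response(prompt: str, completion: str) -> bool:
    """Return True if the prompt/completion pair matches a known jailbreak pattern."""
    text = f"{prompt}\n{completion}"
    return any(p.search(text) for p in SUSPECT_PATTERNS)
```

The point of the tweet stands either way: because the provider cannot see this traffic, detection has to happen on the customer's side or not at all.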
@jacobaustin132
Jacob Austin
4 months
RT @nytimes: Breaking News: A Columbia student activist, a legal permanent resident, was arrested by ICE at a meeting he thought was a step….
Tweet card summary image
nytimes.com
Mohsen Mahdawi, a legal permanent resident, has lived in the United States for 10 years and was arrested in Vermont. He has not been charged with a crime.
0
106
0
@jacobaustin132
Jacob Austin
4 months
I'm glad to see at least one university has the slightest semblance of principle left.
@gtconway3d
George Conway 👊🇺🇸🔥
4 months
I’m particularly proud right now to be a graduate of Harvard College. Thank you, President Garber.
0
0
5
@jacobaustin132
Jacob Austin
4 months
RT @Miles_Brundage: 🫠.
0
3
0
@jacobaustin132
Jacob Austin
4 months
Most exciting news of the year so far!
@pitdesi
Sheel Mohnot
4 months
Dishoom NYC 2026
Tweet media one
0
0
18
@jacobaustin132
Jacob Austin
4 months
RT @TransluceAI: To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken….
0
65
0
@jacobaustin132
Jacob Austin
4 months
RT @lmarena_ai: BREAKING: Gemini 2.5 Pro is now #1 on the Arena leaderboard - the largest score jump ever (+40 pts vs Grok-3/GPT-4.5)! 🏆. T….
0
415
0
@jacobaustin132
Jacob Austin
4 months
Anyone done the Whitney mountaineers route or the east buttress? Planning to do one or the other in a few weeks but wanted to talk to someone who's done it.
0
0
1
@jacobaustin132
Jacob Austin
4 months
RT @AmandaAskell: It's bizarre when relatively techno-utopian people are asked about how to solve declining fertility and instead of talkin….
0
128
0
@jacobaustin132
Jacob Austin
5 months
RT @rdyro128523: Deepseek R1 inference in pure JAX! Currently on TPU, with GPU and distilled models in-progress. Features MLA-style attenti….
0
46
0
@jacobaustin132
Jacob Austin
5 months
RT @Miles_Brundage: Sycophancy is not just a problem with AI models.
0
17
0
@jacobaustin132
Jacob Austin
5 months
RT @WhiteHouse: "CONGESTION PRICING IS DEAD. Manhattan, and all of New York, is SAVED. LONG LIVE THE KING!" .–President Donald J. Trump htt….
0
10K
0
@jacobaustin132
Jacob Austin
5 months
Some awesome stuff here about LLM scaling (esp. on GPUs). Their Llama sharding/memory diagram is great. Glad to see it becoming easier to understand scaling in the open.
@lvwerra
Leandro von Werra
5 months
The Ultra-Scale Playbook: Training LLMs on GPU Clusters. Learn how to train your own DeepSeek-V3 model using 5D parallelism, ZeRO, fast kernels, compute/comm overlap and bottlenecks, with theory, interactive plots, 4,000+ scaling experiments, and audio!
Tweet media one
2
4
54
@jacobaustin132
Jacob Austin
5 months
To add a second layer to this, I think good writing is by definition more concerned with the "real world" than with words themselves. Good writing asks "what does this really look like" and finds new words for it. LLMs can't do this.
1
0
4
@jacobaustin132
Jacob Austin
5 months
A hot take is that LLMs are bad at writing because many of the people writing SFT data are bad at writing. Tech has never cared about writing skills.
@jxmnop
jack morris
5 months
a controversial opinion i hold deeply is that AI is not superhuman at writing (and isn't close). there are 10x and 100x human writers. here's a random excerpt from David Foster Wallace, widely agreed to be one of the greatest modern writers. if you sincerely think anything
Tweet media one
Tweet media two
3
0
31