Suvinay Subramanian
@suvinay
Followers
254
Following
2
Media
61
Statuses
1K
👨‍💻 Building AI systems (TPUs) @google | 🎓 @MIT_CSAIL (Ph.D.), @iitmadras (BTech) | 🎙️ Co-host the Computer Architecture Podcast | Views my own
California, USA
Joined November 2008
Starting with this exciting line of work from @tjingrant and colleagues at MIT. We tackle the question of: Can we train LLMs to parallelize autoregressive decoding automatically, backed by a performant runtime to exploit this parallelism for improved inference speedup?
Introducing Learned Asynchronous Decoding w/ friends from MIT/Google! LLM responses often have chunks of tokens that are semantically independent. We train LLMs to identify and decode them in parallel, speeding up inference by 1.46x geomean (AlpacaEval) w/ only 1.3% quality loss.
0
0
3
A short article on our #ICML2025 paper (led by @tjingrant, @ellieyhc MITxGoogle): PASTA teaches LLMs to adaptively parallelize their own decode, optimizing quality & latency in concert. No hand-crafted heuristics -> learned parallelism, with realized latency improvements on GPUs.
Asynchronous decoding: multiple LLM threads write different parts of an answer in parallel. In Feb we (MIT×Google) introduced PASTA, the first async-dec method that uses policy learning to optimize latency & quality end-to-end. See us @ E-2600, East Hall A-B, Tue 11pm #ICML.
0
1
7
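The core idea in the tweets above, decoding semantically independent chunks of a response concurrently, can be sketched with a toy example. This is purely illustrative: `decode_chunk` and `async_decode` are my own stand-in names, not part of the PASTA codebase, and a real system would run the LLM inside the worker rather than format a string.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_chunk(prompt, chunk_spec):
    """Stand-in for autoregressively decoding one semantically
    independent chunk (a real system would invoke the LLM here)."""
    return f"[{chunk_spec}]"

def async_decode(prompt, chunk_specs):
    # Decode every independent chunk in parallel, then stitch the
    # results back together in their original order (map preserves
    # input order even though workers finish asynchronously).
    with ThreadPoolExecutor() as pool:
        chunks = list(pool.map(lambda s: decode_chunk(prompt, s), chunk_specs))
    return " ".join(chunks)

print(async_decode("List three fruits.", ["apple", "banana", "cherry"]))
```

The interesting part in the actual work is that the model itself learns where such independent chunks begin and end, rather than relying on hand-written splitting heuristics.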
Scaling Laws provide a valuable lens in guiding model design and computational budgets. Our recent work extends this lens to the realm of _fine-grained_ sparsity. Check out our #ICLR2025 paper, and the thread below from lead-author @tjingrant summarizing our findings.
📣 The Journey Matters: Our #ICLR2025 paper shows how to pretrain sparse LLMs with half the size of dense LLMs while maintaining quality. We found that the average parameter count during sparse pre-training predicts quality, not final size. An MIT/Rice/Google/ISTA collab 🧵 1/N
0
1
8
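The "average parameter count during sparse pre-training" quantity from the tweet above is easy to compute for a given sparsity schedule. The sketch below assumes a simple linear ramp to the final sparsity over the first fraction of training; the schedule and all names here are my own illustration, not the paper's exact recipe.

```python
def avg_param_count(dense_params, final_sparsity, ramp_frac, steps=1000):
    """Average number of active (non-zero) parameters over training,
    assuming sparsity ramps linearly from 0 to final_sparsity during
    the first ramp_frac of training and stays flat afterwards."""
    total = 0.0
    for t in range(steps):
        frac = t / steps
        sparsity = final_sparsity * min(frac / ramp_frac, 1.0)
        total += dense_params * (1.0 - sparsity)
    return total / steps

# A 1B-parameter model pruned to 50% sparsity over the first half of
# training retains, on average, well over half of its parameters.
print(avg_param_count(1_000_000_000, 0.5, 0.5))
```

Under the paper's finding, two runs with the same average count would be predicted to reach similar quality even if their final sparsities differ.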
TPUs have been a key enabler for the Gemini models -- from large-scale training, to fast and cost-effective serving. Our latest generation TPUs (Ironwood) will bring more exciting compute capabilities to the fore:
Breaking news: Google is winning on every AI front. This is not just about Gemini 2.5 but about a reality that OpenAI and Anthropic fans have ignored for too long. Here's a non-exhaustive list: - Gemini 2.5 Pro is the best model in the world according to benchmarks, vibe
0
1
7
@SabaMugazambi @JeffDean @NormJouppi And finally, for those interested in more technical details, and in co-design across multiple layers of the stack, from hardware and circuits to software and all the way up to the datacenter:
0
0
0
@SabaMugazambi @JeffDean @NormJouppi A couple of fun videos that provide a sneak peek into TPUs and how they are plugged into our datacenters: [1] https://t.co/V43HD2SKad, [2]
1
0
0
@SabaMugazambi @JeffDean For a historical account of the journey of developing TPUs, check out @NormJouppi's talk at SuperComputing'24:
1
0
1
@SabaMugazambi's fireside chat with @JeffDean dives into the innovative features in the latest generation of TPUs, and what's in the pipeline:
1
0
0
Together with Lisa Hsu (Meta), I have been hosting the Computer Architecture Podcast -- we recently crossed 50K downloads. Check out our latest episode with Prof. Arka Basu: https://t.co/RRg6gq7ZMa -- we discuss GPUs, but from a different vantage point than the AI angle that is all the rage.
comparchpodcast.podbean.com
A show that brings you closer to the cutting edge in computer architecture and the remarkable people behind it. Hosted by Dr. Suvinay Subramanian, who is a computer architect at Google in the Systems...
0
0
1
A couple of excellent resources on how to think about AI systems performance, parallelism, and scaling. The one below is from colleagues at Google and focuses on TPUs. Another resource that dropped in the past month is the Ultra-Scale Playbook from HF: https://t.co/xuE8SZIPzu.
huggingface.co
Making LLMs run efficiently can feel scary, but scaling isn't magic, it's math! We wanted to demystify the "systems view" of LLMs and wrote a little textbook called "How To Scale Your Model" which we're releasing today. 1/n
0
0
1
In addition to the ArchReasoning Challenge, please submit your work at the intersection of ML, Computer Architecture, and Systems to the MLArchSys Workshop at ISCA'25 (Tokyo). CFP and topics in the quoted tweet.
Please consider submitting your best work. MLArchSys is the best place to showcase your work at the intersection of ML, Computer Architecture, and Systems. Check out the call for papers and look for the new topics we included this year 👇🔥 https://t.co/QiYsSFVcny 1/3
1
0
2
High-quality data is a key enabler for effective, useful, and actionable use of AI. We are working towards collecting and curating such a dataset for the computer architecture domain. Submit your favorite architecture questions to the ArchReasoning Challenge ( https://t.co/5JG0DUOpta).
sites.google.com
ArchReasoning Challenge Testing the Limits of LLM Reasoning in Computer Architecture and Systems
We're excited to launch the ArchReasoning Challenge ( https://t.co/SS4EuHt5wA). Design complex, reasoning-based questions that expose the current limitations of LLMs and contribute to the broader effort of improving AI reasoning for comp. arch. and systems.
0
0
2
Returning to Twitter/X after a decade-long hiatus. My excellent interns at Google, with whom I have had the pleasure of working, were kind enough to nudge me into signal-boosting their work. Will also try to share updates on TPUs, AI chips & systems, and computer architecture.
1
0
7
[1/3] Former President of India, A. P. J. Abdul Kalam passes away. Dr. Abdul Kalam was a rare individual -- a man of intellectual brilliance,
0
0
2
[2/3] great scientific zeal, a sagacious statesman and truly a people's President. While many will remember him for spearheading India's
0
0
1
[3/3] nuclear program, his true legacy will be inspiring several generations of Indians to dream & to work towards a brighter India. RIP.
0
0
1
Science professors need leadership training. http://t.co/zjrEeylpsK Couldn't agree more. #read
0
0
1
Two excellent (but unrelated) graphs a) On global warming http://t.co/HSGivx2z4N b) Evolution of Silicon Valley firms http://t.co/y3y1AzDpXf
0
0
1
[1/2] A world without work: http://t.co/fYxggzbeak Great article! Rather than focus on how...
0
0
1