Suvinay Subramanian Profile
Suvinay Subramanian

@suvinay

Followers: 254 · Following: 2 · Media: 61 · Statuses: 1K

πŸ‘¨β€πŸ’» Building AI systems (TPUs) @google | πŸŽ“ @MIT_CSAIL (Ph.D.), @iitmadras (BTech) | πŸŽ™οΈ Co-host the Computer Architecture Podcast | Views my own

California, USA
Joined November 2008
@suvinay
Suvinay Subramanian
8 months
Starting with this exciting line of work from @tjingrant and colleagues at MIT. We tackle the question: can we train LLMs to parallelize their autoregressive decoding automatically, backed by a performant runtime that exploits this parallelism for faster inference?
@tjingrant
Tian Jin
8 months
Introducing Learned Asynchronous Decoding w/ friends from MIT/Google! LLM responses often have chunks of tokens that are semantically independent. We train LLMs to identify and decode them in parallel, speeding up inference by 1.46x geomean (AlpacaEval) w/ only 1.3% quality loss.
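The core idea in the quoted tweet: once independent chunks of a response are identified, they can be decoded concurrently, so latency is bounded by the longest chunk rather than the sum of all chunks. A minimal toy sketch of that effect (not the paper's system; decode_chunk and the latency constant are stand-ins):

```python
# Toy illustration of asynchronous decoding (NOT the paper's implementation).
# Assumes the model has already tagged which response chunks are
# semantically independent; decode_chunk is a stand-in for real decoding.
import time
from concurrent.futures import ThreadPoolExecutor

TOKEN_LATENCY = 0.01  # pretend each token takes 10 ms to decode

def decode_chunk(chunk_len: int) -> str:
    time.sleep(chunk_len * TOKEN_LATENCY)  # simulate autoregressive steps
    return "x" * chunk_len

independent_chunks = [40, 35, 45]  # token counts of three independent spans

# Sequential baseline: chunks decoded one after another.
t0 = time.time()
for n in independent_chunks:
    decode_chunk(n)
sequential = time.time() - t0

# Asynchronous: each independent chunk decoded by its own worker.
t0 = time.time()
with ThreadPoolExecutor() as pool:
    list(pool.map(decode_chunk, independent_chunks))
parallel = time.time() - t0

print(f"speedup: {sequential / parallel:.2f}x")  # bounded by the longest chunk
```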
@suvinay
Suvinay Subramanian
4 months
A short article on our #ICML2025 paper (led by @tjingrant, @ellieyhc MITxGoogle): PASTA teaches LLMs to adaptively parallelize their own decode, optimizing quality & latency in concert. No hand-crafted heuristics -> learned parallelism, with realized latency improvements on GPUs.
@tjingrant
Tian Jin
4 months
Asynchronous decoding: multiple LLM threads write different parts of an answer in parallel. In Feb we (MITΓ—Google) introduced PASTAβ€”the first async-dec method that uses policy learning to optimize latency & quality end-to-end. See us @ E-2600, East Hall A-B, Tue 11pm #ICML.
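One way to picture "optimizing quality & latency in concert" is as maximizing a single scalarized reward over possible parallelization plans. A hedged sketch (the weight, candidate plans, and scores below are invented for illustration; PASTA learns this trade-off end-to-end with policy learning rather than by enumerating plans):

```python
# Illustrative quality/latency trade-off; all numbers are made up.
LAMBDA = 0.5  # assumed relative weight on latency vs. quality

candidate_plans = [
    # (plan, quality score in [0, 1], normalized latency)
    ("fully sequential", 1.00, 1.00),
    ("two-way parallel", 0.99, 0.62),
    ("aggressive parallel", 0.90, 0.45),
]

def reward(quality: float, latency: float) -> float:
    return quality - LAMBDA * latency

best = max(candidate_plans, key=lambda p: reward(p[1], p[2]))
print("chosen plan:", best[0])  # -> "two-way parallel" under these numbers
```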
@suvinay
Suvinay Subramanian
7 months
Scaling laws provide a valuable lens for guiding model design and computational budgets. Our recent work extends this lens to the realm of _fine-grained_ sparsity. Check out our #ICLR2025 paper, and the thread below from lead author @tjingrant summarizing our findings.
@tjingrant
Tian Jin
7 months
πŸ“£ The Journey Matters: Our #ICLR2025 paper shows how to pretrain sparse LLMs with half the size of dense LLMs while maintaining quality. We found that the average parameter count during sparse pre-training predicts quality, not final size. An MIT/Rice/Google/ISTA collab 🧡 1/N
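A back-of-envelope sketch of the thread's headline finding: quality after sparse pretraining tracks the average parameter count over the run, not the final size. The cubic pruning schedule and all constants below are illustrative assumptions, not the paper's exact recipe:

```python
# Average parameter count under a gradual pruning schedule (illustrative).
N_DENSE = 2.0e9        # starting (dense) parameter count, assumed
FINAL_SPARSITY = 0.5   # prune to half size by the end of pretraining
STEPS = 1000           # pretraining steps, assumed

def sparsity_at(step: int) -> float:
    # Gradual cubic schedule: dense at step 0, FINAL_SPARSITY at the end.
    return FINAL_SPARSITY * (step / (STEPS - 1)) ** 3

n_avg = sum(N_DENSE * (1 - sparsity_at(s)) for s in range(STEPS)) / STEPS
print(f"final params:   {N_DENSE * (1 - FINAL_SPARSITY):.2e}")  # 1.00e+09
print(f"average params: {n_avg:.2e}")                           # ~1.75e+09
# The claim: quality is predicted by the average count, not the final size,
# so the pruning trajectory matters, not just where it ends up.
```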
@suvinay
Suvinay Subramanian
7 months
TPUs have been a key enabler for the Gemini models -- from large-scale training, to fast and cost-effective serving. Our latest generation TPUs (Ironwood) will bring more exciting compute capabilities to the fore:
@Alber_RomGar
Alberto Romero
7 months
Breaking news: Google is winning on every AI front. This is not just about Gemini 2.5 but about a reality that OpenAI and Anthropic fans have ignored for too long. Here's a non-exhaustive list: - Gemini 2.5 Pro is the best model in the world according to benchmarks, vibe
@suvinay
Suvinay Subramanian
7 months
@SabaMugazambi @JeffDean @NormJouppi And finally, for those interested in more technical details and in the codesign across multiple layers of the stack, from circuits and hardware to software and all the way up to the datacenter:
@suvinay
Suvinay Subramanian
7 months
@SabaMugazambi @JeffDean @NormJouppi A couple of fun videos that provide a sneak peek into TPUs and how they are plugged into our datacenters: [1] https://t.co/V43HD2SKad, [2]
@suvinay
Suvinay Subramanian
7 months
@SabaMugazambi @JeffDean For a historical account of the journey of developing TPUs, check out @NormJouppi's talk at SuperComputing'24:
@suvinay
Suvinay Subramanian
7 months
@SabaMugazambi's fireside chat with @JeffDean dives into the innovative features in the latest generation of TPUs, and what's in the pipeline:
@suvinay
Suvinay Subramanian
8 months
Together with Lisa Hsu (Meta), I have been hosting the Computer Architecture Podcast -- we recently crossed 50K downloads. Check out our latest episode with Prof. Arka Basu: https://t.co/RRg6gq7ZMa -- we discuss GPUs, but from a different vantage point than the AI angle that is all the rage.
comparchpodcast.podbean.com
A show that brings you closer to the cutting edge in computer architecture and the remarkable people behind it. Hosted by Dr. Suvinay Subramanian, who is a computer architect at Google in the Systems...
@suvinay
Suvinay Subramanian
8 months
A couple of excellent resources on how to think about AI systems performance, parallelism, and scaling. The one below, from colleagues at Google, focuses on TPUs. Another resource that dropped in the past month is the Ultra-Scale Playbook from HF: https://t.co/xuE8SZIPzu.
huggingface.co
@jacobaustin132
Jacob Austin
9 months
Making LLMs run efficiently can feel scary, but scaling isn’t magic, it’s math! We wanted to demystify the β€œsystems view” of LLMs and wrote a little textbook called β€œHow To Scale Your Model” which we’re releasing today. 1/n
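In the spirit of those resources, the "systems view" often starts with roofline-style napkin math: a kernel's time is bounded by either compute or memory traffic, whichever is larger. A small sketch with illustrative (assumed) hardware numbers:

```python
# Roofline-style estimate for a matmul; hardware numbers are assumed.
PEAK_FLOPS = 200e12  # 200 TFLOP/s accelerator peak (illustrative)
PEAK_BW = 1.6e12     # 1.6 TB/s HBM bandwidth (illustrative)

def matmul_time(m: int, k: int, n: int, bytes_per_elem: int = 2) -> str:
    flops = 2 * m * k * n  # one multiply-accumulate per output contribution
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    t_compute = flops / PEAK_FLOPS
    t_memory = bytes_moved / PEAK_BW
    bound = "compute" if t_compute > t_memory else "memory"
    return f"{max(t_compute, t_memory) * 1e6:.1f} us ({bound}-bound)"

print("prefill-like:", matmul_time(4096, 8192, 8192))  # large batch: compute-bound
print("decode-like: ", matmul_time(1, 8192, 8192))     # batch 1: memory-bound
```

The batch-1 case is why autoregressive decode tends to be memory-bandwidth-bound while prefill can saturate the FLOPs.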
@suvinay
Suvinay Subramanian
8 months
In addition to the ArchReasoning Challenge, please submit your work at the intersection of ML, Computer Architecture, and Systems to the MLArchSys Workshop at ISCA'25 (Tokyo). CFP and topics are in the quoted tweet.
@ayazdanb
Amir Yazdan
9 months
Please consider submitting your best work. MLArchSys is the best place to showcase your work at the intersection of ML, Computer Architecture, and Systems. Check out the call for papers and look for the new topics we included this year πŸš€πŸ”₯ https://t.co/QiYsSFVcny 1/3
@suvinay
Suvinay Subramanian
8 months
High-quality data is a key enabler of effective and actionable AI. We are working towards collecting and curating such a dataset for the computer architecture domain. Submit your favorite architecture questions to the ArchReasoning Challenge ( https://t.co/5JG0DUOpta).
sites.google.com
ArchReasoning Challenge: Testing the Limits of LLM Reasoning in Computer Architecture and Systems
@ayazdanb
Amir Yazdan
8 months
We're excited to launch the ArchReasoning Challenge ( https://t.co/SS4EuHt5wA). Design complex, reasoning-based questions that expose the current limitations of LLMs and contribute to the broader effort of improving AI reasoning for comp. arch. and systems.
@suvinay
Suvinay Subramanian
8 months
Returning to Twitter/X after a decade-long hiatus. My excellent interns at Google, with whom I have had the pleasure of working, were kind enough to nudge me to help signal-boost their work. Will also try to share updates on TPUs, AI chips & systems, and computer architecture.
@suvinay
Suvinay Subramanian
10 years
[1/3] Former President of India, A.P.J. Abdul Kalam, passes away. Dr. Abdul Kalam was a rare individual -- a man of intellectual brilliance,
@suvinay
Suvinay Subramanian
10 years
[2/3] great scientific zeal, a sagacious statesman and truly a people's President. While many will remember him for spearheading India's
@suvinay
Suvinay Subramanian
10 years
[3/3] nuclear program, his true legacy will be inspiring several generations of Indians to dream & to work towards a brighter India. RIP.
@suvinay
Suvinay Subramanian
10 years
Science professors need leadership training. http://t.co/zjrEeylpsK Couldn't agree more. #read
@suvinay
Suvinay Subramanian
10 years
Two excellent (but unrelated) graphs a) On global warming http://t.co/HSGivx2z4N b) Evolution of Silicon Valley firms http://t.co/y3y1AzDpXf
@suvinay
Suvinay Subramanian
10 years
[1/2] A world without work: http://t.co/fYxggzbeak Great article! Rather than focus on how...