Suvinay Subramanian
@suvinay
Followers
254
Following
2
Media
61
Statuses
1K
👨‍💻 Building AI systems (TPUs) @google | 🎓 @MIT_CSAIL (Ph.D.), @iitmadras (BTech) | 🎙️ Co-host the Computer Architecture Podcast | Views my own
California, USA
Joined November 2008
Starting with this exciting line of work from @tjingrant and colleagues at MIT. We tackle the question of: Can we train LLMs to parallelize autoregressive decoding automatically, backed by a performant runtime to exploit this parallelism for improved inference speedup?
Introducing Learned Asynchronous Decoding w/ friends from MIT/Google! LLM responses often have chunks of tokens that are semantically independent. We train LLMs to identify and decode them in parallel, speeding up inference by 1.46x geomean (AlpacaEval) w/ only 1.3% quality loss.
0
0
3
A short article on our #ICML2025 paper (led by @tjingrant, @ellieyhc MITxGoogle): PASTA teaches LLMs to adaptively parallelize their own decode, optimizing quality & latency in concert. No hand-crafted heuristics -> learned parallelism, with realized latency improvements on GPUs.
Asynchronous decoding: multiple LLM threads write different parts of an answer in parallel. In Feb we (MIT×Google) introduced PASTA, the first async-dec method that uses policy learning to optimize latency & quality end-to-end. See us @ E-2600, East Hall A-B, Tue 11pm #ICML.
0
1
7
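The core idea in the tweets above, decoding semantically independent chunks of a response concurrently, can be sketched with a toy example. This is purely illustrative: `decode_chunk` and `async_decode` are my own stand-in names, not part of the PASTA codebase, and a real system would run the LLM inside the worker rather than format a string.

```python
from concurrent.futures import ThreadPoolExecutor

def decode_chunk(prompt, chunk_spec):
    """Stand-in for autoregressively decoding one semantically
    independent chunk (a real system would invoke the LLM here)."""
    return f"[{chunk_spec}]"

def async_decode(prompt, chunk_specs):
    # Decode every independent chunk in parallel, then stitch the
    # results back together in their original order (map preserves
    # input order even though workers finish asynchronously).
    with ThreadPoolExecutor() as pool:
        chunks = list(pool.map(lambda s: decode_chunk(prompt, s), chunk_specs))
    return " ".join(chunks)

print(async_decode("List three fruits.", ["apple", "banana", "cherry"]))
```

The interesting part in the actual work is that the model itself learns where such independent chunks begin and end, rather than relying on hand-written splitting heuristics.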
Scaling Laws provide a valuable lens in guiding model design and computational budgets. Our recent work extends this lens to the realm of _fine-grained_ sparsity. Check out our #ICLR2025 paper, and the thread below from lead-author @tjingrant summarizing our findings.
📣 The Journey Matters: Our #ICLR2025 paper shows how to pretrain sparse LLMs with half the size of dense LLMs while maintaining quality. We found that the average parameter count during sparse pre-training predicts quality, not final size. An MIT/Rice/Google/ISTA collab 🧵 1/N
0
1
8
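The "average parameter count during sparse pre-training" quantity from the tweet above is easy to compute for a given sparsity schedule. The sketch below assumes a simple linear ramp to the final sparsity over the first fraction of training; the schedule and all names here are my own illustration, not the paper's exact recipe.

```python
def avg_param_count(dense_params, final_sparsity, ramp_frac, steps=1000):
    """Average number of active (non-zero) parameters over training,
    assuming sparsity ramps linearly from 0 to final_sparsity during
    the first ramp_frac of training and stays flat afterwards."""
    total = 0.0
    for t in range(steps):
        frac = t / steps
        sparsity = final_sparsity * min(frac / ramp_frac, 1.0)
        total += dense_params * (1.0 - sparsity)
    return total / steps

# A 1B-parameter model pruned to 50% sparsity over the first half of
# training retains, on average, well over half of its parameters.
print(avg_param_count(1_000_000_000, 0.5, 0.5))
```

Under the paper's finding, two runs with the same average count would be predicted to reach similar quality even if their final sparsities differ.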
TPUs have been a key enabler for the Gemini models -- from large-scale training, to fast and cost-effective serving. Our latest generation TPUs (Ironwood) will bring more exciting compute capabilities to the fore:
Breaking news: Google is winning on every AI front. This is not just about Gemini 2.5 but about a reality that OpenAI and Anthropic fans have ignored for too long. Here's a non-exhaustive list: - Gemini 2.5 Pro is the best model in the world according to benchmarks, vibe
0
1
7
@SabaMugazambi @JeffDean @NormJouppi And finally, for those interested in more technical details, and in co-design across multiple layers of the stack, from hardware and circuits to software and all the way up to the datacenter:
0
0
0
@SabaMugazambi @JeffDean @NormJouppi A couple of fun videos that provide a sneak peek into TPUs and how they are plugged into our datacenters: [1] https://t.co/V43HD2SKad, [2]
1
0
0
@SabaMugazambi @JeffDean For a historical account of the journey of developing TPUs, check out @NormJouppi's talk at SuperComputing'24:
1
0
1
@SabaMugazambi's fireside chat with @JeffDean dives into the innovative features in the latest generation of TPUs, and what's in the pipeline:
1
0
0
Together with Lisa Hsu (Meta), I have been hosting the Computer Architecture Podcast -- we recently crossed 50K downloads. Check out our latest episode with Prof. Arka Basu: https://t.co/RRg6gq7ZMa -- we discuss GPUs, but from a different vantage point than the AI angle that is all the rage.
comparchpodcast.podbean.com
A show that brings you closer to the cutting edge in computer architecture and the remarkable people behind it. Hosted by Dr. Suvinay Subramanian, who is a computer architect at Google in the Systems...
0
0
1
A couple of excellent resources on how to think about AI systems performance, parallelism, and scaling. The one below is from colleagues at Google and focuses on TPUs. Another resource that dropped in the past month is the Ultra-Scale Playbook from HF: https://t.co/xuE8SZIPzu.
huggingface.co
Making LLMs run efficiently can feel scary, but scaling isn't magic, it's math! We wanted to demystify the "systems view" of LLMs and wrote a little textbook called "How To Scale Your Model" which we're releasing today. 1/n
0
0
1
In addition to the ArchReasoning Challenge, please submit your work at the intersection of ML, Computer Architecture, and Systems to the MLArchSys Workshop at ISCA'25 (Tokyo). CFP and topics in the quoted tweet.
Please consider submitting your best work. MLArchSys is the best place to showcase your work at the intersection of ML, Computer Architecture, and Systems. Check out the call for papers and look for the new topics we included this year 👇🔥 https://t.co/QiYsSFVcny 1/3
1
0
2
High-quality data is a key enabler for effective, useful, and actionable use of AI. We are working towards collecting and curating such a dataset for the computer architecture domain. Submit your favorite architecture questions to the ArchReasoning Challenge ( https://t.co/5JG0DUOpta).
sites.google.com
ArchReasoning Challenge Testing the Limits of LLM Reasoning in Computer Architecture and Systems
We're excited to launch the ArchReasoning Challenge ( https://t.co/SS4EuHt5wA). Design complex, reasoning-based questions that expose the current limitations of LLMs and contribute to the broader effort of improving AI reasoning for comp. arch. and systems.
0
0
2
Returning to Twitter/X after a decade-long hiatus. My excellent interns at Google, with whom I have had the pleasure of working, were kind enough to nudge me into signal-boosting their work. Will also try to share updates on TPUs, AI chips & systems, and computer architecture.
1
0
7
[1/3] Former President of India, A. P. J. Abdul Kalam passes away. Dr. Abdul Kalam was a rare individual -- a man of intellectual brilliance,
0
0
2
[2/3] great scientific zeal, a sagacious statesman and truly a people's President. While many will remember him for spearheading India's
0
0
1
[3/3] nuclear program, his true legacy will be inspiring several generations of Indians to dream & to work towards a brighter India. RIP.
0
0
1
Science professors need leadership training. http://t.co/zjrEeylpsK Couldn't agree more. #read
0
0
1
Two excellent (but unrelated) graphs a) On global warming http://t.co/HSGivx2z4N b) Evolution of Silicon Valley firms http://t.co/y3y1AzDpXf
0
0
1
[1/2] A world without work: http://t.co/fYxggzbeak Great article! Rather than focus on how...
0
0
1