
Siddhant Ray
@siddhantrayyy
Followers: 131
Following: 59
Media: 2
Statuses: 117
Mainly Networks and Systems. Some Machine Learning. PhD CS candidate @ UChicago. MSc. EEIT @ ETH Zurich. BTech. ECE @ VIT Vellore.
Joined March 2016
RT @lmcache: LMCache supports gpt-oss (20B/120B) on Day 1! TTFT 1.20s → 0.39s (-67.5%), finish time 15.70s → 7.73s (-50.7%) compared to Va…
0
9
0
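The retweet above pairs absolute latencies with percentage reductions. A quick sanity check of that arithmetic, using only the four numbers quoted in the tweet (this is plain arithmetic, not LMCache's benchmark code):

```python
# Sanity-check the latency reductions quoted in the LMCache gpt-oss tweet.
# The four numbers below come from the tweet; everything else is arithmetic.

def reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100

ttft_before, ttft_after = 1.20, 0.39        # seconds, time to first token
finish_before, finish_after = 15.70, 7.73   # seconds, total finish time

print(f"TTFT reduction:   {reduction(ttft_before, ttft_after):.1f}%")      # 67.5%
print(f"Finish reduction: {reduction(finish_before, finish_after):.1f}%")  # 50.8% (tweet rounds to 50.7%)
```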
RT @lmcache: Everyone is focused on faster LLM inference engines. But bigger potential might be reached with what is beyond the engine. …
0
14
0
This is joint work carried out between @UChicago / @lmcache (@astrogu_, @this_will_echo, @shaoting_feng, @JunchenJiang), @Princeton (@ruipeterpan, Ravi Netravali) and @microsoft (Ganesh Ananthanarayanan).
0
0
0
RT @this_will_echo: Believe it or not, even when an LLM generates just ONE SINGLE word, it can still be powerful! Say in recommendation: …
0
7
0
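The tweet above is truncated, so here is one common instantiation of the "single word" idea, scoring a candidate item by comparing the model's next-token logits for "yes" vs "no". This is an illustrative guess at the pattern, not the authors' method; the model name and prompt are placeholders.

```python
# Illustrative sketch: one generated token driving a recommendation-style
# decision by comparing next-token logits for "yes" vs "no".
# Generic pattern only; not the method from the truncated tweet above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder small model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = ("User liked: sci-fi movies, space documentaries.\n"
          "Candidate item: 'Interstellar'.\n"
          "Should we recommend it? Answer yes or no: ")

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the single next token

yes_id = tok(" yes", add_special_tokens=False).input_ids[0]
no_id = tok(" no", add_special_tokens=False).input_ids[0]
p_yes = torch.softmax(logits[[yes_id, no_id]], dim=0)[0].item()
print(f"P(recommend) ~= {p_yes:.2f}")  # one "word", one decision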
RT @lmcache: LMCache reaches 2,000+ stars on GitHub! A huge thank you to our open-source community: your support is fueling next-gen eff…
0
5
0
RT @AlwaysBiMySide: Even NVIDIA Dynamo thinks that letting LLM do prefill only is useful! Just sayin'. Our PrefillOnly paper might've been…
0
3
0
RT @fxgst: Are you going to the World Computer Summit next week in Zurich? Then don't miss the demo of ICP Ninja! Sign up here: https:/…
0
6
0
RT @RedHat_AI: LLM inference is too slow, too expensive, and too hard to scale. Introducing llm-d, a Kubernetes-native distributed infer…
0
89
0
RT @lmcache: LMCache turbocharges vLLM, KServe & Dynamo! Our new blog reveals how this SOTA KV cache layer slashes LLM inference costs &…
0
2
0
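For readers unfamiliar with what a "KV cache layer" buys you: the core idea is reusing the attention key/value state of a prompt prefix that has been served before, so the engine only prefills the new suffix. A minimal conceptual toy follows; it is not LMCache's API, storage format, or eviction logic.

```python
# Toy model of prefix KV-cache reuse. Purely conceptual; not LMCache code.
import hashlib

class ToyKVCacheLayer:
    def __init__(self):
        self._store = {}  # prefix hash -> opaque KV blob

    @staticmethod
    def _key(tokens: tuple) -> str:
        return hashlib.sha256(repr(tokens).encode()).hexdigest()

    def insert(self, tokens: tuple, kv_blob) -> None:
        self._store[self._key(tokens)] = kv_blob

    def lookup(self, tokens: tuple):
        """Return (kv_blob, matched_len) for the longest cached prefix."""
        for cut in range(len(tokens), 0, -1):
            blob = self._store.get(self._key(tokens[:cut]))
            if blob is not None:
                return blob, cut
        return None, 0

cache = ToyKVCacheLayer()
cache.insert((1, 2, 3, 4), kv_blob="kv-for-1234")   # stored after a first request
blob, hit_len = cache.lookup((1, 2, 3, 4, 5, 6))    # later request shares the prefix
print(hit_len, "tokens reused; only", 6 - hit_len, "need fresh prefill")
```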
RT @lmcache: Mooncake x LMCache: KV Cache-centric Language Model Serving. We're thrilled to announce a strategic collaboration between…
0
10
0
RT @lmcache: Tencent x LMCache Collaboration: Integrating Mooncake Store for Enhanced LLM Inference Caching! Excited to share insight…
0
9
0
RT @lmcache: LMCache Powers Up vLLM V1: P/D Disaggregation & NIXL Support! vLLM V1 revolutionized LLM serving, but lacked a dedicated KV…
0
11
0
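Prefill/decode (P/D) disaggregation, mentioned in the tweet above, splits serving into a compute-heavy prefill worker that builds the KV cache and a decode worker that streams tokens from it. The sketch below shows only that control flow; the dict handoff stands in for a real transport such as NIXL, and nothing here is vLLM or LMCache code.

```python
# Minimal sketch of P/D disaggregation: prefill on one worker, decode on another,
# with the KV cache "transferred" between them (here, a plain dict handoff).

def prefill_worker(prompt_tokens: list[int]) -> dict:
    # Pretend we ran the full forward pass over the prompt on the prefill node.
    return {"layers": f"kv for {len(prompt_tokens)} prompt tokens"}

def decode_worker(kv_cache: dict, max_new_tokens: int) -> list[str]:
    # Pretend we generate tokens one at a time, reusing the transferred KV cache.
    return [f"tok{i}" for i in range(max_new_tokens)]

prompt = list(range(512))                      # a 512-token prompt
kv = prefill_worker(prompt)                    # runs on the prefill node
output = decode_worker(kv, max_new_tokens=4)   # runs on the decode node
print(output)
```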
RT @lmcache: Exciting news from #EuroSys2025: Our work on CacheBlend won Best Paper! CacheBlend delivers the first-ever speedup for RA…
0
8
0
RT @JunchenJiang: LOTS of papers on improving LLM prefill, but they RARELY become standard in the industry and open-source community. Gi…
github.com
vLLM's reference system for K8S-native cluster-wide deployment with community-driven performance optimization - vllm-project/production-stack
0
1
0
Amazing effort, please check it out!
We're thrilled to announce vLLM Production Stack, an open-source, enterprise-grade LLM inference solution that is now an official first-party ecosystem project under vLLM! Why does this matter? A handful of companies focus on LLM training, but millions of apps and businesses
0
0
1
RT @lmcache: Deploy your efficient LLM inference cluster on AWS & GCP in one command with Production-Stack! Check out the blog (http…
0
3
0
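Once a Production-Stack cluster like the one in the tweet is running, it serves requests through an OpenAI-compatible endpoint behind its router. A hedged client example follows; the endpoint URL, port, and model name are placeholders for whatever your own deployment exposes.

```python
# Hedged client example for querying a running vLLM Production-Stack deployment.
# Assumptions: the stack's router is reachable at localhost:30080 and serves a
# model named "meta-llama/Llama-3.1-8B-Instruct"; substitute your own values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30080/v1",  # placeholder router endpoint
    api_key="EMPTY",                       # vLLM-style servers ignore the key
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```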