
Siddhant Ray
@siddhantrayyy
Followers: 131
Following: 59
Media: 2
Statuses: 117
Mainly Networks and Systems. Some Machine Learning. PhD CS candidate @ UChicago. MSc. EEIT @ ETH Zurich. BTech. ECE @ VIT Vellore.
Joined March 2016
RT @lmcache: LMCache supports gpt-oss (20B/120B) on Day 1! TTFT 1.20s → 0.39s (-67.5%), finish time 15.70s → 7.73s (-50.7%) compared to Va…
0
9
0
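The retweet above pairs absolute latencies with percentage reductions. A quick sanity check of that arithmetic, using only the four numbers quoted in the tweet (this is plain arithmetic, not LMCache's benchmark code):

```python
# Sanity-check the latency reductions quoted in the LMCache gpt-oss tweet.
# The four numbers below come from the tweet; everything else is arithmetic.

def reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100

ttft_before, ttft_after = 1.20, 0.39        # seconds, time to first token
finish_before, finish_after = 15.70, 7.73   # seconds, total finish time

print(f"TTFT reduction:   {reduction(ttft_before, ttft_after):.1f}%")      # 67.5%
print(f"Finish reduction: {reduction(finish_before, finish_after):.1f}%")  # 50.8% (tweet rounds to 50.7%)
```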
RT @lmcache: Everyone is focused on faster LLM inference engines. But bigger potential might be reached with what is beyond the engine. …
0
14
0
This is joint work carried out between @UChicago / @lmcache (@astrogu_, @this_will_echo, @shaoting_feng, @JunchenJiang), @Princeton (@ruipeterpan, Ravi Netravali) and @microsoft (Ganesh Ananthanarayanan).
0
0
0
RT @this_will_echo: Believe it or not, even when an LLM generates just ONE SINGLE word, it can still be powerful! Say in recommendation: …
0
7
0
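The tweet above is truncated, so here is one common instantiation of the "single word" idea, scoring a candidate item by comparing the model's next-token logits for "yes" vs "no". This is an illustrative guess at the pattern, not the authors' method; the model name and prompt are placeholders.

```python
# Illustrative sketch: one generated token driving a recommendation-style
# decision by comparing next-token logits for "yes" vs "no".
# Generic pattern only; not the method from the truncated tweet above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder small model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = ("User liked: sci-fi movies, space documentaries.\n"
          "Candidate item: 'Interstellar'.\n"
          "Should we recommend it? Answer yes or no: ")

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the single next token

yes_id = tok(" yes", add_special_tokens=False).input_ids[0]
no_id = tok(" no", add_special_tokens=False).input_ids[0]
p_yes = torch.softmax(logits[[yes_id, no_id]], dim=0)[0].item()
print(f"P(recommend) ~= {p_yes:.2f}")  # one "word", one decision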
RT @lmcache: LMCache reaches 2,000+ stars on GitHub! A huge thank you to our open-source community: your support is fueling next-gen eff…
0
5
0
RT @AlwaysBiMySide: Even NVIDIA Dynamo thinks that letting LLM do prefill only is useful! Just sayin'. Our PrefillOnly paper might've been…
0
3
0
RT @fxgst: Are you going to the World Computer Summit next week in Zurich? Then don't miss the demo of ICP Ninja! Sign up here: https:/…
0
6
0
RT @RedHat_AI: LLM inference is too slow, too expensive, and too hard to scale. Introducing llm-d, a Kubernetes-native distributed infer…
0
89
0
RT @lmcache: LMCache turbocharges vLLM, KServe & Dynamo! Our new blog reveals how this SOTA KV cache layer slashes LLM inference costs &…
0
2
0
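For readers unfamiliar with what a "KV cache layer" buys you: the core idea is reusing the attention key/value state of a prompt prefix that has been served before, so the engine only prefills the new suffix. A minimal conceptual toy follows; it is not LMCache's API, storage format, or eviction logic.

```python
# Toy model of prefix KV-cache reuse. Purely conceptual; not LMCache code.
import hashlib

class ToyKVCacheLayer:
    def __init__(self):
        self._store = {}  # prefix hash -> opaque KV blob

    @staticmethod
    def _key(tokens: tuple) -> str:
        return hashlib.sha256(repr(tokens).encode()).hexdigest()

    def insert(self, tokens: tuple, kv_blob) -> None:
        self._store[self._key(tokens)] = kv_blob

    def lookup(self, tokens: tuple):
        """Return (kv_blob, matched_len) for the longest cached prefix."""
        for cut in range(len(tokens), 0, -1):
            blob = self._store.get(self._key(tokens[:cut]))
            if blob is not None:
                return blob, cut
        return None, 0

cache = ToyKVCacheLayer()
cache.insert((1, 2, 3, 4), kv_blob="kv-for-1234")   # stored after a first request
blob, hit_len = cache.lookup((1, 2, 3, 4, 5, 6))    # later request shares the prefix
print(hit_len, "tokens reused; only", 6 - hit_len, "need fresh prefill")
```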
RT @lmcache: Mooncake x LMCache: KV Cache-centric Language Model Serving. We're thrilled to announce a strategic collaboration between…
0
10
0
RT @lmcache: Tencent x LMCache Collaboration: Integrating Mooncake Store for Enhanced LLM Inference Caching! Excited to share insight…
0
9
0
RT @lmcache: LMCache Powers Up vLLM V1: P/D Disaggregation & NIXL Support! vLLM V1 revolutionized LLM serving, but lacked a dedicated KV…
0
11
0
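Prefill/decode (P/D) disaggregation, mentioned in the tweet above, splits serving into a compute-heavy prefill worker that builds the KV cache and a decode worker that streams tokens from it. The sketch below shows only that control flow; the dict handoff stands in for a real transport such as NIXL, and nothing here is vLLM or LMCache code.

```python
# Minimal sketch of P/D disaggregation: prefill on one worker, decode on another,
# with the KV cache "transferred" between them (here, a plain dict handoff).

def prefill_worker(prompt_tokens: list[int]) -> dict:
    # Pretend we ran the full forward pass over the prompt on the prefill node.
    return {"layers": f"kv for {len(prompt_tokens)} prompt tokens"}

def decode_worker(kv_cache: dict, max_new_tokens: int) -> list[str]:
    # Pretend we generate tokens one at a time, reusing the transferred KV cache.
    return [f"tok{i}" for i in range(max_new_tokens)]

prompt = list(range(512))                      # a 512-token prompt
kv = prefill_worker(prompt)                    # runs on the prefill node
output = decode_worker(kv, max_new_tokens=4)   # runs on the decode node
print(output)
```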
RT @lmcache: Exciting news from #EuroSys2025: Our work on CacheBlend won Best Paper! CacheBlend delivers the first-ever speedup for RA…
0
8
0
RT @JunchenJiang: LOTS of papers on improving LLM prefill, but they RARELY become standard in the industry and open-source community. Gi…
github.com
vLLM's reference system for K8S-native cluster-wide deployment with community-driven performance optimization - vllm-project/production-stack
0
1
0
Amazing effort, please check it out!
We're thrilled to announce vLLM Production Stack, an open-source, enterprise-grade LLM inference solution that is now an official first-party ecosystem project under vLLM! Why does this matter? A handful of companies focus on LLM training, but millions of apps and businesses
0
0
1
RT @lmcache: Deploy your efficient LLM inference cluster on AWS & GCP in one command with Production-Stack! Check out the blog (http…
0
3
0
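Once a Production-Stack cluster like the one in the tweet is running, it serves requests through an OpenAI-compatible endpoint behind its router. A hedged client example follows; the endpoint URL, port, and model name are placeholders for whatever your own deployment exposes.

```python
# Hedged client example for querying a running vLLM Production-Stack deployment.
# Assumptions: the stack's router is reachable at localhost:30080 and serves a
# model named "meta-llama/Llama-3.1-8B-Instruct"; substitute your own values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30080/v1",  # placeholder router endpoint
    api_key="EMPTY",                       # vLLM-style servers ignore the key
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=32,
)
print(resp.choices[0].message.content)
```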