Joshua Gu

@astrogu_

Followers 34 · Following 24 · Media 2 · Statuses 30

CS PhD student @MIT, @MIT_CSAIL, @MITEECS 👨‍💻 | @LMCache Lab | Previously: BS @UChicago. Research on AI systems.

Chicago, IL
Joined December 2023
@astrogu_
Joshua Gu
24 days
RT @lmcache: LMCache supports gpt-oss (20B/120B) on Day 1! TTFT 1.20s → 0.39s (-67.5%), finish time 15.70s → 7.73s (-50.7%) compared to Va…
0
9
0
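
For readers unfamiliar with the setup behind these numbers: LMCache plugs into vLLM through vLLM's KV-connector interface. A minimal sketch of enabling it for offline inference, assuming the vllm and lmcache packages are installed; the connector name follows LMCache's public examples, and the model and prompt are placeholders rather than the benchmark configuration:

    # Minimal sketch: vLLM inference with the LMCache KV connector enabled.
    # Assumes vllm and lmcache are installed; values are illustrative.
    from vllm import LLM, SamplingParams
    from vllm.config import KVTransferConfig

    # Route KV-cache loads/stores through LMCache (kv_both = read and write).
    ktc = KVTransferConfig(kv_connector="LMCacheConnectorV1", kv_role="kv_both")

    llm = LLM(model="openai/gpt-oss-20b", kv_transfer_config=ktc)
    out = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)

The TTFT gain quoted above comes from reusing stored KV for repeated prefixes rather than recomputing them during prefill.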
@astrogu_
Joshua Gu
26 days
RT @lmcache: 🚀 Big news from LMCache Lab! 📝 3 papers accepted at SOSP '25 & NSDI '26, pushing the frontier of LLM-inference efficiency: …
0
6
0
@astrogu_
Joshua Gu
1 month
such cool demo videos, wonder who made these… 🤔
@lmcache
LMCache Lab
1 month
😎 Check out how LMCache excels in Multi-Turn Context Chat and RAG use cases in this brief video!
0
1
3
@astrogu_
Joshua Gu
1 month
🔥 Check it out! 🔥
@lmcache
LMCache Lab
1 month
Want to create your own LLM Inference Endpoint on Any Cloud in seconds? We're announcing the alpha release of LMIgnite, the one-click high-performance inference stack built for speed and scale. Powered by LMCache, vLLM, and vLLM Production Stack. 🤖 Join the alpha and…
0
0
2
@astrogu_
Joshua Gu
2 months
Excited to share our latest work METIS at #SOSP2025. This one's special as it's my first full CS project from start to finish: from early brainstorming and iterating on ideas to running experiments and writing the paper. Learned a ton, and perseverance finally paid off! 🚀
@siddhantrayyy
Siddhant Ray
2 months
With RAG and agents becoming ubiquitous in LLM systems, tuning quality and performance JOINTLY is essential to achieve the best LLM quality-of-experience. Our paper at SOSP this year addresses this exact tradeoff! 🔥
0
3
7
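
The "jointly tuning quality and performance" idea in the quoted thread can be pictured as a search over pipeline knobs (how many chunks to retrieve, whether to rerank, and so on), scoring each configuration on quality and delay together. A purely illustrative sketch of that framing; the knob names and the scoring stub below are invented for this example and are not from the METIS paper:

    # Illustrative only: joint quality/latency knob search for a RAG pipeline.
    # run_pipeline is a hypothetical stand-in for a real evaluation harness.
    from itertools import product

    def run_pipeline(num_chunks: int, rerank: bool) -> tuple[float, float]:
        # Returns (quality in [0, 1], latency in seconds). Stubbed numbers.
        quality = min(1.0, 0.6 + 0.05 * num_chunks + (0.1 if rerank else 0.0))
        latency = 0.3 * num_chunks + (0.8 if rerank else 0.0)
        return quality, latency

    LATENCY_BUDGET = 2.5  # seconds

    best = None
    for num_chunks, rerank in product([2, 4, 8], [False, True]):
        quality, latency = run_pipeline(num_chunks, rerank)
        if latency <= LATENCY_BUDGET and (best is None or quality > best[0]):
            best = (quality, latency, num_chunks, rerank)

    print(best)  # highest-quality configuration within the latency budget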
@astrogu_
Joshua Gu
2 months
🤙
@lmcache
LMCache Lab
2 months
The gang 🫡
0
0
0
@astrogu_
Joshua Gu
2 months
RT @lmcache: 🚨 LMCache now turbocharges multimodal models in vLLM! By caching image-token KV pairs, repeated images now get ~100% cache hi…
0
12
0
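
The mechanism hinted at here is content-addressed caching: if KV entries for image tokens are keyed by the image content itself, a re-sent image becomes a guaranteed cache hit. A toy illustration of the keying idea only; real systems cache GPU tensors, and none of this is LMCache's actual code:

    # Toy illustration of content-addressed KV caching for image tokens.
    import hashlib

    kv_cache: dict[str, str] = {}

    def image_key(image_bytes: bytes) -> str:
        # Identical images hash to the same key, so repeats are cache hits.
        return hashlib.sha256(image_bytes).hexdigest()

    def get_or_compute_kv(image_bytes: bytes) -> str:
        key = image_key(image_bytes)
        if key not in kv_cache:
            kv_cache[key] = f"kv-for-{key[:8]}"  # stand-in for costly prefill
        return kv_cache[key]

    img = b"...png bytes..."
    assert get_or_compute_kv(img) is get_or_compute_kv(img)  # second call hits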
@astrogu_
Joshua Gu
2 months
🥳🥳🥳
@lmcache
LMCache Lab
2 months
๐—Ÿ๐— ๐—–๐—ฎ๐—ฐ๐—ต๐—ฒ ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต๐—ฒ๐˜€ ๐Ÿฎ,๐Ÿฌ๐Ÿฌ๐Ÿฌ+ ๐˜€๐˜๐—ฎ๐—ฟ๐˜€ ๐—ผ๐—ป ๐—š๐—ถ๐˜๐—›๐˜‚๐—ฏ! ๐ŸŒŸ . A huge thank you to our open-source communityโ€”your support is fueling nextโ€‘gen efficient LLM Inference!
0
1
1
@astrogu_
Joshua Gu
3 months
RT @lmcache: 🚀 LMCache X @RedHat Official Collaboration. LMCache is now a founding supporter of Red Hat's new llm-d project for scalable di…
0
13
0
@astrogu_
Joshua Gu
4 months
RT @lmcache: 🚀 LMCache turbocharges vLLM, KServe & Dynamo! Our new blog reveals how this SOTA KV cache layer slashes LLM inference costs &…
0
6
0
@astrogu_
Joshua Gu
4 months
RT @lmcache: ๐Ÿš€๐— ๐—ผ๐—ผ๐—ป๐—ฐ๐—ฎ๐—ฐ๐—ธ๐—ฒ X ๐—Ÿ๐— ๐—–๐—ฎ๐—ฐ๐—ต๐—ฒ: KV Cache-centric Language Model Serving ๐Ÿš€. We're thrilled to announce a strategic collaboration betweenโ€ฆ.
0
10
0
@astrogu_
Joshua Gu
4 months
RT @lmcache: 🤯 78.8% p95 Inter-Token Latency reduction with LMCache + vLLM v1 P/D support 🚀 In our previous blog, we introduced the integr…
0
8
0
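
P/D (prefill/decode) disaggregation runs prefill and decode on separate engine instances, with the KV cache handed off between them. A rough sketch of how the two roles are expressed through vLLM's KV-transfer config; the field names follow vLLM's KVTransferConfig, while the actual deployment recipe (hosts, NIXL transport, and so on) is in the LMCache blog referenced above:

    # Rough sketch: producer/consumer KV roles for P/D disaggregation.
    from vllm.config import KVTransferConfig

    prefill_cfg = KVTransferConfig(
        kv_connector="LMCacheConnectorV1",
        kv_role="kv_producer",  # prefill engine: computes and ships KV
    )
    decode_cfg = KVTransferConfig(
        kv_connector="LMCacheConnectorV1",
        kv_role="kv_consumer",  # decode engine: receives KV, generates tokens
    )
    # Each config is passed to its own engine, e.g.
    #   LLM(model=..., kv_transfer_config=prefill_cfg)  # on the prefill host
    # with the consumer config on the decode host.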
@astrogu_
Joshua Gu
4 months
🔥 Tencent x @lmcache
@lmcache
LMCache Lab
4 months
๐Ÿš€ ๐—ง๐—ฒ๐—ป๐—ฐ๐—ฒ๐—ป๐˜ x ๐—Ÿ๐— ๐—–๐—ฎ๐—ฐ๐—ต๐—ฒ Collaboration: Integrating ๐— ๐—ผ๐—ผ๐—ป๐—ฐ๐—ฎ๐—ธ๐—ฒ Store for Enhanced LLM Inference Caching! ๐Ÿฅฎ๐Ÿฅฎ. Excited to share insights from a powerful collaboration between Tencent engineers and the LMCache Lab team! ๐ŸŽ‰. With the help from Tencent Engineers,
0
0
0
@astrogu_
Joshua Gu
5 months
RT @lmcache: ๐Ÿš€ ๐—Ÿ๐— ๐—–๐—ฎ๐—ฐ๐—ต๐—ฒ Powers Up ๐˜ƒ๐—Ÿ๐—Ÿ๐—  ๐—ฉ๐Ÿญ: P/D Disaggregation & NIXL Support!. vLLM V1 revolutionized LLM serving, but lacked a dedicated KVโ€ฆ.
0
11
0
@astrogu_
Joshua Gu
5 months
RT @lmcache: ๐Ÿ† Exciting news from #EuroSys2025: Our work on CacheBlend won Best Paper! ๐Ÿš€. CacheBlend delivers the first-ever speedup for RAโ€ฆ.
0
8
0
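
For context, CacheBlend reuses KV caches precomputed per retrieved chunk, then selectively recomputes only the small fraction of tokens whose cross-chunk attention actually changes, which is how RAG prompts get a speedup without a quality cliff. A toy model of that reuse-plus-selective-recompute loop; all names and numbers below are invented for illustration and are not the paper's algorithm:

    # Toy model of CacheBlend-style selective recomputation (illustrative).
    chunk_kv_cache: dict[str, list[float]] = {}

    def prefill(text: str) -> list[float]:
        # Stand-in for expensive attention prefill: one value per token.
        return [float(len(tok)) for tok in text.split()]

    def blend(chunks: list[str], recompute_ratio: float = 0.15) -> list[float]:
        blended: list[float] = []
        for chunk in chunks:
            # Reuse the per-chunk KV if cached; pay full prefill once per chunk.
            blended.extend(chunk_kv_cache.setdefault(chunk, prefill(chunk)))
        # Recompute only a small slice of positions, standing in for the
        # tokens whose cross-chunk attention deviates the most.
        n = max(1, int(len(blended) * recompute_ratio))
        blended[:n] = prefill(" ".join(chunks))[:n]
        return blended

    print(blend(["retrieved chunk one", "retrieved chunk two"]))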
@astrogu_
Joshua Gu
6 months
FAST!!! 🚀 Thrilled to see all the hard work pay off!
@lmcache
LMCache Lab
6 months
Our open-source LLM cluster deployment solution is 10x faster than the SOTA OSS solution. Check out the vLLM Production Stack! 🤩🤩🤩 Since Jan 2025, vLLM Production Stack has been the reference open-source vLLM inference cluster solution with advanced KV cache offloading and K8s…
0
0
1
@astrogu_
Joshua Gu
6 months
🚀 vLLM Production Stack is here!
@lmcache
LMCache Lab
6 months
🚀 We're thrilled to announce vLLM Production Stack, an open-source, Enterprise-Grade LLM inference solution that is now an official first-party ecosystem project under vLLM! Why does this matter? A handful of companies focus on LLM training, but millions of apps and businesses…
0
0
2
@astrogu_
Joshua Gu
7 months
RT @lmcache: 🚀 Deploying LLMs in Clusters #1. Check out this step-by-step tutorial to deploy the vLLM Production Stack on a cloud VM for s…
0
7
0
@astrogu_
Joshua Gu
7 months
RT @lmcache: 🔥 Meet the vLLM Official Production Stack 🔥 - ⚡️ 3x higher throughput & 3x faster response! - 🔧 Easy k8s deployment with helm char…
0
19
0
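
Once a stack like this is deployed, its router exposes an OpenAI-compatible endpoint, so existing clients work unchanged. A minimal sketch of querying it, assuming a port-forward to the router on localhost:30080 and a placeholder model name:

    # Minimal sketch: querying an OpenAI-compatible vLLM endpoint.
    # Assumes the openai package is installed and a router is reachable.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:30080/v1",  # placeholder router address
        api_key="EMPTY",                       # vLLM-style servers ignore it
    )
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # whatever the stack serves
        messages=[{"role": "user", "content": "Say hi in five words."}],
    )
    print(resp.choices[0].message.content)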
@astrogu_
Joshua Gu
9 months
RT @lmcache: 🚀 LMCache speeds up multi-turn conversations by 7x vs. vLLM + prefix caching! Our secret? Efficient KV cache offloading to CP…
0
7
0
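
The offloading in this last retweet is configurable in LMCache through environment variables; a minimal sketch, assuming LMCache's documented env-var names (the sizes are placeholders, not the settings behind the 7x number):

    # Minimal sketch: enabling LMCache CPU offloading before engine start-up.
    import os

    os.environ["LMCACHE_LOCAL_CPU"] = "True"         # spill KV to CPU RAM
    os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "20"  # CPU cache budget (GB)
    os.environ["LMCACHE_CHUNK_SIZE"] = "256"         # tokens per cache chunk

    # A vLLM engine created afterwards with the LMCache connector (as in the
    # gpt-oss sketch above) will then reload evicted KV from CPU memory
    # instead of recomputing it on each conversation turn.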