Joshua Gu

@astrogu_

Followers 34 · Following 24 · Media 2 · Statuses 30

CS PhD student @MIT, @MIT_CSAIL, @MITEECS 👨‍💻 | @LMCache Lab | Previously: BS @UChicago. Research on AI systems.

Chicago, IL
Joined December 2023
@astrogu_
Joshua Gu
24 days
RT @lmcache: LMCache supports gpt-oss (20B/120B) on Day 1! TTFT 1.20s → 0.39s (-67.5%), finish time 15.70s → 7.73s (-50.7%) compared to Va…
0
9
0
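
For readers unfamiliar with the setup behind these numbers: LMCache plugs into vLLM through vLLM's KV-connector interface. A minimal sketch of enabling it for offline inference, assuming the vllm and lmcache packages are installed; the connector name follows LMCache's public examples, and the model and prompt are placeholders rather than the benchmark configuration:

    # Minimal sketch: vLLM inference with the LMCache KV connector enabled.
    # Assumes vllm and lmcache are installed; values are illustrative.
    from vllm import LLM, SamplingParams
    from vllm.config import KVTransferConfig

    # Route KV-cache loads/stores through LMCache (kv_both = read and write).
    ktc = KVTransferConfig(kv_connector="LMCacheConnectorV1", kv_role="kv_both")

    llm = LLM(model="openai/gpt-oss-20b", kv_transfer_config=ktc)
    out = llm.generate(["Hello!"], SamplingParams(max_tokens=32))
    print(out[0].outputs[0].text)

The TTFT gain quoted above comes from reusing stored KV for repeated prefixes rather than recomputing them during prefill.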
@astrogu_
Joshua Gu
26 days
RT @lmcache: 🚀 Big news from LMCache Lab! 📝 3 papers accepted at SOSP '25 & NSDI '26, pushing the frontier of LLM-inference efficiency: …
0
6
0
@astrogu_
Joshua Gu
1 month
such cool demo videos, wonder who made these… 🤔
@lmcache
LMCache Lab
1 month
😎 Check out how LMCache excels in Multi-Turn Context Chat and RAG use cases in this brief video!
0
1
3
@astrogu_
Joshua Gu
1 month
🔥 Check it out! 🔥
@lmcache
LMCache Lab
1 month
Want to create your own LLM Inference Endpoint on Any Cloud in seconds? We're announcing the alpha release of LMIgnite, the one-click high-performance inference stack built for speed and scale. Powered by LMCache, vLLM, and vLLM Production Stack. 🤖 Join the alpha and…
0
0
2
@astrogu_
Joshua Gu
2 months
Excited to share our latest work METIS at #SOSP2025. This one's special as it's my first full CS project from start to finish: from early brainstorming and iterating on ideas to running experiments and writing the paper. Learned a ton, and perseverance finally paid off! 🚀
@siddhantrayyy
Siddhant Ray
2 months
With RAG and agents becoming ubiquitous in LLM systems, tuning quality and performance JOINTLY is essential to achieve the best LLM quality-of-experience. Our paper at SOSP this year addresses this exact tradeoff! 🔥
0
3
7
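
The "jointly tuning quality and performance" idea in the quoted thread can be pictured as a search over pipeline knobs (how many chunks to retrieve, whether to rerank, and so on), scoring each configuration on quality and delay together. A purely illustrative sketch of that framing; the knob names and the scoring stub below are invented for this example and are not from the METIS paper:

    # Illustrative only: joint quality/latency knob search for a RAG pipeline.
    # run_pipeline is a hypothetical stand-in for a real evaluation harness.
    from itertools import product

    def run_pipeline(num_chunks: int, rerank: bool) -> tuple[float, float]:
        # Returns (quality in [0, 1], latency in seconds). Stubbed numbers.
        quality = min(1.0, 0.6 + 0.05 * num_chunks + (0.1 if rerank else 0.0))
        latency = 0.3 * num_chunks + (0.8 if rerank else 0.0)
        return quality, latency

    LATENCY_BUDGET = 2.5  # seconds

    best = None
    for num_chunks, rerank in product([2, 4, 8], [False, True]):
        quality, latency = run_pipeline(num_chunks, rerank)
        if latency <= LATENCY_BUDGET and (best is None or quality > best[0]):
            best = (quality, latency, num_chunks, rerank)

    print(best)  # highest-quality configuration within the latency budget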
@astrogu_
Joshua Gu
2 months
🤙
@lmcache
LMCache Lab
2 months
The gang 🫡
0
0
0
@astrogu_
Joshua Gu
2 months
RT @lmcache: 🚨 LMCache now turbocharges multimodal models in vLLM! By caching image-token KV pairs, repeated images now get ~100% cache hi…
0
12
0
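
The mechanism hinted at here is content-addressed caching: if KV entries for image tokens are keyed by the image content itself, a re-sent image becomes a guaranteed cache hit. A toy illustration of the keying idea only; real systems cache GPU tensors, and none of this is LMCache's actual code:

    # Toy illustration of content-addressed KV caching for image tokens.
    import hashlib

    kv_cache: dict[str, str] = {}

    def image_key(image_bytes: bytes) -> str:
        # Identical images hash to the same key, so repeats are cache hits.
        return hashlib.sha256(image_bytes).hexdigest()

    def get_or_compute_kv(image_bytes: bytes) -> str:
        key = image_key(image_bytes)
        if key not in kv_cache:
            kv_cache[key] = f"kv-for-{key[:8]}"  # stand-in for costly prefill
        return kv_cache[key]

    img = b"...png bytes..."
    assert get_or_compute_kv(img) is get_or_compute_kv(img)  # second call hits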
@astrogu_
Joshua Gu
2 months
🥳🥳🥳
@lmcache
LMCache Lab
2 months
๐—Ÿ๐— ๐—–๐—ฎ๐—ฐ๐—ต๐—ฒ ๐—ฟ๐—ฒ๐—ฎ๐—ฐ๐—ต๐—ฒ๐˜€ ๐Ÿฎ,๐Ÿฌ๐Ÿฌ๐Ÿฌ+ ๐˜€๐˜๐—ฎ๐—ฟ๐˜€ ๐—ผ๐—ป ๐—š๐—ถ๐˜๐—›๐˜‚๐—ฏ! ๐ŸŒŸ . A huge thank you to our open-source communityโ€”your support is fueling nextโ€‘gen efficient LLM Inference!
0
1
1
@astrogu_
Joshua Gu
3 months
RT @lmcache: 🚀 LMCache X @RedHat Official Collaboration. LMCache is now a founding supporter of Red Hat's new llm-d project for scalable di…
0
13
0
@astrogu_
Joshua Gu
4 months
RT @lmcache: 🚀 LMCache turbocharges vLLM, KServe & Dynamo! Our new blog reveals how this SOTA KV cache layer slashes LLM inference costs &…
0
6
0
@astrogu_
Joshua Gu
4 months
RT @lmcache: ๐Ÿš€๐— ๐—ผ๐—ผ๐—ป๐—ฐ๐—ฎ๐—ฐ๐—ธ๐—ฒ X ๐—Ÿ๐— ๐—–๐—ฎ๐—ฐ๐—ต๐—ฒ: KV Cache-centric Language Model Serving ๐Ÿš€. We're thrilled to announce a strategic collaboration betweenโ€ฆ.
0
10
0
@astrogu_
Joshua Gu
4 months
RT @lmcache: 🤯 78.8% p95 Inter-Token Latency reduction with LMCache + vLLM v1 P/D support 🚀 In our previous blog, we introduced the integr…
0
8
0
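
P/D (prefill/decode) disaggregation runs prefill and decode on separate engine instances, with the KV cache handed off between them. A rough sketch of how the two roles are expressed through vLLM's KV-transfer config; the field names follow vLLM's KVTransferConfig, while the actual deployment recipe (hosts, NIXL transport, and so on) is in the LMCache blog referenced above:

    # Rough sketch: producer/consumer KV roles for P/D disaggregation.
    from vllm.config import KVTransferConfig

    prefill_cfg = KVTransferConfig(
        kv_connector="LMCacheConnectorV1",
        kv_role="kv_producer",  # prefill engine: computes and ships KV
    )
    decode_cfg = KVTransferConfig(
        kv_connector="LMCacheConnectorV1",
        kv_role="kv_consumer",  # decode engine: receives KV, generates tokens
    )
    # Each config is passed to its own engine, e.g.
    #   LLM(model=..., kv_transfer_config=prefill_cfg)  # on the prefill host
    # with the consumer config on the decode host.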
@astrogu_
Joshua Gu
4 months
🔥 Tencent x @lmcache
@lmcache
LMCache Lab
4 months
๐Ÿš€ ๐—ง๐—ฒ๐—ป๐—ฐ๐—ฒ๐—ป๐˜ x ๐—Ÿ๐— ๐—–๐—ฎ๐—ฐ๐—ต๐—ฒ Collaboration: Integrating ๐— ๐—ผ๐—ผ๐—ป๐—ฐ๐—ฎ๐—ธ๐—ฒ Store for Enhanced LLM Inference Caching! ๐Ÿฅฎ๐Ÿฅฎ. Excited to share insights from a powerful collaboration between Tencent engineers and the LMCache Lab team! ๐ŸŽ‰. With the help from Tencent Engineers,
0
0
0
@astrogu_
Joshua Gu
5 months
RT @lmcache: ๐Ÿš€ ๐—Ÿ๐— ๐—–๐—ฎ๐—ฐ๐—ต๐—ฒ Powers Up ๐˜ƒ๐—Ÿ๐—Ÿ๐—  ๐—ฉ๐Ÿญ: P/D Disaggregation & NIXL Support!. vLLM V1 revolutionized LLM serving, but lacked a dedicated KVโ€ฆ.
0
11
0
@astrogu_
Joshua Gu
5 months
RT @lmcache: ๐Ÿ† Exciting news from #EuroSys2025: Our work on CacheBlend won Best Paper! ๐Ÿš€. CacheBlend delivers the first-ever speedup for RAโ€ฆ.
0
8
0
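
For context, CacheBlend reuses KV caches precomputed per retrieved chunk, then selectively recomputes only the small fraction of tokens whose cross-chunk attention actually changes, which is how RAG prompts get a speedup without a quality cliff. A toy model of that reuse-plus-selective-recompute loop; all names and numbers below are invented for illustration and are not the paper's algorithm:

    # Toy model of CacheBlend-style selective recomputation (illustrative).
    chunk_kv_cache: dict[str, list[float]] = {}

    def prefill(text: str) -> list[float]:
        # Stand-in for expensive attention prefill: one value per token.
        return [float(len(tok)) for tok in text.split()]

    def blend(chunks: list[str], recompute_ratio: float = 0.15) -> list[float]:
        blended: list[float] = []
        for chunk in chunks:
            # Reuse the per-chunk KV if cached; pay full prefill once per chunk.
            blended.extend(chunk_kv_cache.setdefault(chunk, prefill(chunk)))
        # Recompute only a small slice of positions, standing in for the
        # tokens whose cross-chunk attention deviates the most.
        n = max(1, int(len(blended) * recompute_ratio))
        blended[:n] = prefill(" ".join(chunks))[:n]
        return blended

    print(blend(["retrieved chunk one", "retrieved chunk two"]))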
@astrogu_
Joshua Gu
6 months
FAST!!! 🚀 Thrilled to see all the hard work pay off!
@lmcache
LMCache Lab
6 months
Our open-source LLM cluster deployment solution is 10x faster than the SOTA OSS solution. Check out the vLLM Production Stack! 🤩🤩🤩 Since Jan 2025, vLLM Production Stack has been the reference open-source vLLM inference cluster solution with advanced KV cache offloading and K8s…
0
0
1
@astrogu_
Joshua Gu
6 months
🚀 vLLM Production Stack is here!
@lmcache
LMCache Lab
6 months
🚀 We're thrilled to announce vLLM Production Stack, an open-source, Enterprise-Grade LLM inference solution that is now an official first-party ecosystem project under vLLM! Why does this matter? A handful of companies focus on LLM training, but millions of apps and businesses…
0
0
2
@astrogu_
Joshua Gu
7 months
RT @lmcache: 🚀 Deploying LLMs in Clusters #1. Check out this step-by-step tutorial to deploy the vLLM Production Stack on a cloud VM for s…
0
7
0
@astrogu_
Joshua Gu
7 months
RT @lmcache: 🔥 Meet the vLLM Official Production Stack 🔥 - ⚡️ 3x higher throughput & 3x faster response! - 🔧 Easy k8s deployment with helm char…
0
19
0
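
Once a stack like this is deployed, its router exposes an OpenAI-compatible endpoint, so existing clients work unchanged. A minimal sketch of querying it, assuming a port-forward to the router on localhost:30080 and a placeholder model name:

    # Minimal sketch: querying an OpenAI-compatible vLLM endpoint.
    # Assumes the openai package is installed and a router is reachable.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:30080/v1",  # placeholder router address
        api_key="EMPTY",                       # vLLM-style servers ignore it
    )
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # whatever the stack serves
        messages=[{"role": "user", "content": "Say hi in five words."}],
    )
    print(resp.choices[0].message.content)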
@astrogu_
Joshua Gu
9 months
RT @lmcache: 🚀 LMCache speeds up multi-turn conversations by 7x vs. vLLM + prefix caching! Our secret? Efficient KV cache offloading to CP…
0
7
0
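
The offloading in this last retweet is configurable in LMCache through environment variables; a minimal sketch, assuming LMCache's documented env-var names (the sizes are placeholders, not the settings behind the 7x number):

    # Minimal sketch: enabling LMCache CPU offloading before engine start-up.
    import os

    os.environ["LMCACHE_LOCAL_CPU"] = "True"         # spill KV to CPU RAM
    os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "20"  # CPU cache budget (GB)
    os.environ["LMCACHE_CHUNK_SIZE"] = "256"         # tokens per cache chunk

    # A vLLM engine created afterwards with the LMCache connector (as in the
    # gpt-oss sketch above) will then reload evicted KV from CPU memory
    # instead of recomputing it on each conversation turn.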