Yifan Qiao Profile
Yifan Qiao

@yifandotqiao

Followers
208
Following
42
Media
1
Statuses
36

Postdoc @UCBerkeley | PhD @UCLA | GPU systems + LLM serving | A student

Joined January 2023
@yifandotqiao
Yifan Qiao
16 days
πŸš€ End the GPU Cost Crisis Today!!! Tired of LLMs locking up a whole GPU while leaving capacity idle? Frustrated by your cluster's low utilization? We're launching kvcached, the first library for elastic GPU sharing across LLMs. πŸ”— https://t.co/3BC7B6s2EX πŸ§΅πŸ‘‡ Why it matters:
9
53
191
@melissapan
Melissa Pan
6 days
The Sky’s Fun Committee, representing the ppl of sky, just dropped the new lab theme: βš«οΈπŸ’– Black Pink x Halloween πŸŽƒπŸ¦‡ We have: - Gru & the minions - kpop ??? πŸ«°πŸ˜‰
8
8
52
@uccl_proj
uccl_project
9 days
πŸš€ Introducing UCCL-EP: A portable, efficient Expert Parallelism framework that brings DeepEP-level GPU-driven communication with the same APIs to any cloud or hardware β€” AWS EFA, AMD GPUs, Broadcom NICs and beyond. Blog: https://t.co/d3oBVlWezZ Code: https://t.co/0UbCUYz9N9
1
5
6
@ai4research_ucb
AI-Driven Research Systems
7 days
🎯 AI-found algorithm beats the *NSDI'24 Best Paper* 🀯 [ADRS Blog #2] We use AI to find new spot-instance scheduling algorithms. It beats the original paper's algorithm, cutting cloud costs by up to 48% (average 27%) while still meeting job deadlines! ✍️ Read the blog:
0
7
33
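The post above describes the result (cost cuts while meeting deadlines) rather than the mechanism. As a purely illustrative sketch of the *kind* of decision a deadline-aware spot scheduler makes, here is a toy policy that runs on cheap spot capacity only while expected preemption overhead still fits in the deadline slack. The function name and thresholds are hypothetical; this is not the AI-discovered algorithm from the blog.

```python
def pick_market(slack_hours, preemptions_per_hour, restart_hours):
    """Toy deadline-aware policy: ride cheap spot instances while the
    expected preemption overhead fits in the deadline slack; otherwise
    pay for on-demand reliability. Illustrative only."""
    expected_overhead = preemptions_per_hour * restart_hours
    return "spot" if expected_overhead <= slack_hours else "on-demand"

# Plenty of slack -> cheap spot capacity is worth the preemption risk.
print(pick_market(slack_hours=4.0, preemptions_per_hour=0.5, restart_hours=0.25))
# Deadline is tight -> fall back to on-demand.
print(pick_market(slack_hours=0.1, preemptions_per_hour=0.5, restart_hours=0.25))
```

The real scheduler in the post reasons about much more (prices, job progress, migration), but the spot/on-demand trade-off it optimizes has this basic shape.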
@NovaSkyAI
NovaSky
10 days
☁️SkyRL now runs seamlessly with SkyPilot! Let @skypilot_org handle GPU provisioning and cluster setup, so you can focus on RL training with SkyRL. 🎯 Launch distributed RL jobs effortlessly βš™οΈ Auto-provision GPUs across clouds πŸ€– Train your LLM agents at scale Get started
0
10
23
@yifandotqiao
Yifan Qiao
11 days
Thank you. We are actively pushing this even further. Excited to hear your experience and feedback!
@Behumbledreal
Linus
12 days
In practical scenarios we observe significant "waste". Really love this work pushing the edge of optimization and squeezing the last bit of water out of the sponge
0
0
3
@melissapan
Melissa Pan
14 days
We are launching a new ADRS blog series to showcase how AI can help systems research πŸ™Œ First up: MoE load balancing βš–οΈ AI found algorithmic + engineering optimizations! Check out the details in the blog. This case study was led and carried out by @abmfy_. Bowen is a great undergrad
@ai4research_ucb
AI-Driven Research Systems
14 days
πŸš€ We used AI to discover a new algorithm for LLM inference, achieving a 5.0x speedup in MoE load balancing over expert-written code. ✍️ Read the details in our blog post: https://t.co/sHVRqX6wDR πŸ“„ Full paper: https://t.co/ex6AidUuwK πŸ’» Code: https://t.co/o2EVHmFMCl
1
7
40
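To make "MoE load balancing" concrete: the problem is placing experts (or routing their tokens) so that no GPU becomes a hotspot. A classic baseline, sketched below, is greedy largest-first placement onto the least-loaded GPU. This is only a minimal illustration of the problem the post addresses, not the AI-discovered algorithm that achieved the 5.0x speedup.

```python
import heapq

def balance_experts(expert_loads, num_gpus):
    """Greedy largest-first placement: repeatedly assign the heaviest
    remaining expert to the currently least-loaded GPU. A textbook
    baseline for MoE load balancing, illustrative only."""
    heap = [(0.0, g) for g in range(num_gpus)]  # (load, gpu_id)
    heapq.heapify(heap)
    placement = {}
    for expert, load in sorted(expert_loads.items(), key=lambda kv: -kv[1]):
        gpu_load, gpu = heapq.heappop(heap)
        placement[expert] = gpu
        heapq.heappush(heap, (gpu_load + load, gpu))
    return placement

# Hypothetical per-expert token loads from profiling
loads = {"e0": 9, "e1": 7, "e2": 4, "e3": 3, "e4": 2, "e5": 1}
print(balance_experts(loads, num_gpus=2))
```

Here both GPUs end up with a total load of 13, even though individual expert loads are highly skewed.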
@simon_mo_
Simon Mo
15 days
Fortunate to be part of two (!) foundation projects (@vllm_project and @raydistributed) that have great synergy with each other. The Ray + vLLM + PyTorch stack is coming together. Congratulations, Ray!
@PyTorch
PyTorch
15 days
We’re excited to welcome Ray to the PyTorch Foundation πŸ‘‹ @raydistributed is an open source distributed computing framework for #AI workloads, including data processing, model training and inference at scale. By contributing Ray to the @PyTorch Foundation, @anyscalecompute
0
11
91
@yifandotqiao
Yifan Qiao
15 days
Many thanks! Would love to hear how it goes when you try it
@darsh2950
utd.dx
15 days
@yifandotqiao love it!! will give it a shot
0
0
0
@yifandotqiao
Yifan Qiao
15 days
Thank you! Would love to hear your thoughts once you try it. Feel free to open an issue or share feedback anytime πŸ™Œ
@zhouwenmeng
Wenmeng Zhou
15 days
@yifandotqiao congratulations! Nice work. I love that you can use it by just pip installing it, without any code change. Will definitely try it
0
0
4
@yifandotqiao
Yifan Qiao
15 days
Many thanks for sharing @lmsysorg. We are working hard to bring more features and hardware support to kvcached. KV cache sharing enables many possibilities beyond just the cross-model case, and we would love to see the community try it out.
@lmsysorg
LMSYS Org
16 days
kvcached enables elastic GPU sharing, and it works out-of-the-box with SGLang ⚑️ Higher utilization, faster serving, zero code change. Come try it
0
2
9
@xalg_ai
xAlg-ai
27 days
Excited to share our new research: vAttention - Verified Sparse Attention. Sparse attention with provable quality guarantees for LLMs. Full paper: https://t.co/pvOSEI8E7J GitHub: xAlg-ai/sparse-attention-hub 🧡 A thread πŸ‘‡
arxiv.org
State-of-the-art sparse attention methods for reducing decoding latency fall into two main categories: approximate top-$k$ (and its extension, top-$p$) and recently introduced sampling-based...
1
9
15
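The paper summary above mentions the "approximate top-k" category of sparse attention. A minimal sketch of the exact top-k variant for a single query: score every key, keep only the k highest-scoring ones, and take the softmax over that subset. This illustrates the category only; it is not the vAttention method, which adds verified quality guarantees on top.

```python
import numpy as np

def topk_attention(q, K, V, k):
    """Top-k sparse attention for one query vector: attend only to the
    k keys with the highest scores. Illustrates the 'top-k' category
    of sparse attention, not the vAttention algorithm itself."""
    scores = K @ q / np.sqrt(q.shape[0])      # (num_keys,)
    idx = np.argpartition(scores, -k)[-k:]    # indices of the k best scores
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                              # softmax over the subset only
    return w @ V[idx]                         # weighted mixture of k values

rng = np.random.default_rng(0)
q = rng.normal(size=16)
K = rng.normal(size=(128, 16))
V = rng.normal(size=(128, 16))
out = topk_attention(q, K, V, k=8)            # decode attends to 8 of 128 keys
print(out.shape)
```

With k equal to the number of keys this reduces to dense attention; the latency win comes from reading only k value rows per decode step.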
@sijun_tan
Sijun Tan
20 days
I am incredibly excited to introduce rLLM v0.2. Zooming back to a year ago: @OpenAI's o1-preview just dropped, and RL + test-time scaling suddenly became the hype. But no one knew how they did it. @kylepmont and I had this idea - what if we built a solver-critique loop for
@rllm_project
rLLM
20 days
πŸš€ Introducing rLLM v0.2 - train arbitrary agentic programs with RL, with minimal code changes. Most RL training systems adopt the agent-environment abstraction. But what about complex workflows? Think solver-critique pairs collaborating, or planner agents orchestrating multiple
8
33
304
@wenjie_ma
Wenjie Ma
20 days
LLMs solving math benchmarks with verifiable answers like AIME? βœ… LLMs solving math proofs? ❌ Still an open problem. RL works great for final-answer problems, but proofs are different: - Often no single checkable answer - Correct answers can hide flawed reasoning The key
9
37
188
@karpathy
Andrej Karpathy
17 days
I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter. The more interesting part for me (esp as a computer vision person at heart who is temporarily masquerading as a natural language
@vllm_project
vLLM
18 days
πŸš€ DeepSeek-OCR β€” the new frontier of OCR from @deepseek_ai , exploring optical context compression for LLMs, is running blazingly fast on vLLM ⚑ (~2500 tokens/s on A100-40G) β€” powered by vllm==0.8.5 for day-0 model support. 🧠 Compresses visual contexts up to 20Γ— while keeping
568
2K
13K
@ainewsbites
AI News Bites
16 days
New service for GPU sharing across LLMs Introducing kvcached Elastic GPU sharing for LLMs 1 GPU = Multiple LLMs hosted πŸ‘‡
@yifandotqiao
Yifan Qiao
16 days
πŸš€ End the GPU Cost Crisis Today!!! Tired of LLMs locking up a whole GPU while leaving capacity idle? Frustrated by your cluster's low utilization? We're launching kvcached, the first library for elastic GPU sharing across LLMs. πŸ”— https://t.co/3BC7B6s2EX πŸ§΅πŸ‘‡ Why it matters:
0
1
3
@yifandotqiao
Yifan Qiao
16 days
Thanks a lot for sharing @vllm_project! Thrilled to see our effort on out-of-the-box support paying off. We will keep pushing kvcached forward together with the community, with better performance, richer features, and broader GPU platform support πŸ”₯
@vllm_project
vLLM
16 days
kvcached works directly with vLLM and you can use it to serve multiple models on the same GPU. They will share unused KV cache blocks. Check it out!
1
0
4
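The tweet above captures the core idea: co-located models draw KV-cache blocks from one shared GPU pool instead of each reserving a fixed slab. A toy sketch of that sharing discipline is below. All names here (`SharedKVPool`, its methods, the block counts) are invented for illustration and are not the kvcached API, which works at the virtual-memory level with no code changes.

```python
class SharedKVPool:
    """Toy model of elastic KV-cache sharing: a fixed budget of cache
    blocks handed out on demand to whichever model needs them, instead
    of static per-model reservations. Illustrative only."""

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.used = {}  # model name -> blocks currently held

    def free_blocks(self):
        return self.total_blocks - sum(self.used.values())

    def allocate(self, model, n):
        if self.free_blocks() < n:
            raise MemoryError(f"{model}: only {self.free_blocks()} blocks free")
        self.used[model] = self.used.get(model, 0) + n

    def release(self, model, n):
        self.used[model] = max(0, self.used.get(model, 0) - n)


pool = SharedKVPool(total_blocks=1000)
pool.allocate("llama", 600)   # llama's traffic arrives first
pool.allocate("qwen", 300)    # qwen uses capacity llama left idle
pool.release("llama", 500)    # llama's traffic drops off...
pool.allocate("qwen", 600)    # ...so qwen grows elastically into the freed blocks
print(pool.free_blocks())
```

With static partitioning, qwen would be capped at its original reservation even while llama's share sat idle; the pool lets utilization track demand.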
@tsunghan_wu
Tsung-Han (Patrick) Wu
16 days
Humans handle dynamic situations easily, what about models? Turns out, they break in three distinct ways: β›” Force Stop β†’ Reasoning leakage (won’t stop) ⚑️ Speedup β†’ Panic (rushed answers) ❓ Info Updates β†’ Self-doubt (reject updates) πŸ‘‰Check out https://t.co/wKrnsMkiFY
5
20
66
@yifandotqiao
Yifan Qiao
16 days
(7/N) Incredibly grateful to the team: @jiarong_Xing, @yifandotqiao, @shanyu_ucla, Xingqi, Mingyuan, Yangmin, @profjoeyg, @istoica05 and other contributors. We're also warmly inviting collaborators to join us in building the foundations of elastic GPU infrastructure.
0
0
8
@yifandotqiao
Yifan Qiao
16 days
(6/N) Our vision At Berkeley’s Sky Computing Lab @BerkeleySky, we are working towards a GPU "operating system", where compute and memory are dynamically and flexibly shared across models, workloads, and even users.
1
0
9
@yifandotqiao
Yifan Qiao
16 days
(5/N) Please check out our blog for the full story, technical details, and results: πŸ“„
yifanqiao.notion.site
β€” A library to enable virtualized, elastic KV cache for LLM serving on shared GPUs
1
2
15