Richard Chen

@richardczl

Followers 185 · Following 504 · Media 8 · Statuses 114

PhD @Stanford | Prev @Cornell | AI for Energy | @lmsysorg Contributor

Stanford, CA
Joined May 2022
@richardczl
Richard Chen
24 days
SGLang → 20K stars 🎉 Yesterday we shipped diffusion support. Today we hit this milestone. Honored to contribute to the DGX blog and work with this team. 20,000 developers chose SGLang because it's fast, stable, and proven - and that's a team effort. Thank you to everyone who
github.com
SGLang is a fast serving framework for large language models and vision language models. - sgl-project/sglang
0
7
28
@richardczl
Richard Chen
4 days
6B parameters delivering competitive quality at a fraction of the compute cost. Z-image integration shows SGLang’s commitment to efficient inference across modalities - not just LLMs.
@lmsysorg
LMSYS Org
4 days
SGLang diffusion now supports Z-image, an efficient 6B image-generation model with excellent performance!
0
0
3
@richardczl
Richard Chen
6 days
The future of RL isn't just about bigger models—it's about training them without breaking. Unified FP8 is how we get there.
@lmsysorg
LMSYS Org
6 days
🔥SGLang finally implemented end-to-end FP8 Training & Inference for RL. Say goodbye to the stability headaches of mixed precision! Our Unified FP8 approach (now fully supported in Miles) wipes out quantization error inconsistency, leading to significantly lower TIS-clipfrac and
0
0
4
@richardczl
Richard Chen
8 days
I learned the most from working with cool people in the open-source community!
@lmsysorg
LMSYS Org
8 days
🎯SGLang RL Team is growing! We’re building a state-of-the-art RL ecosystem for LLMs and VLMs — fast, open, and production-ready. If you’re excited about RLHF, scalable training systems, and shaping how future LLMs are aligned, now is the right time to join. We are looking for
0
1
3
@lmsysorg
LMSYS Org
12 days
Excited to announce our deployment guide with @UnslothAI 🦥⚡ Unsloth gives you 2x faster fine-tuning. SGLang gives you efficient production serving. Together = complete LLM workflow: ✅ Fast training with Unsloth ✅ FP8 conversion & quantization ✅ Production deployment
3
7
37
@richardczl
Richard Chen
13 days
A journey of a thousand Miles begins with a single rollout!
@lmsysorg
LMSYS Org
14 days
🚀 Introducing Miles — an enterprise-facing RL framework for large-scale MoE training & production, forked from slime. Slime is a lightweight, customizable RL framework that already powers real post-training pipelines and large MoE runs. Miles builds on slime but focuses on new
0
0
1
@lmsysorg
LMSYS Org
14 days
🚀 Introducing Miles — an enterprise-facing RL framework for large-scale MoE training & production, forked from slime. Slime is a lightweight, customizable RL framework that already powers real post-training pipelines and large MoE runs. Miles builds on slime but focuses on new
7
34
260
@richardczl
Richard Chen
14 days
Exciting case study!
@charles_irl
Charles 🎉 Frye @ NeurIPS
14 days
Never block the GPU! In a new @modal blogpost, we walk through a major class of inefficiency in AI inference: host overhead. We include three cases where we worked with @sgl_project to cut host overhead and prevent GPU stalls. Every microsecond counts. https://t.co/ZeumrZpSKE
0
0
2
@richardczl
Richard Chen
15 days
Congrats!!! Finally out @jisong_learning
@jisong_learning
Jisong
15 days
Say hi to BeFreed - the world’s cutest audio agent that lets you learn anything through personalized audio. This video is 96% human generated. (1/3)
1
0
2
@lmsysorg
LMSYS Org
19 days
Thanks to the Modal team for adopting SGLang in production for Decagon’s real-time voice AI. We’re glad to support your large-scale serving workloads, and appreciate the improvements you’ve contributed back to the community. Looking forward to more collaboration!
@modal
Modal
20 days
Real-time voice AI isn’t easy. It needs sub-second latency, natural turn-taking, and conversational quality all at once. Here’s how @DecagonAI and Modal built a real-time inference system using: • Supervised fine-tuning and reinforcement learning • Speculative decoding with
0
3
14
@richardczl
Richard Chen
20 days
Love it!
@robertnishihara
Robert Nishihara
2 months
If you're curious why LLM inference is different from regular model inference (and why we've seen so much investment in specialized LLM inference engines like vLLM, SGLang, and TensorRT-LLM), I had a lot of fun talking through some of the main ideas here.
0
1
2
@richardczl
Richard Chen
20 days
A lot more can be done!
@lmsysorg
LMSYS Org
20 days
Totally. SGLang's design choices directly improve IPW: 1. RadixAttention for KV cache reuse 2. Fast structured generation 3. Optimized scheduling Every efficiency gain = more intelligence per watt = more AI running locally
0
0
1
@synthwavedd
leo 🐾
20 days
- no benchmarks
- no API release
- system card looks sloppy at best and is weirdly short
- was planned for late nov

maybe their weirdest new model launch ever? why did they randomly rush it out..?
@synthwavedd
leo 🐾
20 days
What the fuck lmao they dropped GPT-5.1 out of nowhere with no benchmarks in sight Are they trying to beat someone to the punch
54
22
944
@lm_zheng
Lianmin Zheng
25 days
Future models will be multimodal in, multimodal out, potentially combining autoregressive and diffusion architectures. The SGLang project takes the first step toward building a unified inference stack for all.
@lmsysorg
LMSYS Org
25 days
🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models. ⚡️ Up to 5.9× faster inference 🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux 🧰 Easy to use via OpenAI-compatible API, CLI & Python API
5
23
212
@richardczl
Richard Chen
25 days
On the way to unified inference for multimodality! 💪
@lmsysorg
LMSYS Org
25 days
SGLang Diffusion brings SGLang’s state-of-the-art performance to image & video generation. It supports the most popular diffusion models while providing 1.2×–5.9× speedup and a seamless interface for every developer. Dive into the full blog to see how we made SGLang Diffusion
0
0
1
@richardczl
Richard Chen
25 days
🎉🎉🎉
@lmsysorg
LMSYS Org
25 days
🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models. ⚡️ Up to 5.9× faster inference 🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux 🧰 Easy to use via OpenAI-compatible API, CLI & Python API
0
0
1
@lmsysorg
LMSYS Org
27 days
Day-0 support for Kimi K2 Thinking on SGLang ⚡ The new open-source thinking-agent model pushes reasoning, coding, and multi-step tool use to new heights. Proud to collaborate with @Kimi_Moonshot to make it run seamlessly: python -m sglang.launch_server \ --model-path
@Kimi_Moonshot
Kimi.ai
27 days
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200 – 300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built
1
7
36
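The launch command in the Kimi K2 Thinking tweet above is cut off after `--model-path`. Purely as an illustration (not the announcement's exact command), a full serving invocation might look like the sketch below; the model path, `--tp` value, and flags are assumptions, not taken from the tweet:

```shell
# Hypothetical sketch: serving a large MoE model such as Kimi K2 Thinking
# with SGLang's OpenAI-compatible server. The Hugging Face repo id and the
# tensor-parallel size are assumptions for illustration only.
python -m sglang.launch_server \
  --model-path moonshotai/Kimi-K2-Thinking \
  --tp 8 \
  --trust-remote-code
```

Once the server is up, it exposes an OpenAI-compatible endpoint (by default on port 30000) that standard chat-completions clients can talk to.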
@richardczl
Richard Chen
1 month
😂
@t_blom
Tom Blomfield
1 month
I swear we’re gonna get AGI before reliable A/V setup
0
0
0
@lmsysorg
LMSYS Org
1 month
SGLang now runs natively on TPU with a new pure Jax backend! SGLang-Jax leverages SGLang's high-performance server architecture and uses Jax to compile the model's forward pass. By combining SGLang and Jax, it delivers fast, native TPU inference while maintaining support for
4
20
96
@timshi_ai
Tim Shi
1 month
This is very accurate. We use a lot of small (finetuned) open-source models in production as well.
@natolambert
Nathan Lambert
1 month
Airbnb CEO Brian Chesky: “We’re relying a lot on Alibaba’s Qwen model. It’s very good. It’s also fast and cheap... We use OpenAI’s latest models, but we typically don’t use them that much in production because there are faster and cheaper models.” The valley is built on Qwen?
13
30
1K