Richard Chen
@richardczl
Followers 185 · Following 504 · Media 8 · Statuses 114
PhD @Stanford | Prev @Cornell | AI for Energy | @Lmsysorg Contributor
Stanford, CA
Joined May 2022
SGLang → 20K stars 🎉 Yesterday we shipped diffusion support. Today we hit this milestone. Honored to contribute to the DGX blog and work with this team. 20,000 developers chose SGLang because it's fast, stable, and proven - and that's a team effort. Thank you to everyone who
github.com
SGLang is a fast serving framework for large language models and vision language models. - sgl-project/sglang
The future of RL isn't just about bigger models—it's about training them without breaking. Unified FP8 is how we get there.
🔥SGLang finally implemented end-to-end FP8 Training & Inference for RL. Say goodbye to the stability headaches of mixed precision! Our Unified FP8 approach (now fully supported in Miles) wipes out quantization error inconsistency, leading to significantly lower TIS-clipfrac and
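The unified-FP8 claim above can be illustrated with a toy quantizer (a pure-Python sketch, not SGLang's actual FP8 kernels): when the trainer and the inference engine quantize with different scales, the same weights round differently on each side, and that inconsistency is what inflates the importance-sampling clip fraction. Sharing one quantizer makes both views bit-identical.

```python
# Illustrative only: fake "FP8" as a coarse symmetric grid quantizer.
def quantize(x, scale, levels=240):
    # Round x onto a small grid of `levels` steps, mimicking low-precision storage.
    step = scale / levels
    return round(x / step) * step

weights = [0.1234, -0.5678, 0.9012]

# Mismatched setup: trainer and inference engine pick different scales,
# so the same weight dequantizes to two different values.
train_view = [quantize(w, scale=1.0) for w in weights]
infer_view = [quantize(w, scale=0.9) for w in weights]
mismatch = max(abs(a - b) for a, b in zip(train_view, infer_view))

# Unified setup: both sides share one quantizer, so the policy that
# generated the rollouts matches the one being trained, bit for bit.
unified_train = [quantize(w, scale=1.0) for w in weights]
unified_infer = [quantize(w, scale=1.0) for w in weights]
```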
I learned the most from working with cool people in the open-source community!
🎯SGLang RL Team is growing! We’re building a state-of-the-art RL ecosystem for LLMs and VLMs — fast, open, and production-ready. If you’re excited about RLHF, scalable training systems, and shaping how future LLMs are aligned, now is the right time to join. We are looking for
Excited to announce our deployment guide with @UnslothAI 🦥⚡ Unsloth gives you 2x faster fine-tuning. SGLang gives you efficient production serving. Together = complete LLM workflow: ✅ Fast training with Unsloth ✅ FP8 conversion & quantization ✅ Production deployment
A journey of a thousand Miles begins with a single rollout!
🚀 Introducing Miles — an enterprise-facing RL framework for large-scale MoE training & production, forked from slime. Slime is a lightweight, customizable RL framework that already powers real post-training pipelines and large MoE runs. Miles builds on slime but focuses on new
Exciting case study!
Never block the GPU! In a new @modal blogpost, we walk through a major class of inefficiency in AI inference: host overhead. We include three cases where we worked with @sgl_project to cut host overhead and prevent GPU stalls. Every microsecond counts. https://t.co/ZeumrZpSKE
Congrats!!! Finally out @jisong_learning
Say hi to BeFreed - the world’s cutest audio agent that lets you learn anything through personalized audio. This video is 96% human generated. (1/3)
Thanks to the Modal team for adopting SGLang in production for Decagon’s real-time voice AI. We’re glad to support your large-scale serving workloads, and appreciate the improvements you’ve contributed back to the community. Looking forward to more collaboration!
Real-time voice AI isn’t easy. It needs sub-second latency, natural turn-taking, and conversational quality all at once. Here’s how @DecagonAI and Modal built a real-time inference system using: • Supervised fine-tuning and reinforcement learning • Speculative decoding with
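The speculative decoding mentioned above can be sketched in miniature (toy stand-ins for the draft/target pair; this is not Modal's or Decagon's actual code): a cheap draft model proposes several tokens ahead, and the expensive target model verifies them, accepting the longest agreeing prefix and falling back to its own token at the first disagreement.

```python
def draft_propose(prefix, k=4):
    # Hypothetical draft model: guesses the next k tokens, with a
    # deliberate error at step 2 to show what rejection looks like.
    out, last = [], prefix[-1]
    for i in range(k):
        guess = last + 1
        if i == 2:
            guess += 5  # draft goes wrong here
        out.append(guess)
        last = guess
    return out

def target_next(prefix):
    # Hypothetical target model: the "ground truth" next token.
    return prefix[-1] + 1

def speculative_step(prefix, k=4):
    proposed = draft_propose(prefix, k)
    accepted, ctx = [], list(prefix)
    for tok in proposed:
        if target_next(ctx) == tok:       # target agrees: accept draft token
            accepted.append(tok)
            ctx.append(tok)
        else:                             # first mismatch: take target's token, stop
            accepted.append(target_next(ctx))
            break
    return accepted
```

One verification pass here yields three tokens instead of one, which is where the latency win comes from when draft and target usually agree.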
- no benchmarks
- no API release
- system card looks sloppy at best and is weirdly short
- was planned for late nov
maybe their weirdest new model launch ever? why did they randomly rush it out..?
What the fuck lmao they dropped GPT-5.1 out of nowhere with no benchmarks in sight Are they trying to beat someone to the punch
Future models will be multimodal-in, multimodal-out, potentially combining autoregressive and diffusion architectures. The SGLang project takes the first step toward building a unified inference stack for all.
🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models. ⚡️ Up to 5.9× faster inference 🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux 🧰 Easy to use via OpenAI-compatible API, CLI & Python API
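The "OpenAI-compatible API" above suggests image generation can be driven with an OpenAI-style request. A hedged sketch of what such a request body might look like — the endpoint path, port, and field names follow the OpenAI images API convention and are assumptions, not confirmed SGLang Diffusion specifics:

```python
import json

# Hypothetical request payload for an OpenAI-style images endpoint.
payload = {
    "model": "Qwen/Qwen-Image",              # one of the supported model families
    "prompt": "a watercolor fox in the snow",
    "n": 1,
    "size": "1024x1024",
}
body = json.dumps(payload)

# A client would POST `body` to something like
# http://localhost:30000/v1/images/generations
# (URL and port are placeholders for a locally launched server).
```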
On the way to unified inference for multimodality! 💪
SGLang Diffusion brings SGLang’s state-of-the-art performance to image & video generation. It supports the most popular diffusion models while providing 1.2×–5.9× speedup and a seamless interface for every developer. Dive into the full blog to see how we made SGLang Diffusion
Day-0 support for Kimi K2 Thinking on SGLang ⚡ The new open-source thinking-agent model pushes reasoning, coding, and multi-step tool use to new heights. Proud to collaborate with @Kimi_Moonshot to make it run seamlessly: python -m sglang.launch_server \ --model-path
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200 – 300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built
😂
SGLang now runs natively on TPU with a new pure Jax backend! SGLang-Jax leverages SGLang's high-performance server architecture and uses Jax to compile the model's forward pass. By combining SGLang and Jax, it delivers fast, native TPU inference while maintaining support for
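The split described above — same serving layer, with the forward pass handed to Jax for compilation — follows the compile-once-per-shape pattern XLA uses. A pure-Python stand-in (not the real Jax/XLA machinery) of that caching behavior:

```python
# Conceptual sketch: "compile" a forward function the first time a given
# input shape is seen, then reuse the compiled version on every later call.
compile_cache = {}
compile_count = 0

def compiled_forward(xs):
    """Run a toy forward pass, specializing (once) per input length."""
    global compile_count
    shape = len(xs)
    if shape not in compile_cache:
        compile_count += 1                          # the expensive trace+compile step
        compile_cache[shape] = lambda v: [2 * t + 1 for t in v]
    return compile_cache[shape](xs)

out1 = compiled_forward([1, 2, 3])   # first call: compiles for length 3
out2 = compiled_forward([4, 5, 6])   # same length: cache hit, no recompile
```

This is why steady-state serving stays fast: after warmup, requests of familiar shapes skip compilation entirely.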
This is very accurate. We use a lot of small (finetuned) open-source models in production as well.
Airbnb CEO Brian Chesky: “We’re relying a lot on Alibaba’s Qwen model. It’s very good. It’s also fast and cheap... We use OpenAI’s latest models, but we typically don’t use them that much in production because there are faster and cheaper models.” The valley is built on Qwen?