Richard Chen
@richardczl
Followers 185 · Following 504 · Media 8 · Statuses 114
PhD @Stanford | Prev @Cornell | AI for Energy | @Lmsysorg Contributor
Stanford, CA
Joined May 2022
SGLang → 20K stars 🎉 Yesterday we shipped diffusion support. Today we hit this milestone. Honored to contribute to the DGX blog and work with this team. 20,000 developers chose SGLang because it's fast, stable, and proven - and that's a team effort. Thank you to everyone who
github.com
SGLang is a fast serving framework for large language models and vision language models. - sgl-project/sglang
The future of RL isn't just about bigger models—it's about training them without breaking. Unified FP8 is how we get there.
🔥SGLang finally implemented end-to-end FP8 Training & Inference for RL. Say goodbye to the stability headaches of mixed precision! Our Unified FP8 approach (now fully supported in Miles) wipes out quantization error inconsistency, leading to significantly lower TIS-clipfrac and
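The unified-FP8 claim above can be illustrated with a toy quantizer (a pure-Python sketch, not SGLang's actual FP8 kernels): when the trainer and the inference engine quantize with different scales, the same weights round differently on each side, and that inconsistency is what inflates the importance-sampling clip fraction. Sharing one quantizer makes both views bit-identical.

```python
# Illustrative only: fake "FP8" as a coarse symmetric grid quantizer.
def quantize(x, scale, levels=240):
    # Round x onto a small grid of `levels` steps, mimicking low-precision storage.
    step = scale / levels
    return round(x / step) * step

weights = [0.1234, -0.5678, 0.9012]

# Mismatched setup: trainer and inference engine pick different scales,
# so the same weight dequantizes to two different values.
train_view = [quantize(w, scale=1.0) for w in weights]
infer_view = [quantize(w, scale=0.9) for w in weights]
mismatch = max(abs(a - b) for a, b in zip(train_view, infer_view))

# Unified setup: both sides share one quantizer, so the policy that
# generated the rollouts matches the one being trained, bit for bit.
unified_train = [quantize(w, scale=1.0) for w in weights]
unified_infer = [quantize(w, scale=1.0) for w in weights]
```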
I learned the most from working with cool people in the open-source community!
🎯SGLang RL Team is growing! We’re building a state-of-the-art RL ecosystem for LLMs and VLMs — fast, open, and production-ready. If you’re excited about RLHF, scalable training systems, and shaping how future LLMs are aligned, now is the right time to join. We are looking for
Excited to announce our deployment guide with @UnslothAI 🦥⚡ Unsloth gives you 2x faster fine-tuning. SGLang gives you efficient production serving. Together = complete LLM workflow: ✅ Fast training with Unsloth ✅ FP8 conversion & quantization ✅ Production deployment
A journey of a thousand Miles begins with a single rollout!
🚀 Introducing Miles — an enterprise-facing RL framework for large-scale MoE training & production, forked from slime. Slime is a lightweight, customizable RL framework that already powers real post-training pipelines and large MoE runs. Miles builds on slime but focuses on new
Exciting case study!
Never block the GPU! In a new @modal blogpost, we walk through a major class of inefficiency in AI inference: host overhead. We include three cases where we worked with @sgl_project to cut host overhead and prevent GPU stalls. Every microsecond counts. https://t.co/ZeumrZpSKE
Congrats!!! Finally out @jisong_learning
Say hi to BeFreed - the world’s cutest audio agent that lets you learn anything through personalized audio. This video is 96% human generated. (1/3)
Thanks to the Modal team for adopting SGLang in production for Decagon’s real-time voice AI. We’re glad to support your large-scale serving workloads, and appreciate the improvements you’ve contributed back to the community. Looking forward to more collaboration!
Real-time voice AI isn’t easy. It needs sub-second latency, natural turn-taking, and conversational quality all at once. Here’s how @DecagonAI and Modal built a real-time inference system using: • Supervised fine-tuning and reinforcement learning • Speculative decoding with
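The speculative decoding mentioned above can be sketched in miniature (toy stand-ins for the draft/target pair; this is not Modal's or Decagon's actual code): a cheap draft model proposes several tokens ahead, and the expensive target model verifies them, accepting the longest agreeing prefix and falling back to its own token at the first disagreement.

```python
def draft_propose(prefix, k=4):
    # Hypothetical draft model: guesses the next k tokens, with a
    # deliberate error at step 2 to show what rejection looks like.
    out, last = [], prefix[-1]
    for i in range(k):
        guess = last + 1
        if i == 2:
            guess += 5  # draft goes wrong here
        out.append(guess)
        last = guess
    return out

def target_next(prefix):
    # Hypothetical target model: the "ground truth" next token.
    return prefix[-1] + 1

def speculative_step(prefix, k=4):
    proposed = draft_propose(prefix, k)
    accepted, ctx = [], list(prefix)
    for tok in proposed:
        if target_next(ctx) == tok:       # target agrees: accept draft token
            accepted.append(tok)
            ctx.append(tok)
        else:                             # first mismatch: take target's token, stop
            accepted.append(target_next(ctx))
            break
    return accepted
```

One verification pass here yields three tokens instead of one, which is where the latency win comes from when draft and target usually agree.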
- no benchmarks
- no API release
- system card looks sloppy at best and is weirdly short
- was planned for late nov
maybe their weirdest new model launch ever? why did they randomly rush it out..?
What the fuck lmao they dropped GPT-5.1 out of nowhere with no benchmarks in sight Are they trying to beat someone to the punch
Future models will be multimodal-in, multimodal-out, potentially combining autoregressive and diffusion architectures. The SGLang project takes the first step toward building a unified inference stack for all.
🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models. ⚡️ Up to 5.9× faster inference 🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux 🧰 Easy to use via OpenAI-compatible API, CLI & Python API
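The "OpenAI-compatible API" above suggests image generation can be driven with an OpenAI-style request. A hedged sketch of what such a request body might look like — the endpoint path, port, and field names follow the OpenAI images API convention and are assumptions, not confirmed SGLang Diffusion specifics:

```python
import json

# Hypothetical request payload for an OpenAI-style images endpoint.
payload = {
    "model": "Qwen/Qwen-Image",              # one of the supported model families
    "prompt": "a watercolor fox in the snow",
    "n": 1,
    "size": "1024x1024",
}
body = json.dumps(payload)

# A client would POST `body` to something like
# http://localhost:30000/v1/images/generations
# (URL and port are placeholders for a locally launched server).
```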
On the way to unified inference for multimodality! 💪
SGLang Diffusion brings SGLang’s state-of-the-art performance to image & video generation. It supports the most popular diffusion models while providing 1.2×–5.9× speedup and a seamless interface for every developer. Dive into the full blog to see how we made SGLang Diffusion
Day-0 support for Kimi K2 Thinking on SGLang ⚡ The new open-source thinking-agent model pushes reasoning, coding, and multi-step tool use to new heights. Proud to collaborate with @Kimi_Moonshot to make it run seamlessly: python -m sglang.launch_server \ --model-path
🚀 Hello, Kimi K2 Thinking! The Open-Source Thinking Agent Model is here. 🔹 SOTA on HLE (44.9%) and BrowseComp (60.2%) 🔹 Executes up to 200 – 300 sequential tool calls without human interference 🔹 Excels in reasoning, agentic search, and coding 🔹 256K context window Built
😂
SGLang now runs natively on TPU with a new pure Jax backend! SGLang-Jax leverages SGLang's high-performance server architecture and uses Jax to compile the model's forward pass. By combining SGLang and Jax, it delivers fast, native TPU inference while maintaining support for
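The split described above — same serving layer, with the forward pass handed to Jax for compilation — follows the compile-once-per-shape pattern XLA uses. A pure-Python stand-in (not the real Jax/XLA machinery) of that caching behavior:

```python
# Conceptual sketch: "compile" a forward function the first time a given
# input shape is seen, then reuse the compiled version on every later call.
compile_cache = {}
compile_count = 0

def compiled_forward(xs):
    """Run a toy forward pass, specializing (once) per input length."""
    global compile_count
    shape = len(xs)
    if shape not in compile_cache:
        compile_count += 1                          # the expensive trace+compile step
        compile_cache[shape] = lambda v: [2 * t + 1 for t in v]
    return compile_cache[shape](xs)

out1 = compiled_forward([1, 2, 3])   # first call: compiles for length 3
out2 = compiled_forward([4, 5, 6])   # same length: cache hit, no recompile
```

This is why steady-state serving stays fast: after warmup, requests of familiar shapes skip compilation entirely.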
This is very accurate. We use a lot of small (finetuned) open-source models in production as well.
Airbnb CEO Brian Chesky: “We’re relying a lot on Alibaba’s Qwen model. It’s very good. It’s also fast and cheap... We use OpenAI’s latest models, but we typically don’t use them that much in production because there are faster and cheaper models.” The valley is built on Qwen?