Ying Sheng Profile
Ying Sheng

@ying11231

Followers
12K
Following
2K
Media
6
Statuses
863

@lmsysorg @sgl_project | Prev. @xAI @Stanford | Assist Prof @UCLA. (delayed) | Do it anyway | Live to fight another day

Joined April 2021
@ying11231
Ying Sheng
10 minutes
For inference, you can send an email to the same contact as for RL for now: rl_team@lmsys.org
0
0
0
@richardczl
Richard Chen
23 hours
I learned the most from working with cool people in the open-source community!
@lmsysorg
LMSYS Org
23 hours
🎯SGLang RL Team is growing! We’re building a state-of-the-art RL ecosystem for LLMs and VLMs — fast, open, and production-ready. If you’re excited about RLHF, scalable training systems, and shaping how future LLMs are aligned, now is the right time to join. We are looking for
0
1
2
@mingyilu123
Mingyi Lu
23 hours
We’re actively looking for strong candidates 💕
0
1
5
@ying11231
Ying Sheng
10 hours
Hiring! SGLang RL (Inference is also open all the time)
3
5
95
@slime_framework
slime
2 days
We just got ~100× faster GAE by borrowing ideas from chunked linear attention and turning GAE into a chunked scan problem. Code: https://t.co/FGjrNMF1Ma Detailed write-up (Chinese):
0
7
19
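The tweet above turns the GAE recurrence into a chunked scan, the same trick used in chunked linear attention. A minimal sketch of that idea (assuming the standard recurrence A_t = δ_t + γλ·A_{t+1}; the function names and chunking scheme here are illustrative, not slime's actual code):

```python
import numpy as np

def gae_naive(deltas, gamma=0.99, lam=0.95):
    """Reference GAE: reverse recurrence A_t = delta_t + gamma*lam * A_{t+1}."""
    g = gamma * lam
    adv = np.zeros_like(deltas)
    carry = 0.0
    for t in range(len(deltas) - 1, -1, -1):
        carry = deltas[t] + g * carry
        adv[t] = carry
    return adv

def gae_chunked(deltas, gamma=0.99, lam=0.95, chunk=64):
    """Same result as gae_naive, computed chunk by chunk from right to left.

    Within a chunk: adv[i] = local discounted suffix sum + g**(c - i) * carry,
    where carry is the advantage at the first position of the chunk to the
    right. The intra-chunk work is vectorizable, which is where the speedup
    comes from; only one scalar carry crosses each chunk boundary.
    """
    g = gamma * lam
    n = len(deltas)
    adv = np.zeros_like(deltas)
    carry = 0.0
    for start in range(((n - 1) // chunk) * chunk, -1, -chunk):
        d = deltas[start:start + chunk]
        c = len(d)
        powers = g ** np.arange(c)  # [1, g, g^2, ...]
        local = np.array([powers[:c - i] @ d[i:] for i in range(c)])
        adv[start:start + c] = local + carry * g ** np.arange(c, 0, -1)
        carry = adv[start]
    return adv
```

Checking `gae_chunked` against `gae_naive` on random deltas, with a chunk size that does not divide the sequence length, is the easiest way to validate the boundary handling.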
@lm_zheng
Lianmin Zheng
2 days
Probably the first open-source RL framework that targets GB300 optimizations!
@lmsysorg
LMSYS Org
6 days
🚀 Introducing Miles — an enterprise-facing RL framework for large-scale MoE training & production, forked from slime. Slime is a lightweight, customizable RL framework that already powers real post-training pipelines and large MoE runs. Miles builds on slime but focuses on new
2
18
180
@lmsysorg
LMSYS Org
4 days
🔥We are excited to partner with the @intel Neural Compressor team to bring AutoRound low-bit quantization (INT2 to INT8, MXFP4, NVFP4, mixed bits) directly into SGLang’s high-performance inference runtime. With this collaboration, developers can: 1. Quantize LLMs and VLMs with
1
4
35
@lmsysorg
LMSYS Org
4 days
Excited to announce our deployment guide with @UnslothAI 🦥⚡ Unsloth gives you 2x faster fine-tuning. SGLang gives you efficient production serving. Together = complete LLM workflow: ✅ Fast training with Unsloth ✅ FP8 conversion & quantization ✅ Production deployment
3
7
37
@lmsysorg
LMSYS Org
4 days
Awesome to team up with @UnslothAI ! This guide shows how to run LLMs locally with SGLang, including GGUF serving, FP8 acceleration, and production-ready deployment. A solid resource for anyone building efficient inference pipelines 👇
@UnslothAI
Unsloth AI
4 days
We made a guide on how to deploy LLMs locally with SGLang! In collab with @lmsysorg, you'll learn to: • Deploy fine-tuned LLMs for large-scale production • Serve GGUFs locally • Benchmark inference speed • Use on-the-fly FP8 for 1.6x inference Guide: https://t.co/hxNZikSeLS
0
4
42
@ollama
ollama
5 days
Microsoft added support for @ollama to PowerToys 0.96. Now you can use Ollama for advanced clipboard management and transform your clipboard content into any format you need (paste content as plain text, markdown, JSON, or various file formats). All this can run locally!
4
29
277
@atlas_cloud_ai
Atlas Cloud
5 days
We’re excited to team up with @lmsysorg to co-host an Open Inference Night Happy Hour at NeurIPS 2025 in San Diego 🎉 Join us for an evening of good drinks and even better conversations with researchers, engineers, and founders across the AI community. Come for: - high‑signal
0
1
3
@ying11231
Ying Sheng
6 days
Miles is the RL framework we want to push for enterprise use. This story is just beginning. Lightweight, customizable, flexible, scalable, as always. ☺️
2
19
194
@lmsysorg
LMSYS Org
7 days
Great write-up from the Modal team! SGLang is proud to collaborate on reducing host overhead and improving inference efficiency. Every bit counts when keeping the GPU busy 😀
@charles_irl
Charles 🎉 Frye
7 days
Never block the GPU! In a new @modal blogpost, we walk through a major class of inefficiency in AI inference: host overhead. We include three cases where we worked with @sgl_project to cut host overhead and prevent GPU stalls. Every microsecond counts. https://t.co/ZeumrZpSKE
1
4
40
@xai
xAI
8 days
Introducing Grok 4.1, a frontier model that sets a new standard for conversational intelligence, emotional understanding, and real-world helpfulness. Grok 4.1 is available for free on https://t.co/AnXpIEOPEb, https://t.co/53pltyq3a4 and our mobile apps. https://t.co/Cdmv5CqSrb
x.ai
Grok 4.1 is now available to all users on grok.com, 𝕏, and the iOS and Android apps. It is rolling out immediately in Auto mode and can be selected explicitly as “Grok 4.1” in the model picker.
2K
3K
13K
@GenAI_is_real
Chayenne Zhao
11 days
We introduce speculative decoding into the RL sampling process, achieving a significant improvement in sampling speed at appropriate batch sizes. Compared to freezing the draft model, the accepted length stays high, yielding stable long-term positive gains.
4
28
220
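For context on the "accepted length" being measured above, here is a generic speculative-sampling step over a toy categorical vocabulary, using the standard accept/residual-resample rule. This is a hypothetical sketch of the general technique, not the team's RL integration; `speculative_step`, `p`, and `q` are illustrative names.

```python
import numpy as np

def speculative_step(p, q, k, rng):
    """One speculative-decoding step on a toy categorical vocabulary.

    The draft distribution q proposes k tokens; each draft token x is
    accepted with probability min(1, p[x] / q[x]). On the first rejection
    we resample once from the normalized residual max(p - q, 0) and stop,
    which keeps the output distribution exactly equal to the target p.
    Returns the list of emitted tokens; its length is the accepted length
    (plus the one resampled token on rejection).
    """
    vocab = len(p)
    drafts = rng.choice(vocab, size=k, p=q)  # drafts have q[x] > 0
    out = []
    for x in drafts:
        if rng.random() < min(1.0, p[x] / q[x]):
            out.append(int(x))  # accept the draft token
        else:
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            out.append(int(rng.choice(vocab, p=residual)))
            break  # stop at the first rejection
    return out
```

The closer the draft distribution tracks the target, the longer the accepted runs, which is why keeping the draft model updated (rather than frozen) during RL matters for throughput.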
@lmsysorg
LMSYS Org
12 days
Honored to see SGLang in @GitHub's Octoverse 2025 report on fastest-growing open source projects. 2,541% contributor growth reflects our community's shared vision for better LLM infrastructure. But metrics aside — what matters is shipping: diffusion support, performance
3
3
16
@lmsysorg
LMSYS Org
12 days
🚀 SGLang 2025 Q4 Roadmap is here! From full-feature reliability → next-gen kernel optimizations (GB300/MI350/BW FP8) → PD disaggregation, spec decoding 2.0, MoE/EP/CP refactors, HiCache, multimodal & diffusion upgrades, RL-framework integration, and day-0 support for all major
2
6
64
@ying11231
Ying Sheng
12 days
“First principle”
@GenAI_is_real
Chayenne Zhao
12 days
Taught me a bitter lesson: "Don't solve non-existent problems, and don't create problems just for the sake of a complete story." We delayed sharing our work for a whole month just to find a baseline that crashes.
0
3
58