Byron Hsu

@hsu_byron

Followers: 4K
Following: 8K
Media: 148
Statuses: 2K

ML system @xAI | @lmsysorg @liger_kernel @flyteorg @theASF

Joined November 2017
@hsu_byron
Byron Hsu
1 month
This is huge!
@lmsysorg
LMSYS Org
1 month
💥 We've achieved perfect training-inference alignment for SGLang & FSDP in slime! (Flash Attn 3, DeepGEMM, etc.) The result? A strict KL divergence of 0. But here's the twist: We spent a month trying to find a baseline that crashes from mismatch... and couldn't. 🤷‍♂️ We haven't
@casper_hansen_
Casper Hansen
1 month
insane blackwell progress in v0.5.5 by the sglang team. with new optimizations, it's stable like hopper and the performance is great even for multimodal models 181 tokens/s on Qwen3-VL-30B-A3B-Thinking on 1x B200:
@lm_zheng
Lianmin Zheng
2 months
SGLang now has a pure Jax backend, and it runs natively on TPU!
@lmsysorg
LMSYS Org
2 months
SGLang now runs natively on TPU with a new pure Jax backend! SGLang-Jax leverages SGLang's high-performance server architecture and uses Jax to compile the model's forward pass. By combining SGLang and Jax, it delivers fast, native TPU inference while maintaining support for
@curiouskid423
Kevin Li
2 months
(1/n) 🚀 Your VLM can be a great multimodal encoder for image editing and generation if you use the middle layers wisely (yes, plural 😉). We are thrilled to present UniFusion - the first architecture that uses only a VLM as the input-condition encoder without auxiliary signals from VAE
@Diegopasini
Diego
3 months
At xAI, we are starting a new paradigm for human data. Post-training is becoming an art. Good taste matters now more than ever. High quality data is the stepping stone to AGI. We are creating a small community of savants that will work together here in Palo Alto to build the
@YungSungChuang
Yung-Sung Chuang
3 months
🎉 Excited to share our MetaCLIP 2 is now accepted as Spotlight at #NeurIPS2025 and the models are available on HF: 🤗 https://t.co/lVNIL1beMM Pls use it if you want CLIP with: 🌍 1. diverse worldwide knowledge beyond English CLIP 🇬🇧 2. even better English ability See u in SD!
huggingface.co
@YungSungChuang
Yung-Sung Chuang
5 months
Scaling CLIP on English-only data is outdated now… 🌍 We built a CLIP data curation pipeline for 300+ languages 🇬🇧 We train MetaCLIP 2 without compromising English-task performance (it actually improves!) 🥳 It’s time to drop the language filter! 📝 https://t.co/pQuwzH053M [1/5] 🧵
@hsu_byron
Byron Hsu
3 months
At xAI, we are building the world’s most advanced inference system on tens of thousands of GPUs. It has been a fun journey to support the Grok 4 Fast long-context model end-to-end, from autoscaling and disaggregated serving to model parallelism. Please DM me or apply to the below
@xai
xAI
3 months
Introducing Grok 4 Fast, a multimodal reasoning model with a 2M context window that sets a new standard for cost-efficient intelligence. Available for free on https://t.co/AnXpIEOhOD, https://t.co/53pltypvkw, iOS and Android apps, and OpenRouter. https://t.co/3YZ1yVwueV
@Saaaang94
SangBin Cho
4 months
We are using SGLang for really large-scale RL, and it’s been working great :)
@casper_hansen_
Casper Hansen
4 months
xAI may be one of the single biggest contributors to open-source inference just by serving everything with SGLang
@hsu_byron
Byron Hsu
4 months
🫡
@casper_hansen_
Casper Hansen
4 months
xAI may be one of the single biggest contributors to open-source inference just by serving everything with SGLang
@skcd42
skcd
4 months
Grok-code-fast-1 is now out and available for everyone to use 🚀🏎️💨 When I joined the coding team, the team was just 3 people and we very quickly built a model which was SOTA on SWEBench. But as things go, in the real world, benchmarks matter less. Over the last few months we
@xai
xAI
4 months
Introducing Grok Code Fast 1, a speedy and economical reasoning model that excels at agentic coding. Now available for free on GitHub Copilot, Cursor, Cline, Kilo Code, Roo Code, opencode, and Windsurf. https://t.co/3tMbmLbxOP
@hsu_byron
Byron Hsu
4 months
Only at xAI
@LiangchenLuo
Liangchen Luo
4 months
Kudos to our crew 👏 "we've got a truly marvelous group of people, which this margin is too narrow to tag them all" fun side note: happy to witness two launches in a week 😆 photographed at the @SpaceX launch site, Starbase.
@chaoqi_w
Chaoqi Wang
4 months
We are hiring brilliant engineers to work on pretraining! Join us to tackle pretraining data, design cutting-edge data recipes, and build next-gen data infra. If you’re driven to accelerate human discovery and ready to change the world, apply now to join our galactic mission!
job-boards.greenhouse.io
Palo Alto, CA
@hsu_byron
Byron Hsu
4 months
Hardcore engineers 🏋️‍♂️
@HeinrichKuttler
heiner
4 months
If you know kube, Linux, file systems, process scheduling, tcp/ip, ibverbs - reach out. we might have just the place for you dms open
@HeinrichKuttler
heiner
4 months
Sounds like there's a lot of alpha in just hiring the best. I wonder if anyone knows a place that does that?
@EWErickson
Erick Erickson
4 months
The guy in need of the job was telling me he made it to final interview rounds with multiple Fortune 500 tech companies and in each one, the final interviewer was not the would be manager, CEO, or people he’d work with, but a white HR lady in California.
@hsu_byron
Byron Hsu
4 months
At xAI, we are managing traffic at an unprecedented scale. Our team is small, dedicated, and highly skilled. In this role, you will own a critical part of our production serving infrastructure, collaborating closely with the research inference team to ensure it is elastic,
@APrerepa
Aditya Prerepa
4 months
hi, @xai is hiring for k8s/traffic people on our supercomputing team -- you get to work with a lean and fast-moving team while shipping massive impact to many users worldwide. this is the most fun i've had at any company! please apply/DM and share. https://t.co/Mk4DvAzXrt
@zeeshanp_
Zeeshan Patel
4 months
Many people wonder what the benefit of training video gen models is. Video gen by itself doesn’t necessarily seem to provide as much raw intelligence to users as modern LLMs. However, in the long term, video gen models will be used as neural simulations of the universe within
@Guodzh
Guodong Zhang
5 months
We are actively hiring for multimodal understanding and generation. Join us to build the future AI interfaces! https://t.co/qzULWOZQEq https://t.co/DKdeK9LEAt https://t.co/z0j11Q98FA
job-boards.greenhouse.io
Palo Alto, CA; San Francisco, CA
@Guodzh
Guodong Zhang
4 months
We are hiring on pretraining as well. If you are passionate about improving training efficiency, pretraining data quality and training infra, please apply here: https://t.co/fKK8lGPd2P https://t.co/3xVTBc5uOg https://t.co/yJS5rmdQX6
job-boards.greenhouse.io
@Guodzh
Guodong Zhang
5 months
We are actively hiring for multimodal understanding and generation. Join us to build the future AI interfaces! https://t.co/qzULWOZQEq https://t.co/DKdeK9LEAt https://t.co/z0j11Q98FA