Byron Hsu
@hsu_byron
Followers: 4K · Following: 8K · Media: 148 · Statuses: 2K
ML system @xAI | @lmsysorg @liger_kernel @flyteorg @theASF
Joined November 2017
This is huge!
We've achieved perfect training-inference alignment for SGLang & FSDP in slime! (Flash Attn 3, DeepGEMM, etc.) The result? A strict KL divergence of 0. But here's the twist: We spent a month trying to find a baseline that crashes from mismatch... and couldn't. We haven't
Insane Blackwell progress in v0.5.5 by the SGLang team. With new optimizations, it's as stable as on Hopper, and the performance is great even for multimodal models: 181 tokens/s on Qwen3-VL-30B-A3B-Thinking on 1x B200:
SGLang now has a pure Jax backend, and it runs natively on TPU!
SGLang now runs natively on TPU with a new pure Jax backend! SGLang-Jax leverages SGLang's high-performance server architecture and uses Jax to compile the model's forward pass. By combining SGLang and Jax, it delivers fast, native TPU inference while maintaining support for
(1/n) Your VLM can be a great multimodal encoder for image editing and generation if you use the middle layers wisely (yes, plural). We are thrilled to present UniFusion, the first architecture that uses only a VLM as the input-condition encoder without auxiliary signals from VAE
At xAI, we are starting a new paradigm for human data. Post-training is becoming an art. Good taste matters now more than ever. High quality data is the stepping stone to AGI. We are creating a small community of savants that will work together here in Palo Alto to build the
Excited to share that our MetaCLIP 2 has been accepted as a Spotlight at #NeurIPS2025, and the models are available on HF: 🤗 https://t.co/lVNIL1beMM Pls use it if you want CLIP with: 1. diverse worldwide knowledge beyond English CLIP 🇬🇧 2. even better English ability. See u in SD!
Scaling CLIP on English-only data is outdated now… We built a CLIP data curation pipeline for 300+ languages. 🇬🇧 We train MetaCLIP 2 without compromising English-task performance (it actually improves! 🥳). It's time to drop the language filter! https://t.co/pQuwzH053M [1/5] 🧵
At xAI, we are building the world's most advanced inference system on tens of thousands of GPUs. It has been a fun journey to support the Grok 4 Fast long-context model end-to-end, from autoscaling, disaggregated serving, to model parallelism. Please DM me or apply to the below
Introducing Grok 4 Fast, a multimodal reasoning model with a 2M context window that sets a new standard for cost-efficient intelligence. Available for free on https://t.co/AnXpIEOhOD, https://t.co/53pltypvkw, iOS and Android apps, and OpenRouter. https://t.co/3YZ1yVwueV
Grok-code-fast-1 is now out and available for everyone to use. When I joined the coding team, the team was just 3 people, and we very quickly built a model that was SOTA on SWE-bench. But as things go, in the real world benchmarks matter less. Over the last few months we
Introducing Grok Code Fast 1, a speedy and economical reasoning model that excels at agentic coding. Now available for free on GitHub Copilot, Cursor, Cline, Kilo Code, Roo Code, opencode, and Windsurf. https://t.co/3tMbmLbxOP
Only at xAI
Kudos to our crew. "we've got a truly marvelous group of people, which this margin is too narrow to tag them all" Fun side note: happy to witness two launches in a week. Photographed at the @SpaceX launch site, Starbase.
We are hiring brilliant engineers to work on pretraining! Join us to tackle pretraining data, design cutting-edge data recipes, and build next-gen data infra. If you're driven to accelerate human discovery and ready to change the world, apply now to join our galactic mission!
Job posting: job-boards.greenhouse.io (Palo Alto, CA)
Sounds like there's a lot of alpha in just hiring the best. I wonder if anyone knows a place that does that?
The guy in need of the job was telling me he made it to final interview rounds with multiple Fortune 500 tech companies and in each one, the final interviewer was not the would be manager, CEO, or people he'd work with, but a white HR lady in California.
At xAI, we are managing traffic at an unprecedented scale. Our team is small, dedicated, and highly skilled. In this role, you will own a critical part of our production serving infrastructure, collaborating closely with the research inference team to ensure it is elastic,
hi, @xai is hiring for k8s/traffic people on our supercomputing team -- you get to work with a lean and fast-moving team while shipping massive impact to many users worldwide. this is the most fun i've had at any company! please apply/DM and share. https://t.co/Mk4DvAzXrt
Many people wonder what the benefit of training video gen models is. Video gen by itself doesn't necessarily seem to provide as much raw intelligence to users as modern LLMs. However, in the long term, video gen models will be used as neural simulations of the universe within
We are actively hiring for multimodal understanding and generation. Join us to build the future AI interfaces! https://t.co/qzULWOZQEq
https://t.co/DKdeK9LEAt
https://t.co/z0j11Q98FA
Job posting: job-boards.greenhouse.io (Palo Alto, CA; San Francisco, CA)
We are hiring on pretraining as well. If you are passionate about improving training efficiency, pretraining data quality, and training infra, please apply here: https://t.co/fKK8lGPd2P
https://t.co/3xVTBc5uOg
https://t.co/yJS5rmdQX6
Job posting: job-boards.greenhouse.io