Hao AI Lab
@haoailab
Followers 4K · Following 641 · Media 142 · Statuses 392
Hao AI Lab at UCSD. Our mission is to democratize large machine learning models, algorithms, and their underlying systems.
Joined March 2024
🔥 New Blog: “Disaggregated Inference: 18 Months Later”
18 months in LLM inference feels like a new Moore’s Law cycle – but this time not just 2x per year:
💸 Serving cost ↓10–100x
🚀 Throughput ↑10x
⚡ Latency ↓5x
A big reason? Disaggregated Inference. From DistServe, our…
hao-ai-lab.github.io
Eighteen months ago, our lab introduced DistServe with a simple bet: split LLM inference into prefill and decode, and scale them independently on separate compute pools. Today, almost every product...
💬 5 · 🔁 47 · ❤️ 170
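For readers new to the idea, here is a toy sketch of the prefill/decode split in Python. Everything below is illustrative only (the names `prefill` and `decode` and the dummy model step are not the DistServe API); the point is that the two phases have different bottlenecks, so they can be scaled on separate compute pools.

```python
# Toy sketch of prefill/decode disaggregation (illustrative, not DistServe).
from dataclasses import dataclass

@dataclass
class KVCache:
    tokens: list[int]          # stand-in for per-layer key/value tensors

def prefill(prompt_tokens: list[int]) -> KVCache:
    """Prefill pool: one compute-bound pass over the whole prompt."""
    return KVCache(tokens=list(prompt_tokens))

def decode(cache: KVCache, max_new_tokens: int) -> list[int]:
    """Decode pool: memory-bound, one token per step, reusing the cache."""
    out = []
    for _ in range(max_new_tokens):
        nxt = (cache.tokens[-1] + 1) % 50_000   # dummy "model" step
        cache.tokens.append(nxt)
        out.append(nxt)
    return out

# Because bottlenecks differ, the pools scale independently, e.g. two
# prefill workers feeding six decode workers after a KV-cache handoff.
cache = prefill([101, 2023, 2003])
print(decode(cache, max_new_tokens=4))
```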
Exciting time to work on parallel generation in our lab: strong quality and faster generation, fully leveraging modern hardware. 🚀
↪️ Quoting the FastVideo × SGLang announcement (full post below).
💬 0 · 🔁 1 · ❤️ 6
Hao has been pioneering efficient architecture research for many years. Always eager to see the innovations from him and his group!
↪️ Quoting the FastVideo × SGLang announcement (full post below).
💬 0 · 🔁 3 · ❤️ 83
Excited to partner with SGLang: FastVideo + SGLang = the future open ecosystem for diffusion. 🥳🫡 ----------- A few extra cents: Since I joined the faculty at UCSD, our lab has been investing in diffusion for both video and text, and in both algorithms and systems. - Text-side, we…
hao-ai-lab.github.io
TL;DR: LLMs have been traditionally regarded as sequential decoders, decoding one token after another. In this blog, we show pretrained LLMs can be easily taught to operate as efficient parallel...
↪️ Quoting the FastVideo + SGL partnership post (repeated below).
💬 3 · 🔁 8 · ❤️ 100
Exciting to partner with SGL (@lmsysorg). FastVideo + SGL = the future open source ecosystem for diffusion! 🥳🥳
🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models.
⚡️ Up to 5.9× faster inference
🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux
🧰 Easy to use via OpenAI-compatible API, CLI & Python API
💬 1 · 🔁 6 · ❤️ 25
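Since the announcement advertises an OpenAI-compatible API, a client call would presumably look something like the sketch below. The port, endpoint, and model identifier are assumptions for illustration, not documented values.

```python
# Hedged sketch: hitting an OpenAI-compatible image endpoint such as the
# one SGLang Diffusion advertises. base_url and model are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.images.generate(
    model="Qwen/Qwen-Image",   # assumed id for one of the supported models
    prompt="a watercolor fox in a bamboo forest",
)
print(resp.data[0].url)        # or b64_json, depending on server config
```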
♠️♥️ Day 3 — Final Showdown! Our last day of the LLM Texas Hold’em tournament is live 🎥
📊 Current TrueSkill2 Top 3: Grok-4-0709 > Gemini-2.5-Pro > GPT-5 (2025-08-07)
Same prompt every day — around 20 hands/day; we’ll post the final TrueSkill2 ranking after today’s games!
💬 0 · 🔁 2 · ❤️ 8
[Lmgame Bench] Day 2 Recap ♠️♥️ Chip Standings + Rank Changes 🎲
Each day includes ~20 rounds, so rank shifts may reflect short-term variance rather than stable strategy changes. The final TrueSkill2 ranking after the full 60 rounds will tell more.
📊 Ranks
1️⃣ Gemini-2.5-Pro 359 ⬆️ (+5)
2️⃣ …
💬 1 · 🔁 3 · ❤️ 8
♠️♥️ Texas Hold’em LLM tournament Day 2 is live!
🆕 New layout: each model’s reasoning is now shown on the right side.
Here are the Day 1 chip results 🪙 — final TrueSkill2 rankings will be posted after the tournament ends.
1️⃣ GPT-5 — 336
2️⃣ Grok-4 — 305
3️⃣ Kimi-K2 — 304
4️⃣ …
💬 0 · 🔁 3 · ❤️ 12
♠️♥️ The cards are on the table. Day 1 of our 3-day Texas Hold’em LLM tournament is live! 😍 🤖 6 models. 300 chips each. No strategy prompts, only pure reasoning. 🎥 Watch now → https://t.co/5WJ8iVVEHz
#AI #TexasHoldem #LmgameBench
💬 5 · 🔁 8 · ❤️ 20
🔗 Explore more
📺 Watch live → https://t.co/5WJ8iVV6S1
📊 Leaderboard → https://t.co/wEc803fsbB
🕹️ Try it yourself → https://t.co/OMJUHsUWSK
📄 Blog → https://t.co/aG2Gpl5VXX
💬 Join us →
discord.com
💬 0 · 🔁 0 · ❤️ 3
📊 Tournament Format
• Round-robin matches across 3 days
• 300 chips reset daily
• TrueSkill2 ranking system
• Fold/call/raise ratios → style profiling
We’ll see how different LLMs behave when only the game rules guide their reasoning — no hidden heuristics, just raw…
💬 1 · 🔁 0 · ❤️ 3
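A note on the ranking system: TrueSkill2 is Microsoft’s extension of TrueSkill, and the open-source `trueskill` package implements the original version, which is close enough to sketch how ratings would update from one hand’s finishing order. Model names and the finishing order below are illustrative, not results.

```python
# Sketch of a rating update with the `trueskill` package (TrueSkill 1,
# standing in for the TrueSkill2 system the leaderboard actually uses).
import trueskill

ts = trueskill.TrueSkill(draw_probability=0.0)
ratings = {m: ts.create_rating() for m in ["gpt-5", "grok-4", "kimi-k2"]}

# rate() takes one tuple of ratings per "team" plus ranks (0 = winner).
finish_order = ["grok-4", "gpt-5", "kimi-k2"]      # illustrative only
groups = [(ratings[m],) for m in finish_order]
updated = ts.rate(groups, ranks=list(range(len(finish_order))))
for m, (r,) in zip(finish_order, updated):
    ratings[m] = r
    print(f"{m}: mu={r.mu:.1f} sigma={r.sigma:.1f}")
```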
🤖 Setup
We adapt the PettingZoo Texas Hold’em environment into our evaluation harness. Each model sees the hole cards, board state, stacks, pot & legal actions → chooses from {fold, call, raise, check}.
🎯 Models competing:
• GPT-5 (2025-08-07)
• DeepSeek-V3-2-Exp
• …
💬 1 · 🔁 0 · ❤️ 3
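The PettingZoo loop itself is public API, so a stripped-down version of such a harness might look like the sketch below. The random policy is a stand-in for the LLM call, which is the part the post describes; nothing here is the lab’s actual harness code.

```python
# Skeleton evaluation loop over PettingZoo's no-limit Texas Hold'em env.
# In the real harness, the action comes from an LLM prompted with the
# serialized game state; a random legal action stands in here.
import numpy as np
from pettingzoo.classic import texas_holdem_no_limit_v6

env = texas_holdem_no_limit_v6.env(num_players=2)
env.reset(seed=42)

for agent in env.agent_iter():
    obs, reward, terminated, truncated, info = env.last()
    if terminated or truncated:
        action = None                               # agent done this hand
    else:
        legal = np.flatnonzero(obs["action_mask"])  # indices of legal moves
        action = int(np.random.choice(legal))       # LLM choice goes here
    env.step(action)
env.close()
```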
[Lmgame Bench] ♠️♥️ Can LLMs bluff, fold, and bet like real poker players—with no strategic help? From Oct 28 – 30 (Tue–Thu, 10 AM – 4 PM PT), we’re hosting a 6-model live multi-agent Texas Hold’em tournament on Twitch 🎥 🕹️ https://t.co/5WJ8iVVEHz Each model starts with 300…
💬 1 · 🔁 5 · ❤️ 12
Strongly disagree with the original post: Berkeley, Stanford, and UCSD actually do offer many good courses that are cutting-edge and timely. For example, this winter I offered this machine learning systems course https://t.co/mlhUais8wk at UCSD (all materials…
At @Berkeley_EECS we always work to keep our curriculum fresh. Our intro ML course CS 189 just got a drastic makeover this semester (thanks @profjoeyg @NargesNorouzi!) and now includes ~12 lectures on e.g. Adam, PyTorch, various NN architectures, LLMs, and more (see…
💬 18 · 🔁 95 · ❤️ 1K
🚀 vLLM just hit 60K GitHub stars! 🎉 From a small research idea to powering LLM inference everywhere — across NVIDIA, AMD, Intel, Apple, TPUs, and more — vLLM now supports almost all major text-generation models and native RL pipelines like TRL, Unsloth, Verl, and OpenRLHF.
💬 11 · 🔁 49 · ❤️ 491
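For anyone who has not tried it, the core vLLM offline API really is just a few lines; the model id below is only an example of a supported Hugging Face checkpoint.

```python
# Minimal vLLM offline inference: load a model, sample a completion.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")       # any supported HF model id
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```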
Tunix × GRL: One-Line Multi-Turn RL on JAX+TPU
We’re collaborating closely with Google’s Tunix team (JAX-native LLM post-training on TPU). Using Tunix’s lightweight RL framework, we shipped a first multi-turn RL training example in GRL. It runs in one line. GRL:
github.com
Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning - lmgame-org/GRL
💬 1 · 🔁 9 · ❤️ 31
Check out our paper / code / blog!
📄 Paper: https://t.co/wY9b7bMhUP
🔗 Code: https://t.co/DxwL4wQp6r
📖 Blog:
hao-ai-lab.github.io
TL;DR: We observe reasoning models often exhibit poor token efficiency: they waste many tokens second-guessing themselves. We develop Dynasor-CoT, a certainty-based approach for dynamically allocat...
💬 0 · 🔁 1 · ❤️ 5
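A minimal sketch of the certainty-based idea the blog describes: decode in chunks, probe the model for a tentative answer between chunks, and stop once consecutive probes agree. The `generate` and `probe_answer` callables are hypothetical stand-ins for real model calls, not the Dynasor API.

```python
# Toy version of certainty-based early exit in the spirit of Dynasor-CoT.
def dynasor_cot(prompt, generate, probe_answer,
                chunk=64, agree_needed=3, max_chunks=16):
    trace, probes = "", []
    for _ in range(max_chunks):
        trace += generate(prompt + trace, max_tokens=chunk)  # keep reasoning
        probes.append(probe_answer(prompt, trace))           # cheap probe
        last = probes[-agree_needed:]
        if len(last) == agree_needed and len(set(last)) == 1:
            return last[-1]     # consecutive probes agree: stop early
    return probes[-1]           # budget exhausted: best current answer
```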
Heartfelt gratitude to the @nvidia TensorRT team and our incredible @haoailab team members @Junda_Chen_, @FuYichao123, @fuzheyu2, @Humaira__18, @zhongdongm79676, @XuJerry15689. Your dedication made this milestone possible!
💬 1 · 🔁 0 · ❤️ 2
🚀 🚀 Dynasor is featured in @NVIDIA TensorRT-LLM’s new inference-time compute framework, Scaffolding! Dynasor helps cut token usage by up to 29% with no accuracy loss! 🔍 NV Blog: https://t.co/S06S7dr4T4 Dynasor was also just accepted at #NeurIPS2025!
github.com
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor...
💬 1 · 🔁 6 · ❤️ 25
[6/N] 🔥 Takeaway: Token-level SD was just the beginning. Step-level Lookahead Reasoning opens the door to even faster, more scalable, and more powerful LLM reasoning.
👉 Blog: https://t.co/MOQh96i0mX
👉 Code: https://t.co/vfmIlpoq9P
👉 Paper:
arxiv.org
Reasoning models excel by generating long chain-of-thoughts, but decoding the resulting thousands of tokens is slow. Token-level speculative decoding (SD) helps, but its benefit is capped, because...
💬 0 · 🔁 1 · ❤️ 6
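A sketch of step-level speculation as the thread and abstract describe it: a fast drafter proposes whole reasoning steps, the target model produces its own next step (in the paper these checks run in parallel), and a semantic check decides which drafted steps survive. All callables here are hypothetical stand-ins, not the paper’s implementation.

```python
# Toy step-level speculation in the spirit of Lookahead Reasoning.
def lookahead_reasoning(problem, draft_steps, target_step, agree,
                        done, max_steps=32):
    trace = []
    while len(trace) < max_steps and not done(trace):
        drafted = draft_steps(problem, trace)   # a few drafted steps ahead
        if not drafted:                         # drafter gave nothing:
            trace.append(target_step(problem, trace))
            continue
        for d in drafted:
            t = target_step(problem, trace)     # target's own next step
            if agree(d, t):                     # semantically equivalent?
                trace.append(d)                 # accept the cheap draft
            else:
                trace.append(t)                 # fall back to the target
                break                           # later drafts are stale
    return trace
```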