Hao AI Lab
@haoailab
Followers 4K · Following 641 · Media 142 · Statuses 392
Hao AI Lab at UCSD. Our mission is to democratize large machine learning models, algorithms, and their underlying systems.
Joined March 2024
🔥 New Blog: “Disaggregated Inference: 18 Months Later”
18 months in LLM inference feels like a new Moore’s Law cycle – but this time not just 2x per year:
💸 Serving cost ↓10–100x
🚀 Throughput ↑10x
⚡ Latency ↓5x
A big reason? Disaggregated Inference. From DistServe, our…
hao-ai-lab.github.io
Eighteen months ago, our lab introduced DistServe with a simple bet: split LLM inference into prefill and decode, and scale them independently on separate compute pools. Today, almost every product...
💬 5 · 🔁 47 · ❤️ 170
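For readers new to the idea, here is a toy sketch of the prefill/decode split in Python. Everything below is illustrative only (the names `prefill` and `decode` and the dummy model step are not the DistServe API); the point is that the two phases have different bottlenecks, so they can be scaled on separate compute pools.

```python
# Toy sketch of prefill/decode disaggregation (illustrative, not DistServe).
from dataclasses import dataclass

@dataclass
class KVCache:
    tokens: list[int]          # stand-in for per-layer key/value tensors

def prefill(prompt_tokens: list[int]) -> KVCache:
    """Prefill pool: one compute-bound pass over the whole prompt."""
    return KVCache(tokens=list(prompt_tokens))

def decode(cache: KVCache, max_new_tokens: int) -> list[int]:
    """Decode pool: memory-bound, one token per step, reusing the cache."""
    out = []
    for _ in range(max_new_tokens):
        nxt = (cache.tokens[-1] + 1) % 50_000   # dummy "model" step
        cache.tokens.append(nxt)
        out.append(nxt)
    return out

# Because bottlenecks differ, the pools scale independently, e.g. two
# prefill workers feeding six decode workers after a KV-cache handoff.
cache = prefill([101, 2023, 2003])
print(decode(cache, max_new_tokens=4))
```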
Exciting time to work on parallel generation in our lab: strong quality and faster generation, fully leveraging modern hardware. 🚀
↪️ Quoting the FastVideo × SGLang announcement (full post below).
💬 0 · 🔁 1 · ❤️ 6
Hao has been pioneering efficient architecture research for many years. Always eager to see the innovations from him and his group!
↪️ Quoting the FastVideo × SGLang announcement (full post below).
💬 0 · 🔁 3 · ❤️ 83
Excited to partner with SGLang: FastVideo + SGLang = the future open ecosystem for diffusion. 🥳🫡 ----------- A few extra cents: Since I joined the faculty at UCSD, our lab has been investing in diffusion for both video and text, and in both algorithms and systems. - Text-side, we…
hao-ai-lab.github.io
TL;DR: LLMs have been traditionally regarded as sequential decoders, decoding one token after another. In this blog, we show pretrained LLMs can be easily taught to operate as efficient parallel...
↪️ Quoting the FastVideo + SGL partnership post (repeated below).
💬 3 · 🔁 8 · ❤️ 100
Exciting to partner with SGL (@lmsysorg). FastVideo + SGL = the future open source ecosystem for diffusion! 🥳🥳
🚀 Introducing SGLang Diffusion — bringing SGLang’s high-performance serving to diffusion models.
⚡️ Up to 5.9× faster inference
🧩 Supports major open-source models: Wan, Hunyuan, Qwen-Image, Qwen-Image-Edit, Flux
🧰 Easy to use via OpenAI-compatible API, CLI & Python API
💬 1 · 🔁 6 · ❤️ 25
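Since the announcement advertises an OpenAI-compatible API, a client call would presumably look something like the sketch below. The port, endpoint, and model identifier are assumptions for illustration, not documented values.

```python
# Hedged sketch: hitting an OpenAI-compatible image endpoint such as the
# one SGLang Diffusion advertises. base_url and model are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
resp = client.images.generate(
    model="Qwen/Qwen-Image",   # assumed id for one of the supported models
    prompt="a watercolor fox in a bamboo forest",
)
print(resp.data[0].url)        # or b64_json, depending on server config
```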
♠️♥️ Day 3 — Final Showdown! Our last day of the LLM Texas Hold’em tournament is live 🎥
📊 Current TrueSkill2 Top 3: Grok-4-0709 > Gemini-2.5-Pro > GPT-5 (2025-08-07)
Same prompt every day — around 20 hands/day; we’ll post the final TrueSkill2 ranking after today’s games!
💬 0 · 🔁 2 · ❤️ 8
[Lmgame Bench] Day 2 Recap ♠️♥️ Chip Standings + Rank Changes 🎲
Each day includes ~20 rounds, so rank shifts may reflect short-term variance rather than stable strategy changes. The final TrueSkill2 ranking after the full 60 rounds will tell more.
📊 Ranks
1️⃣ Gemini-2.5-Pro 359 ⬆️ (+5)
2️⃣ …
💬 1 · 🔁 3 · ❤️ 8
♠️♥️ Texas Hold’em LLM tournament Day 2 is live!
🆕 New layout: each model’s reasoning is now shown on the right side.
Here are the Day 1 chip results 🪙 — final TrueSkill2 rankings will be posted after the tournament ends.
1️⃣ GPT-5 — 336
2️⃣ Grok-4 — 305
3️⃣ Kimi-K2 — 304
4️⃣ …
💬 0 · 🔁 3 · ❤️ 12
♠️♥️ The cards are on the table. Day 1 of our 3-day Texas Hold’em LLM tournament is live! 😍 🤖 6 models. 300 chips each. No strategy prompts, only pure reasoning. 🎥 Watch now → https://t.co/5WJ8iVVEHz
#AI #TexasHoldem #LmgameBench
💬 5 · 🔁 8 · ❤️ 20
🔗 Explore more
📺 Watch live → https://t.co/5WJ8iVV6S1
📊 Leaderboard → https://t.co/wEc803fsbB
🕹️ Try it yourself → https://t.co/OMJUHsUWSK
📄 Blog → https://t.co/aG2Gpl5VXX
💬 Join us →
discord.com
💬 0 · 🔁 0 · ❤️ 3
📊 Tournament Format
• Round-robin matches across 3 days
• 300 chips reset daily
• TrueSkill2 ranking system
• Fold/call/raise ratios → style profiling
We’ll see how different LLMs behave when only the game rules guide their reasoning — no hidden heuristics, just raw…
💬 1 · 🔁 0 · ❤️ 3
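A note on the ranking system: TrueSkill2 is Microsoft’s extension of TrueSkill, and the open-source `trueskill` package implements the original version, which is close enough to sketch how ratings would update from one hand’s finishing order. Model names and the finishing order below are illustrative, not results.

```python
# Sketch of a rating update with the `trueskill` package (TrueSkill 1,
# standing in for the TrueSkill2 system the leaderboard actually uses).
import trueskill

ts = trueskill.TrueSkill(draw_probability=0.0)
ratings = {m: ts.create_rating() for m in ["gpt-5", "grok-4", "kimi-k2"]}

# rate() takes one tuple of ratings per "team" plus ranks (0 = winner).
finish_order = ["grok-4", "gpt-5", "kimi-k2"]      # illustrative only
groups = [(ratings[m],) for m in finish_order]
updated = ts.rate(groups, ranks=list(range(len(finish_order))))
for m, (r,) in zip(finish_order, updated):
    ratings[m] = r
    print(f"{m}: mu={r.mu:.1f} sigma={r.sigma:.1f}")
```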
🤖 Setup
We adapt the PettingZoo Texas Hold’em environment into our evaluation harness. Each model sees the hole cards, board state, stacks, pot & legal actions → chooses from {fold, call, raise, check}.
🎯 Models competing:
• GPT-5 (2025-08-07)
• DeepSeek-V3-2-Exp
• …
💬 1 · 🔁 0 · ❤️ 3
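The PettingZoo loop itself is public API, so a stripped-down version of such a harness might look like the sketch below. The random policy is a stand-in for the LLM call, which is the part the post describes; nothing here is the lab’s actual harness code.

```python
# Skeleton evaluation loop over PettingZoo's no-limit Texas Hold'em env.
# In the real harness, the action comes from an LLM prompted with the
# serialized game state; a random legal action stands in here.
import numpy as np
from pettingzoo.classic import texas_holdem_no_limit_v6

env = texas_holdem_no_limit_v6.env(num_players=2)
env.reset(seed=42)

for agent in env.agent_iter():
    obs, reward, terminated, truncated, info = env.last()
    if terminated or truncated:
        action = None                               # agent done this hand
    else:
        legal = np.flatnonzero(obs["action_mask"])  # indices of legal moves
        action = int(np.random.choice(legal))       # LLM choice goes here
    env.step(action)
env.close()
```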
[Lmgame Bench] ♠️♥️ Can LLMs bluff, fold, and bet like real poker players—with no strategic help? From Oct 28 – 30 (Tue–Thu, 10 AM – 4 PM PT), we’re hosting a 6-model live multi-agent Texas Hold’em tournament on Twitch 🎥 🕹️ https://t.co/5WJ8iVVEHz Each model starts with 300…
💬 1 · 🔁 5 · ❤️ 12
Strongly disagree with the original post: Berkeley, Stanford, and UCSD actually do offer many good courses that are cutting-edge and timely. For example, this winter I offered this machine learning systems course https://t.co/mlhUais8wk at UCSD (all materials…
At @Berkeley_EECS we always work to keep our curriculum fresh. Our intro ML course CS 189 just got a drastic makeover this semester (thanks @profjoeyg @NargesNorouzi!) and now includes ~12 lectures on e.g. Adam, PyTorch, various NN architectures, LLMs, and more (see…
💬 18 · 🔁 95 · ❤️ 1K
🚀 vLLM just hit 60K GitHub stars! 🎉 From a small research idea to powering LLM inference everywhere — across NVIDIA, AMD, Intel, Apple, TPUs, and more — vLLM now supports almost all major text-generation models and native RL pipelines like TRL, Unsloth, Verl, and OpenRLHF.
💬 11 · 🔁 49 · ❤️ 491
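For anyone who has not tried it, the core vLLM offline API really is just a few lines; the model id below is only an example of a supported Hugging Face checkpoint.

```python
# Minimal vLLM offline inference: load a model, sample a completion.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")       # any supported HF model id
params = SamplingParams(temperature=0.8, max_tokens=32)
outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```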
Tunix × GRL: One-Line Multi-Turn RL on JAX+TPU
We’re collaborating closely with Google’s Tunix team (JAX-native LLM post-training on TPU). Using Tunix’s lightweight RL framework, we shipped a first multi-turn RL training example in GRL. It runs in one line. GRL:
github.com
Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning - lmgame-org/GRL
💬 1 · 🔁 9 · ❤️ 31
Check out our paper / code / blog!
📄 Paper: https://t.co/wY9b7bMhUP
🔗 Code: https://t.co/DxwL4wQp6r
📖 Blog:
hao-ai-lab.github.io
TL;DR: We observe reasoning models often exhibit poor token efficiency: they waste many tokens second-guessing themselves. We develop Dynasor-CoT, a certainty-based approach for dynamically allocat...
💬 0 · 🔁 1 · ❤️ 5
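A minimal sketch of the certainty-based idea the blog describes: decode in chunks, probe the model for a tentative answer between chunks, and stop once consecutive probes agree. The `generate` and `probe_answer` callables are hypothetical stand-ins for real model calls, not the Dynasor API.

```python
# Toy version of certainty-based early exit in the spirit of Dynasor-CoT.
def dynasor_cot(prompt, generate, probe_answer,
                chunk=64, agree_needed=3, max_chunks=16):
    trace, probes = "", []
    for _ in range(max_chunks):
        trace += generate(prompt + trace, max_tokens=chunk)  # keep reasoning
        probes.append(probe_answer(prompt, trace))           # cheap probe
        last = probes[-agree_needed:]
        if len(last) == agree_needed and len(set(last)) == 1:
            return last[-1]     # consecutive probes agree: stop early
    return probes[-1]           # budget exhausted: best current answer
```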
Heartfelt gratitude to the @nvidia TensorRT team and our incredible @haoailab team members @Junda_Chen_, @FuYichao123, @fuzheyu2, @Humaira__18, @zhongdongm79676, @XuJerry15689. Your dedication made this milestone possible!
💬 1 · 🔁 0 · ❤️ 2
🚀 🚀 Dynasor is featured in @NVIDIA TensorRT-LLM’s new inference-time compute framework, Scaffolding! Dynasor helps cut token usage by up to 29% with no accuracy loss! 🔍 NV Blog: https://t.co/S06S7dr4T4 Dynasor was also just accepted at #NeurIPS2025!
github.com
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor...
💬 1 · 🔁 6 · ❤️ 25
[6/N] 🔥 Takeaway: Token-level SD was just the beginning. Step-level Lookahead Reasoning opens the door to even faster, more scalable, and more powerful LLM reasoning.
👉 Blog: https://t.co/MOQh96i0mX
👉 Code: https://t.co/vfmIlpoq9P
👉 Paper:
arxiv.org
Reasoning models excel by generating long chain-of-thoughts, but decoding the resulting thousands of tokens is slow. Token-level speculative decoding (SD) helps, but its benefit is capped, because...
💬 0 · 🔁 1 · ❤️ 6
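A sketch of step-level speculation as the thread and abstract describe it: a fast drafter proposes whole reasoning steps, the target model produces its own next step (in the paper these checks run in parallel), and a semantic check decides which drafted steps survive. All callables here are hypothetical stand-ins, not the paper’s implementation.

```python
# Toy step-level speculation in the spirit of Lookahead Reasoning.
def lookahead_reasoning(problem, draft_steps, target_step, agree,
                        done, max_steps=32):
    trace = []
    while len(trace) < max_steps and not done(trace):
        drafted = draft_steps(problem, trace)   # a few drafted steps ahead
        if not drafted:                         # drafter gave nothing:
            trace.append(target_step(problem, trace))
            continue
        for d in drafted:
            t = target_step(problem, trace)     # target's own next step
            if agree(d, t):                     # semantically equivalent?
                trace.append(d)                 # accept the cheap draft
            else:
                trace.append(t)                 # fall back to the target
                break                           # later drafts are stale
    return trace
```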