Han Cai
@hancai_hm
507 Followers · 19 Following · 6 Media · 17 Statuses
Research Scientist, NVIDIA
Cambridge, USA
Joined December 2019
We release DC-VideoGen, a new post-training framework for accelerating video diffusion models. Key features:
- Supports video generation at up to 2160×3840 (4K) resolution on a single H100 GPU
- Delivers 14.8× faster inference than the base model while achieving comparable or …
Changing the autoencoder in latent diffusion models is easier than you think. Introducing DC-Gen, a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with …
Decoding is often the speed bottleneck in few-step latent diffusion models. Meet DC-AE-Lite:
- 1.8× faster decoding than DC-AE
- Similar reconstruction quality
Code: https://t.co/c4HhcdhZVV
Pre-trained model: https://t.co/0ktlnfQ7uG
Contributors: Dongyun Zou, …
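To see why decoding dominates in the few-step regime, here is a back-of-the-envelope sketch; the per-step and decode latencies below are made-up placeholders, not measured numbers:

```python
# Illustrative latency budget for a latent diffusion sampler.
# All numbers are placeholders, not measurements.
denoise_step_ms = 25.0   # one denoising step in latent space
decode_ms = 120.0        # one decoder pass from latents to pixels

for n_steps in (1, 4, 50):
    total = n_steps * denoise_step_ms + decode_ms
    print(f"{n_steps:>2} steps: decode is {decode_ms / total:.0%} of {total:.0f} ms total")
# At 50 steps the decoder is a small slice of the budget; at 1-4 steps it
# dominates, which is why a 1.8x faster decoder pays off in few-step models.
```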
Jet-Nemotron: code & pre-trained checkpoints now available! Achieve up to 53.6× higher generation throughput on H100 GPUs with cost-efficient finetuning.
GitHub: https://t.co/XGX7MTMm7J
Hugging Face: https://t.co/AMEGIq5zOp
Paper: arxiv.org
We present Jet-Nemotron, a new family of hybrid-architecture language models, which matches or exceeds the accuracy of leading full-attention models while significantly improving generation...
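A minimal loading sketch with Hugging Face transformers, assuming the checkpoints follow the usual AutoModel pattern; the repo id below is a placeholder, so check the Hugging Face link above for the real one:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "nvidia/Jet-Nemotron"  # placeholder id; see the Hugging Face link in the post
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True, device_map="auto")

inputs = tok("Hybrid-attention models are fast because", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```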
Introducing Sparse VideoGen2 (SVG2): Pareto-frontier video generation acceleration with semantic-aware sparse attention! Spotlight paper accepted at #NeurIPS2025.
- Training-free & plug-and-play
- Up to 2.5× faster on HunyuanVideo, 1.9× faster on Wan 2.1
- SOTA quality
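As rough intuition for what "semantic-aware sparse attention" can mean (a toy sketch of the general idea, not SVG2's actual implementation): group keys into semantic clusters, then let each query attend only to its most relevant clusters.

```python
import torch

def semantic_sparse_attention(q, k, v, n_clusters=8, top_clusters=2):
    """Toy sketch: each query attends only to keys in its top-scoring clusters."""
    # Stand-in for k-means: use randomly chosen keys as cluster centroids.
    centroids = k[torch.randperm(k.size(0))[:n_clusters]]
    assign = torch.cdist(k, centroids).argmin(dim=1)             # cluster id per key
    # Rank clusters per query by centroid similarity; keep the top few.
    keep = (q @ centroids.T).topk(top_clusters, dim=1).indices   # (N_q, top)
    mask = (assign[None, :, None] == keep[:, None, :]).any(-1)   # (N_q, N_k)
    attn = (q @ k.T) / k.size(-1) ** 0.5
    attn = attn.masked_fill(~mask, float("-inf")).softmax(dim=-1)
    return attn @ v

q, k, v = (torch.randn(16, 64) for _ in range(3))
print(semantic_sparse_attention(q, k, v).shape)  # torch.Size([16, 64])
```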
Excited to announce DC-AE 1.5! With the spatial compression ratio boosted to f64, it accelerates high-res diffusion models while preserving text-to-image quality. Key innovation: a channel-wise latent structure for faster convergence with many latent channels. Catch us at …
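To make the f64 figure concrete, here is the latent-shape arithmetic; the channel counts are illustrative, chosen only to show how higher spatial compression shifts capacity into channels:

```python
# Latent shape for a 2048x2048 image at different compression settings.
# Channel counts are illustrative: higher f typically needs more latent
# channels, which is where a channel-wise latent structure helps convergence.
H = W = 2048
for f, c in ((8, 4), (32, 32), (64, 128)):
    h, w = H // f, W // f
    print(f"f{f}c{c}: latent {c}x{h}x{w} -> {h * w} spatial positions")
# f8c4:    4x256x256  -> 65536 positions
# f32c32:  32x64x64   -> 4096 positions
# f64c128: 128x32x32  -> 1024 positions (64x fewer than f8)
```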
Developing new LLM architectures is both costly and risky. Our latest project, https://t.co/WhOcpWBozX, offers an effective strategy to address this challenge. Our first result is Jet-Nemotron, a new family of hybrid-architecture language models that outperform state-of-the-art …
We just dropped a few new PS3 models with SOTA performance compared to existing vision encoders such as SigLIP2, C-RADIOv2, AIMv2, InternViT2.5, and Perception Encoder! They come along with several new VILA-HD models. Check them out!
Models: https://t.co/UwjpBWpFBj
Code: …
A new kind of efficiency challenge: "Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs." We explore a new frontier: what if the reward doesn't come from being right, but from being fast and right?
Paper: https://t.co/sxozRPpHJA …
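One simple way to formalize "fast and right" (my own toy formulation, not necessarily the paper's reward): pay out only for correct answers, discounted by latency.

```python
import math

def latency_sensitive_reward(correct: bool, latency_s: float, half_life_s: float = 10.0) -> float:
    """Toy reward: correctness-gated, exponentially discounted by response time."""
    if not correct:
        return 0.0  # being fast but wrong earns nothing
    return math.exp(-math.log(2) * latency_s / half_life_s)

print(latency_sensitive_reward(True, 0.0))    # 1.0
print(latency_sensitive_reward(True, 10.0))   # 0.5 (one half-life elapsed)
print(latency_sensitive_reward(False, 1.0))   # 0.0
```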
Deep compression in VAEs is hard, but this paper beautifully explains how to achieve it.
SANA 1.5 Update: Inference Scaling Now Open-Source!
Breakthrough on the GenEval benchmark:
- SANA 1.5 + Inference Scaling: 0.81 → 0.96
- SD 1.5 + Inference Scaling: 0.42 → 0.87
The secret sauce:
1. Generate n candidates
2. Pick top k with NVILA …
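Steps 1-2 are essentially best-of-n sampling with a verifier. A minimal sketch, where generate() and score() are stand-ins for the SANA generator and the NVILA judge:

```python
def best_of_n(prompt, generate, score, n=16, k=4):
    """Generate n candidates, return the k the verifier scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:k]

# Toy stand-ins: a "generator" emitting random floats, a "verifier" preferring large ones.
import random
top = best_of_n("a cat", lambda p: random.random(), lambda x: x, n=8, k=2)
print(top)
```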
Next-gen vision pre-trained models shouldn't be short-sighted. Humans can easily perceive 10K x 10K resolution, but today's top vision models, like SigLIP and DINOv2, are still pre-trained at merely hundreds by hundreds of pixels, bottlenecking their real-world usage. Today, we …
Achieving 2x faster video generation by exploiting spatial-temporal sparsity. It is training-free and open-source!
Introducing #SparseVideoGen: 2× speedup in video generation with HunyuanVideo, with high pixel-level fidelity (PSNR = 29)! No training required, no perceptible difference to the human eye!
Blog: https://t.co/3IXnz7PTaC
Paper: https://t.co/oEHsk4lnaK
Code: …
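For reference, PSNR measures pixel-level fidelity between the sparse and dense outputs. A quick sketch of the computation, assuming frames scaled to [0, 1]:

```python
import numpy as np

def psnr(x: np.ndarray, y: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((x - y) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

ref = np.random.rand(8, 64, 64, 3)                   # toy reference frames
approx = ref + np.random.normal(0, 0.03, ref.shape)  # small perturbation
print(f"{psnr(ref, np.clip(approx, 0, 1)):.1f} dB")  # roughly 30 dB; ~29 dB is visually near-lossless
```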
We're excited to open-source an FP8 training technique, COAT: Compressing Optimizer States and Activations for Memory-Efficient FP8 Training. COAT was accepted at ICLR 2025! FP8 training effectively improves training efficiency; DeepSeek-V3 is a successful example of …
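A minimal sketch of the underlying storage idea, per-tensor scaled FP8 quantization of an optimizer state; this shows generic FP8 storage, not COAT's specific method, and requires a recent PyTorch with float8 dtypes:

```python
import torch

FP8 = torch.float8_e4m3fn
FP8_MAX = torch.finfo(FP8).max  # 448.0 for e4m3

def to_fp8(t: torch.Tensor):
    """Store a tensor as FP8 plus one fp32 scale (per-tensor scaling)."""
    scale = t.abs().max().clamp(min=1e-12) / FP8_MAX
    return (t / scale).to(FP8), scale

def from_fp8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

m = torch.randn(1024) * 0.01          # e.g. an Adam first-moment state
q, s = to_fp8(m)
err = (from_fp8(q, s) - m).abs().max()
print(q.element_size(), err.item())   # 1 byte per element, small round-off error
```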
Hi everyone, I'm thrilled to announce that you can now try #SANA models in #ComfyUI. We show video generation using SANA+CogVideoX. SANA now also supports Chinese and emoji prompts. If you find SANA useful, we'd be grateful if you could give us a star at https://t.co/Yu57ikMEQT
SANA's code is released; enjoy it, and stars are welcome! SANA is an efficient linear DiT that can generate images from 1K to 4K.
Code: https://t.co/wl9dfP5qIe
Demo: https://t.co/wZdGw2hDwp
Highlights: 20× smaller & 100× faster than FLUX; deployable on a laptop GPU
We are excited to introduce the Deep Compression Autoencoder (DC-AE). It dramatically reduces the number of tokens in the latent space, delivering significant training and inference speedups for latent diffusion models.
Paper: https://t.co/MiQpP5uFlD
Code: https://t.co/qpQ57CjBSI
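To quantify the token reduction, here is the sequence-length arithmetic a diffusion transformer sees; the patch size p=2 is the common DiT choice, assumed here for illustration:

```python
# DiT sequence length: (H / f / p) * (W / f / p), where f is the autoencoder's
# spatial compression ratio and p the transformer patch size (p=2 assumed).
H = W = 1024
p = 2
for f in (8, 64):  # f8 ~ a conventional VAE, f64 ~ deep compression
    n = (H // f // p) * (W // f // p)
    print(f"f{f}: {n} tokens")
# f8:  4096 tokens
# f64: 64 tokens -> 64x fewer tokens; quadratic attention cost drops ~4096x
```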