Han Cai Profile
Han Cai

@hancai_hm

Followers: 507 · Following: 19 · Media: 6 · Statuses: 17

Research Scientist, NVIDIA

Cambridge, USA
Joined December 2019
@hancai_hm
Han Cai
1 month
We release DC-VideoGen, a new post-training framework for accelerating video diffusion models. Key features:
🎬 Supports video generation up to 2160×3840 (4K) resolution on a single H100 GPU
⚡ Delivers 14.8× faster inference than the base model while achieving comparable or…
2
28
146
@hancai_hm
Han Cai
1 month
Changing the autoencoder in latent diffusion models is easier than you think. 🚀 Introducing DC-Gen, a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with…
5
38
223
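For readers who want the gist of the post-training idea, here is a minimal conceptual sketch in PyTorch: encode images with the new deep-compression autoencoder and fine-tune only the pre-trained diffusion backbone on those latents. The `deep_ae.encode` and `backbone(noisy, t)` interfaces and the toy noise schedule are placeholder assumptions, not the actual DC-Gen recipe.

import torch
import torch.nn.functional as F

def add_noise(x0, noise, t, num_steps=1000):
    # Toy linear alpha-bar schedule, for illustration only.
    alpha_bar = (1.0 - t.float() / num_steps).view(-1, 1, 1, 1)
    return alpha_bar.sqrt() * x0 + (1.0 - alpha_bar).sqrt() * noise

def adaptation_step(backbone, deep_ae, images, optimizer):
    # `deep_ae` is the new deep-compression autoencoder (kept frozen);
    # only the pre-trained diffusion backbone is updated.
    with torch.no_grad():
        latents = deep_ae.encode(images)              # far fewer latent tokens
    noise = torch.randn_like(latents)
    t = torch.randint(0, 1000, (latents.shape[0],), device=latents.device)
    noisy = add_noise(latents, noise, t)
    loss = F.mse_loss(backbone(noisy, t), noise)      # standard noise-prediction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()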
@hancai_hm
Han Cai
1 month
Decoding is often the speed bottleneck in few-step latent diffusion models. 🚀 Meet DC-AE-Lite:
⚡ 1.8× faster decoding than DC-AE
🎯 Similar reconstruction quality
👉 Code: https://t.co/c4HhcdhZVV
👉 Pre-trained model: https://t.co/0ktlnfQ7uG
Contributors: Dongyun Zou,…
0
4
16
@hancai_hm
Han Cai
1 month
🚀 Jet-Nemotron: code & pre-trained checkpoints now available!
⚡️ Achieve up to 53.6× higher generation throughput on H100 GPUs with cost-efficient finetuning.
🔗 GitHub: https://t.co/XGX7MTMm7J
🔗 Hugging Face: https://t.co/AMEGIq5zOp
🔗 Paper:…
Linked paper (arxiv.org): We present Jet-Nemotron, a new family of hybrid-architecture language models, which matches or exceeds the accuracy of leading full-attention models while significantly improving generation...
4
39
174
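A hedged usage sketch for the released checkpoints, assuming a standard Hugging Face Transformers workflow; the repo id below is only a placeholder, so substitute the actual id from the GitHub and Hugging Face links above.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jet-ai/Jet-Nemotron-2B"  # placeholder repo id; check the official release
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",        # use the dtype stored in the checkpoint
    trust_remote_code=True,    # hybrid architecture may ship custom modeling code
    device_map="auto",
)

prompt = "Efficient attention mechanisms matter because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))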
@HaochengXiUCB
Haocheng Xi
1 month
🚀 Introducing Sparse VideoGen2 (SVG2): Pareto-frontier video generation acceleration with semantic-aware sparse attention!
🏆 Spotlight paper accepted by #NeurIPS2025
✅ Training-free & plug-and-play
✅ Up to 2.5× faster on HunyuanVideo, 1.9× faster on Wan 2.1
✅ SOTA quality
16
59
261
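The tweet does not include code, but the general idea of semantic-aware sparse attention can be illustrated with a toy PyTorch sketch: cluster key tokens by similarity, then let each query attend only to its best-matching clusters. This is an illustration of the concept only, not the SVG2 implementation; the clustering and top-cluster selection below are simplified assumptions.

import torch
import torch.nn.functional as F

def semantic_sparse_attention(q, k, v, num_clusters=16, topc=4):
    # q, k, v: (seq, dim) for a single attention head.
    seq, dim = k.shape
    # Crude clustering: a few k-means-style refinement steps on the keys.
    centroids = k[torch.randperm(seq)[:num_clusters]].clone()
    assign = torch.cdist(k, centroids).argmin(dim=1)
    for _ in range(3):
        for c in range(num_clusters):
            members = k[assign == c]
            if len(members) > 0:
                centroids[c] = members.mean(dim=0)
        assign = torch.cdist(k, centroids).argmin(dim=1)
    # Each query picks its top-scoring clusters and attends only to their members.
    cluster_scores = q @ centroids.t()                            # (seq, num_clusters)
    keep = cluster_scores.topk(topc, dim=1).indices               # (seq, topc)
    allowed = (assign.unsqueeze(0) == keep.unsqueeze(-1)).any(1)  # (seq, seq) mask
    allowed = allowed | torch.eye(seq, dtype=torch.bool, device=q.device)
    scores = (q @ k.t()) / dim ** 0.5
    scores = scores.masked_fill(~allowed, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q, k, v = (torch.randn(256, 64) for _ in range(3))
print(semantic_sparse_attention(q, k, v).shape)  # torch.Size([256, 64])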
@hancai_hm
Han Cai
2 months
🚀 Excited to announce DC-AE 1.5! With a spatial compression ratio boosted to f64, it accelerates high-res diffusion models while preserving text-to-image quality. Key innovation: channel-wise latent structure for faster convergence with many latent channels. 📍 Catch us at…
1
11
51
@hancai_hm
Han Cai
2 months
Developing new LLM architectures is both costly and risky. Our latest project, https://t.co/WhOcpWBozX, offers an effective strategy to address this challenge. Our first result is Jet-Nemotron, a new family of hybrid-architecture language models that outperform state-of-the-art…
3
24
106
@baifeng_shi
Baifeng
3 months
We just dropped a few new PS3 models, with SOTA performance compared to existing vision encoders such as SigLIP2, C-RADIOv2, AIMv2, InternViT2.5, and Perception Encoder! Coming along with several new VILA-HD models. Check it out 👇
Models: https://t.co/UwjpBWpFBj
Code:…
4
16
85
@GT_HaoKang
Hao Kang
5 months
🚀📉 A new kind of efficiency challenge: "Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs"
We explore a new frontier: what if the reward doesn't come from being right, but from being fast and right?
🔗 https://t.co/sxozRPpHJA 🛜
3
14
61
@KMohan2006
Krishna Mohan
6 months
Deep compression in VAEs is hard, but this paper beautifully explains how to achieve it.
6
48
455
@HaochengXiUCB
Haocheng Xi
6 months
🚀 COAT: Memory Efficient FP8 Training @ICLR 2025
📍 Hall 3 + Hall 2B, Poster #566
🗓 Sat, Apr 26 | 3:00–5:30 PM Singapore Time
✅ 1.54x memory efficiency, 1.43x speedup, near-lossless performance!
✅ Check out our poster on FP8 training by compressing optimizer states and…
0
16
71
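As a rough illustration of the compression idea only (not COAT's actual algorithm, which is described in the paper), the sketch below stores Adam's moment tensors in FP8 with a per-tensor scale and dequantizes them only for the update step; it assumes PyTorch 2.1+ for the float8 dtypes.

import torch

FP8 = torch.float8_e4m3fn
FP8_MAX = 448.0  # largest finite value representable in e4m3

def quantize_fp8(x):
    # Per-tensor scaling so the largest magnitude maps to FP8_MAX.
    scale = x.abs().max().clamp(min=1e-12) / FP8_MAX
    return (x / scale).to(FP8), scale

def dequantize_fp8(q, scale):
    return q.to(torch.float32) * scale

# One Adam-style update with FP8-compressed moments (toy example).
param = torch.randn(1024, 1024)
grad = torch.randn_like(param)
m_q, m_s = quantize_fp8(torch.zeros_like(param))   # first moment, stored in FP8
v_q, v_s = quantize_fp8(torch.zeros_like(param))   # second moment, stored in FP8

lr, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8
m = beta1 * dequantize_fp8(m_q, m_s) + (1 - beta1) * grad
v = beta2 * dequantize_fp8(v_q, v_s) + (1 - beta2) * grad * grad
param -= lr * m / (v.sqrt() + eps)

m_q, m_s = quantize_fp8(m)   # re-compress the moments before storing them
v_q, v_s = quantize_fp8(v)
print(m_q.dtype, m_q.element_size(), "byte(s) per element")  # torch.float8_e4m3fn 1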
@xieenze_jr
Enze Xie
7 months
🚀 SANA 1.5 Update: Inference Scaling Now Open-Source! 🎉
📈 Breakthrough on the GenEval benchmark:
• SANA 1.5 + Inference Scaling: 0.81 → 0.96 (!!) 🎯
• SD 1.5 + Inference Scaling: 0.42 → 0.87 ⬆️
💫 The secret sauce:
1. Generate n candidates 🎨
2. Pick top k with NVILA…
4
54
210
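The "secret sauce" above is a best-of-n selection loop. A hedged sketch, where `generate_image` and `vlm_score` are hypothetical stand-ins for the SANA sampler and the NVILA-based judge (the tweet does not show the actual API):

import heapq

def inference_scaling(prompt, generate_image, vlm_score, n=32, k=4):
    candidates = []
    for seed in range(n):
        img = generate_image(prompt, seed=seed)   # 1. generate n candidates
        score = vlm_score(prompt, img)            # 2. judge prompt-image alignment
        candidates.append((score, seed, img))
    return heapq.nlargest(k, candidates)          # 3. keep the top-k images

# Usage with dummy stand-ins, just to show the control flow:
best = inference_scaling(
    "a red cube on a blue sphere",
    generate_image=lambda p, seed: f"image_{seed}",
    vlm_score=lambda p, img: hash(img) % 100 / 100,
)
print([score for score, _, _ in best])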
@baifeng_shi
Baifeng
7 months
Next-gen vision pre-trained models shouldn't be short-sighted. Humans can easily perceive 10K x 10K resolution. But today's top vision models, like SigLIP and DINOv2, are still pre-trained at merely hundreds by hundreds of pixels, bottlenecking their real-world usage. Today, we…
27
154
980
@hancai_hm
Han Cai
8 months
Achieving 2x faster video generation by exploiting spatial-temporal sparsity. It is training-free and open-source!
@HaochengXiUCB
Haocheng Xi
8 months
🚀 Introducing #SparseVideoGen: 2x speedup in video generation with HunyuanVideo with high pixel-level fidelity (PSNR = 29)! No training is required, no perceptible difference to the human eye!
Blog: https://t.co/3IXnz7PTaC
Paper: https://t.co/oEHsk4lnaK
Code:…
0
0
1
@HaochengXiUCB
Haocheng Xi
9 months
🚀 We're excited to open-source an FP8 training technique, COAT: Compressing Optimizer states and Activations for memory-efficient FP8 Training. COAT is accepted by ICLR 2025! FP8 training effectively improves training efficiency. DeepSeek-V3 is a successful example of…
9
15
71
@xieenze_jr
Enze Xie
11 months
Hi everyone, I'm thrilled to announce that you can now try #SANA models in #ComfyUI 🎉. We show video generation using SANA + CogVideoX. SANA now also supports Chinese and Emoji prompts. If you find SANA useful, we'd be grateful if you could give us a 🌟 at https://t.co/Yu57ikMEQT 💗
@xieenze_jr
Enze Xie
11 months
🥳⚡️ SANA's code is released, enjoy it and welcome to star ⭐️!
SANA is an efficient linear DiT that can generate images from 1K to 4K 🌆🎨
Code: https://t.co/wl9dfP5qIe
Demo: https://t.co/wZdGw2hDwp
Highlight:
⏩ 20x smaller & 100x faster than FLUX
💻 Deployable on a laptop GPU
1
5
49
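For context on the "linear DiT" claim, here is a toy sketch of generic linear attention: with a non-negative feature map, attention can be reordered as phi(Q) @ (phi(K)^T V), so cost grows linearly with token count instead of quadratically. This illustrates the general trick under that assumption, not SANA's exact attention module.

import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (tokens, dim); ReLU is used as a simple non-negative feature map.
    q, k = F.relu(q), F.relu(k)
    kv = k.t() @ v                           # (dim, dim): cost O(N * d^2)
    z = q @ k.sum(dim=0, keepdim=True).t()   # per-token normalizer, (tokens, 1)
    return (q @ kv) / (z + eps)

n, d = 4096, 64                              # e.g. a high-resolution latent grid
q, k, v = (torch.randn(n, d) for _ in range(3))
print(linear_attention(q, k, v).shape)       # torch.Size([4096, 64])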
@hancai_hm
Han Cai
1 year
🥳 We are excited to introduce Deep Compression Autoencoder. It dramatically reduces the token number of the latent space, delivering significant training and inference speedup for latent diffusion models.
Paper: https://t.co/MiQpP5uFlD
Code: https://t.co/qpQ57CjBSI
2
7
35
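A quick back-of-the-envelope calculation of why higher spatial compression matters, assuming one token per latent-grid cell (illustrative numbers, not taken from the paper):

def latent_tokens(resolution, f):
    # Spatial compression ratio f shrinks each side of the latent grid by f.
    side = resolution // f
    return side * side

for f in (8, 32, 64):
    print(f"1024x1024 image, f{f}: {latent_tokens(1024, f)} tokens")
# 1024x1024 image, f8: 16384 tokens
# 1024x1024 image, f32: 1024 tokens
# 1024x1024 image, f64: 256 tokens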