Song Han

@songhan_mit

Followers: 9K · Following: 155 · Media: 62 · Statuses: 279

Joined March 2019
@songhan_mit
Song Han
10 hours
Kicking off the journey to NeurIPS! Our group’s papers focus on sparse attention, efficient video generation, small LLMs, and long-video understanding. We push efficiency to the limit and squeeze every last drop of potential out of GPUs.
1
3
38
@songhan_mit
Song Han
17 hours
Explore Sparse VideoGen2, our second iteration on sparse attention to accelerate video generation:
@HaochengXiUCB
Haocheng Xi ✈️ NeurIPS 2025
18 hours
πŸŽ‰ Come check out our Spotlight Poster @Neurips 2025! πŸš€ Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation πŸ“ Exhibit Hall C,D,E β€” #3508 πŸ—“οΈ Fri, Dec 5 | πŸ•“ 4:30–7:30 PM PST ⚑ Sparse VideoGen2 boosts video generation efficiency
0
2
23
@HaochengXiUCB
Haocheng Xi ✈️ NeurIPS 2025
18 hours
πŸŽ‰ Come check out our Spotlight Poster @Neurips 2025! πŸš€ Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation πŸ“ Exhibit Hall C,D,E β€” #3508 πŸ—“οΈ Fri, Dec 5 | πŸ•“ 4:30–7:30 PM PST ⚑ Sparse VideoGen2 boosts video generation efficiency
arxiv.org
Diffusion Transformers (DiTs) are essential for video generation but suffer from significant latency due to the quadratic complexity of attention. By computing only critical tokens, sparse...
@HaochengXiUCB
Haocheng Xi ✈️ NeurIPS 2025
2 months
πŸš€ Introducing Sparse VideoGen2 (SVG2) β€” Pareto-frontier video generation acceleration with semantic-aware sparse attention! πŸ†Spotlight paper accepted by #NeurIPS2025 βœ… Training-free & plug-and-play βœ… Up to 2.5Γ— faster on HunyuanVideo, 1.9Γ— faster on Wan 2.1 βœ… SOTA quality
1
6
20
@xieenze_jr
Enze Xie
6 days
We (@lawrence_cjs, @yuyangzhao_, @shanasaimoe) from the SANA team just posted a blog on the core of Linear Attention: how it achieves infinite context lengths with global awareness but constant memory usage! We explore state-accumulation mechanics, the evolution from Softmax to
@xieenze_jr
Enze Xie
1 month
The training/inference code and checkpoints are released. Feel free to try them!
4
34
179
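A minimal sketch of the state-accumulation idea the blog describes, assuming the standard linear-attention formulation with an elu(x) + 1 feature map (an illustrative choice from the linear-attention literature, not necessarily the SANA team's): once softmax is replaced by a kernel, the model only needs running sums of phi(k)ᵀv and phi(k), so memory stays constant no matter how long the context grows.

```python
import torch

def linear_attention_streaming(q, k, v, eps=1e-6):
    """Causal linear attention with a constant-size running state.

    q, k, v: (seq_len, d). phi(x) = elu(x) + 1 is a common feature map
    in the linear-attention literature (an assumption, not SANA's code).
    """
    phi = lambda x: torch.nn.functional.elu(x) + 1.0
    d = q.shape[-1]
    S = torch.zeros(d, d)   # running sum of phi(k)^T v
    z = torch.zeros(d)      # running sum of phi(k), for normalization
    outputs = []
    for t in range(q.shape[0]):
        qt, kt, vt = phi(q[t]), phi(k[t]), v[t]
        S = S + torch.outer(kt, vt)      # O(d^2) update, independent of t
        z = z + kt
        outputs.append((qt @ S) / (qt @ z + eps))
    return torch.stack(outputs)

q, k, v = (torch.randn(16, 8) for _ in range(3))
print(linear_attention_streaming(q, k, v).shape)  # torch.Size([16, 8])
```

The per-step cost depends only on the head dimension, which is why a linear-attention model's "KV cache" is a fixed-size state rather than a list that grows with context; the same property appears to underlie SANA-Video's constant-memory block KV cache (storing cumulative states only).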
@xieenze_jr
Enze Xie
26 days
πŸ₯³πŸŽ‰Sana-video inference code has been integrated into diffusers! Thanks to @lawrence_cjs @RisingSayak and the team for making it happen.
huggingface.co
@xieenze_jr
Enze Xie
1 month
The training/inference code and checkpoints are released. Feel free to try them!
2
8
37
@songhan_mit
Song Han
1 month
SANA-Video is open-sourced:
github.com
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer - NVlabs/Sana
@xieenze_jr
Enze Xie
2 months
πŸš€ SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos πŸ’₯ Key Features 🌟 🧠 Linear DiT everywhere β†’ O(N) complexity on video-scale tokens 🧰 Constant-memory Block KV cache β†’ store cumulative states only (no growing KV) πŸ”„ 🎯 Temporal Mix-FFN + 3D RoPE
3
7
73
@songhan_mit
Song Han
1 month
Come check out our efficient-AI papers at ICCV:
0
6
108
@leoyerrrr
HanRong YE
1 month
OmniVinci is now #1 paper on Huggingface!!! πŸ€— Building omni-modal LLMs is MORE than just mixing tokens πŸ˜‰ At @NVIDIA, we explored deeper possibilities in building truly omni-modal systems β€” leading to OmniVinci-9B, which introduces three key innovations: - OmniAlignNet – a
11
27
150
@NVIDIAAIDev
NVIDIA AI Developer
2 months
🧠 At Open Source AI Week we can’t wait to learn how the community is using #opensource projects to redefine how AI is developed, scaled, and shared across text, image, audio, video, and multimodal tasks. To help accelerate innovation, we are now a top contributor on
2
9
76
@yukangchen_
Yukang Chen
2 months
Thanks AK for sharing. Code is available at
github.com
QeRL enables RL for 32B LLMs on a single H100 GPU. - NVlabs/QeRL
@_akhaliq
AK
2 months
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
0
10
50
@songhan_mit
Song Han
2 months
Explore the critical role of the "attention sink" for both understanding and generation:
@yukangchen_
Yukang Chen
2 months
The Convergence of β€œUnderstanding Γ— Generation” in Long Video β€” Attention Sink ✨🎬🧠 We recently open-sourced two works related to long videos: long-video understanding StreamingVLM ( https://t.co/o5MFULkjdR) and long-video generation LongLive ( https://t.co/OAFQSlnlbg). Both
0
2
16
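For intuition on why a sink matters for streaming, here is a toy cache in the spirit of StreamingLLM-style attention sinks (a hypothetical SinkKVCache class, not code from StreamingVLM or LongLive): pin the first few tokens, keep a sliding window of recent ones, evict the middle, and memory stays bounded over an unbounded stream.

```python
from collections import deque

class SinkKVCache:
    """Toy KV cache: pin the first `num_sink` entries (the attention sink),
    keep only the most recent `window` entries, evict everything between."""
    def __init__(self, num_sink=4, window=1024):
        self.num_sink = num_sink
        self.sinks = []                      # never evicted
        self.recent = deque(maxlen=window)   # FIFO eviction of the middle

    def append(self, kv):
        if len(self.sinks) < self.num_sink:
            self.sinks.append(kv)
        else:
            self.recent.append(kv)           # deque drops the oldest itself

    def entries(self):
        return self.sinks + list(self.recent)

cache = SinkKVCache(num_sink=4, window=8)
for t in range(100_000):                     # an "infinite" stream
    cache.append((f"k{t}", f"v{t}"))
print(len(cache.entries()))                  # always 12 = 4 sinks + 8 recent
```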
@songhan_mit
Song Han
2 months
An interesting effect of 4-bit RL is that the quantization noise helps exploration and increases the training reward:
@yukangchen_
Yukang Chen
2 months
We open-sourced QeRL — Quantization-enhanced Reinforcement Learning! 🧠 4-bit quantized RL training 💪 Train a 32B LLM on a single H100 GPU ⚙️ 1.7× faster overall training 🎯 Accuracy on par with bfloat16 🔥 Supports the NVFP4 quantization format Moreover, we show
0
7
45
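A toy look at where that exploration noise comes from, assuming plain uniform symmetric fake quantization (my sketch of the mechanism, not the QeRL implementation; the 8-action policy head below is made up for illustration): rounding weights to a 4-bit grid perturbs the policy's logits, acting like parameter noise on the action distribution.

```python
import torch

def fake_quant(w, bits=4):
    """Uniform symmetric fake quantization: snap weights to a 4-bit grid,
    then dequantize; the rounding error behaves like parameter noise."""
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(w / scale) * scale

torch.manual_seed(0)
W = torch.randn(8, 16)   # hypothetical policy head: 16 features -> 8 actions
x = torch.randn(16)      # one state's features

def entropy(logits):
    p = torch.softmax(logits, dim=-1)
    return -(p * p.log()).sum().item()

logits_fp, logits_q = W @ x, fake_quant(W) @ x
print("logit perturbation:", (logits_q - logits_fp).abs().mean().item())
print("entropy fp vs 4-bit:", entropy(logits_fp), entropy(logits_q))
```

Per the tweet, this noise nudges the policy to explore more, which shows up as higher training reward.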
@songhan_mit
Song Han
2 months
Explore StreamingVLM for understanding infinite video streams:
@Guangxuan_Xiao
Guangxuan Xiao
2 months
Excited to share our new work: StreamingVLM! πŸš€ We tackle a major challenge for Vision-Language Models (VLMs): understanding infinite video streams in real-time without latency blowing up or running out of memory. Paper: https://t.co/G0bfwKCdZm Code: https://t.co/HqBoLMcrJF
3
7
44
@xieenze_jr
Enze Xie
2 months
Fast-dLLM v2 7B is 3× faster than Qwen2.5-7B while achieving the same performance! 🚀 Report is available: https://t.co/7FjkYXKLYe
@xieenze_jr
Enze Xie
3 months
πŸš€ Fast-dLLM v2: Parallel Block-Diffusion Decoding for LLMs ⚑️ Highlights 🌟 - Blockwise bidirectional context via complementary masks - Hierarchical caches (block + sub-block) - Parallel sub-block decoding + token-shift training Results πŸ“Š - ~2.5Γ— faster vs. standard AR
0
9
26
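To make "blockwise bidirectional context" concrete, here is a hedged sketch of a block-causal attention mask (my reading of block diffusion in general, not Fast-dLLM v2's actual masking code): tokens attend bidirectionally inside their own block and causally to all earlier blocks, which is what lets a whole block be denoised in parallel.

```python
import torch

def block_causal_mask(seq_len, block_size):
    """True = may attend. Bidirectional inside each block,
    causal (earlier blocks only) across blocks."""
    blk = torch.arange(seq_len) // block_size      # block index per token
    return blk.unsqueeze(1) >= blk.unsqueeze(0)    # attend iff j's block <= i's

print(block_causal_mask(8, 4).int())
# Tokens 0-3 all see each other (block 0); tokens 4-7 see block 0
# plus each other, but block 0 never sees block 1.
```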
@songhan_mit
Song Han
2 months
Explore the Deep Compression Video Autoencoder, enabling fast training and inference for video generation:
@hancai_hm
Han Cai
2 months
We release DC-VideoGen, a new post-training framework for accelerating video diffusion models. Key features: 🎬 Supports video generation up to 2160Γ—3840 (4K) resolution on a single H100 GPU ⚑ Delivers 14.8Γ— faster inference than the base model while achieving comparable or
0
0
17
@hancai_hm
Han Cai
2 months
πŸš€ Jet-Nemotron – Code & pre-trained checkpoints now available! ⚑️ Achieve up to 53.6Γ— higher generation throughput on H100 GPUs with cost-efficient finetuning. πŸ”— GitHub: https://t.co/XGX7MTMm7J πŸ”— Hugging Face: https://t.co/AMEGIq5zOp πŸ”— Paper:
huggingface.co
4
39
173
@songhan_mit
Song Han
2 months
Explore Deep Compression Generation (DC-Gen), which compresses the number of tokens and accelerates FLUX by 53×:
@hancai_hm
Han Cai
2 months
Changing the autoencoder in latent diffusion models is easier than you think. πŸš€ Introducing DC-Gen – a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with
0
1
10
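Back-of-the-envelope arithmetic for why a deeply compressed latent space pays off (the token counts below are hypothetical; only the 53× figure comes from the tweet): self-attention cost grows quadratically in token count, so shrinking the latent grid by 4× per spatial dimension cuts tokens by 16× and the attention term by roughly 256×.

```python
def attn_flops(tokens, dim=1152):
    """Rough per-layer self-attention FLOPs: QK^T plus attn @ V."""
    return 2 * tokens * tokens * dim

# Hypothetical setup: a 1024x1024 image with an f8 autoencoder and 2x2
# patchify gives a 64x64 = 4096-token grid; a deeper f32 autoencoder
# with the same patchify gives 16x16 = 256 tokens.
for name, tokens in [("f8 latent, 64x64 grid", 4096),
                     ("f32 latent, 16x16 grid", 256)]:
    print(f"{name}: {tokens} tokens, {attn_flops(tokens) / 1e9:.2f} GFLOPs/layer")

print("attention-term speedup:", attn_flops(4096) / attn_flops(256))  # 256.0
```

The measured end-to-end 53× is below the raw 256× attention ratio, as expected: attention is only part of the per-step cost, and linear-in-tokens terms like the FFN shrink more gently.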
@songhan_mit
Song Han
2 months
Explore our new work, SANA-Video, which generates videos at low cost:
@xieenze_jr
Enze Xie
2 months
πŸš€ SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos πŸ’₯ Key Features 🌟 🧠 Linear DiT everywhere β†’ O(N) complexity on video-scale tokens 🧰 Constant-memory Block KV cache β†’ store cumulative states only (no growing KV) πŸ”„ 🎯 Temporal Mix-FFN + 3D RoPE
1
1
26
@songhan_mit
Song Han
2 months
Explore LongLive for interactive, real-time long-video generation:
@yukangchen_
Yukang Chen
2 months
πŸš€ We open-sourced LongLive β€” interactive, real-time long-video generation. πŸ‘₯Generates video in real time as users enter text prompts. ⚑️20.7 FPS on a single H100,⏱️up to 240s per clip. 🎬Fine-tunes SOTA short-video models (e.g., Wan) into long-video generators. 🌍One step
0
0
16
@songhan_mit
Song Han
2 months
Explore our second iteration of Sparse VideoGen: we don't need full attention, only sparse attention. Unlike v1, where we applied a rule-based (spatial and temporal) sparsity pattern, in v2 we apply k-means to cluster similar tokens together, formulate block-sparsity patterns, then
@HaochengXiUCB
Haocheng Xi ✈️ NeurIPS 2025
2 months
πŸš€ Introducing Sparse VideoGen2 (SVG2) β€” Pareto-frontier video generation acceleration with semantic-aware sparse attention! πŸ†Spotlight paper accepted by #NeurIPS2025 βœ… Training-free & plug-and-play βœ… Up to 2.5Γ— faster on HunyuanVideo, 1.9Γ— faster on Wan 2.1 βœ… SOTA quality
1
8
115
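A minimal reconstruction of the v2 recipe described above (illustrative only: the real SVG2 selects critical block pairs with efficient kernels, while this toy keeps just the within-cluster diagonal blocks): k-means groups semantically similar tokens, a permutation makes each cluster contiguous, and dense math then runs only inside each block.

```python
import torch

def kmeans(x, k=4, iters=10):
    """Plain k-means on token features; returns a cluster id per token."""
    centers = x[torch.randperm(x.shape[0])[:k]].clone()
    for _ in range(iters):
        labels = torch.cdist(x, centers).argmin(dim=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = x[labels == c].mean(dim=0)
    return labels

torch.manual_seed(0)
tokens = torch.randn(64, 32)          # (num_tokens, dim)
labels = kmeans(tokens, k=4)
perm = labels.argsort()               # permutation grouping each cluster
x = tokens[perm]                      # similar tokens are now adjacent

out = torch.zeros_like(x)
start = 0
for n in labels.bincount(minlength=4).tolist():
    blk = x[start:start + n]           # one cluster's contiguous block
    attn = torch.softmax(blk @ blk.T / 32 ** 0.5, dim=-1)
    out[start:start + n] = attn @ blk  # dense attention only inside the block
    start += n
out = out[perm.argsort()]             # undo the permutation
print(out.shape)                      # torch.Size([64, 32])
```

Grouping first is what turns semantic sparsity into hardware-friendly block sparsity: after the permutation, each query's critical tokens sit in contiguous rows that map cleanly onto GPU tile-sized matmuls.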