Song Han

@songhan_mit

Followers: 9K · Following: 154 · Media: 61 · Statuses: 274

Joined March 2019
@songhan_mit
Song Han
8 hours
SANA-Video is open sourced:
github.com
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer - NVlabs/Sana
@xieenze_jr
Enze Xie
1 month
πŸš€ SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos πŸ’₯ Key Features 🌟 🧠 Linear DiT everywhere β†’ O(N) complexity on video-scale tokens 🧰 Constant-memory Block KV cache β†’ store cumulative states only (no growing KV) πŸ”„ 🎯 Temporal Mix-FFN + 3D RoPE
0 replies · 2 reposts · 18 likes
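For readers curious how a "constant-memory KV cache" can work at all, here is a minimal sketch of causal linear attention with a cumulative state, in the spirit of the linear DiT above. The elu+1 feature map and the shapes are illustrative assumptions, not SANA-Video's actual kernels:

```python
import numpy as np

def phi(x):
    # Feature map for linearized attention; elu(x) + 1 is a common
    # choice (an illustrative assumption, not SANA-Video's exact one).
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(Q, K, V):
    """Decode T tokens with O(1) state instead of a growing KV cache.

    Q, K, V: (T, d) arrays. Memory stays one (d, d) matrix plus a (d,)
    normalizer no matter how long the sequence gets; that fixed state
    is the "constant-memory KV cache".
    """
    T, d = Q.shape
    S = np.zeros((d, d))   # cumulative sum of phi(k) v^T
    z = np.zeros(d)        # cumulative sum of phi(k), for normalization
    out = np.empty_like(V)
    for t in range(T):     # streaming, one token at a time
        q, k, v = phi(Q[t]), phi(K[t]), V[t]
        S += np.outer(k, v)
        z += k
        out[t] = (q @ S) / (q @ z + 1e-6)
    return out

# Toy check: 1,000 "video tokens", 64-dim heads; state size never grows.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((1000, 64)) for _ in range(3))
print(causal_linear_attention(Q, K, V).shape)  # (1000, 64)
```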
@songhan_mit
Song Han
12 days
Welcome to our efficient AI papers at ICCV:
0 replies · 6 reposts · 101 likes
@leoyerrrr
HanRong YE
12 days
OmniVinci is now #1 paper on Huggingface!!! 🤗
Building omni-modal LLMs is MORE than just mixing tokens 😉
At @NVIDIA, we explored deeper possibilities in building truly omni-modal systems — leading to OmniVinci-9B, which introduces three key innovations:
- OmniAlignNet – a …
11 replies · 27 reposts · 146 likes
@NVIDIAAIDev
NVIDIA AI Developer
15 days
🧠 At Open Source AI Week we can’t wait to learn how the community is using #opensource projects to redefine how AI is developed, scaled, and shared across text, image, audio, video, and multimodal tasks. To help accelerate innovation, we are now a top contributor on …
2 replies · 9 reposts · 77 likes
@yukangchen_
Yukang Chen
18 days
Thanks AK for sharing. Code is available at
github.com
QeRL enables RL for 32B LLMs on a single H100 GPU. - NVlabs/QeRL
@_akhaliq
AK
18 days
QeRL: Beyond Efficiency – Quantization-enhanced Reinforcement Learning for LLMs
0 replies · 10 reposts · 48 likes
@songhan_mit
Song Han
16 days
Explore the critical role of the "attention sink" for both understanding and generation:
@yukangchen_
Yukang Chen
16 days
The Convergence of “Understanding × Generation” in Long Video — Attention Sink ✨🎬🧠
We recently open-sourced two works related to long videos: long-video understanding StreamingVLM (https://t.co/o5MFULkjdR) and long-video generation LongLive (https://t.co/OAFQSlnlbg). Both …
0 replies · 2 reposts · 16 likes
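The attention-sink observation, from this group's earlier StreamingLLM work, is that the first few tokens absorb a disproportionate share of attention mass, so a KV cache can stay bounded by always pinning those sink tokens plus a rolling window of recent tokens. A minimal sketch of that eviction policy, with made-up sizes:

```python
from collections import deque

SINK_TOKENS = 4    # always keep the first few positions (the "sinks")
WINDOW = 1024      # plus a rolling recent window (sizes are illustrative)

class SinkKVCache:
    """Bounded KV cache: the first SINK_TOKENS entries are pinned,
    the rest behave as a sliding window over recent tokens."""
    def __init__(self):
        self.sink = []                      # pinned (k, v) pairs
        self.recent = deque(maxlen=WINDOW)  # auto-evicts oldest entries

    def append(self, k, v):
        if len(self.sink) < SINK_TOKENS:
            self.sink.append((k, v))
        else:
            self.recent.append((k, v))

    def contents(self):
        # What attention actually sees: sinks + recent window, never more.
        return self.sink + list(self.recent)

cache = SinkKVCache()
for t in range(10_000):                     # an "infinite" token stream
    cache.append(f"k{t}", f"v{t}")
print(len(cache.contents()))                # 4 + 1024 = 1028, bounded
```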
@songhan_mit
Song Han
18 days
An interesting effect of 4-bit RL is that the quantization noise helps exploration and increases the training reward:
@yukangchen_
Yukang Chen
18 days
We open-sourced QeRL — Quantization-enhanced Reinforcement Learning!
🧠 4-bit quantized RL training
💪 Train a 32B LLM on a single H100 GPU
⚙️ 1.7× faster overall training
🎯 Accuracy on par with bfloat16 training
🔥 Supports NVFP4 quantization format
Moreover, we show …
0 replies · 7 reposts · 44 likes
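The "quantization noise helps exploration" claim can be made concrete with a toy: perturb a policy head's weights the way low-bit rounding does, and the greedy action starts to vary, which is exactly exploration. Everything below (stochastic rounding, the sizes, the policy head) is an illustrative stand-in, not QeRL's actual NVFP4 pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 64))     # toy policy head over 32 actions
h = rng.standard_normal(64)           # a fixed hidden state

def fake_quantize(x, bits=4):
    # Per-call stochastic rounding over the tensor's range: a crude
    # stand-in for low-bit weight-quantization noise (NOT QeRL's real
    # NVFP4 scheme; stochastic rounding exaggerates the effect so a
    # tiny demo can show it).
    scale = (x.max() - x.min()) / (2**bits - 1)
    noise = rng.uniform(-0.5, 0.5, size=x.shape)
    return np.round(x / scale + noise) * scale

greedy = {int(np.argmax(W @ h))}
explored = {int(np.argmax(fake_quantize(W) @ h)) for _ in range(500)}
print(len(greedy), len(explored))   # 1 action vs. several distinct actions
```

The deterministic policy always picks the same action; the quantization-perturbed one samples several, which is the exploration effect the tweet credits for the higher training reward.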
@songhan_mit
Song Han
18 days
Explore StreamingVLM for understanding infinite video streams:
@Guangxuan_Xiao
Guangxuan Xiao
18 days
Excited to share our new work: StreamingVLM! 🚀
We tackle a major challenge for Vision-Language Models (VLMs): understanding infinite video streams in real time without latency blowing up or running out of memory.
Paper: https://t.co/G0bfwKCdZm
Code: https://t.co/HqBoLMcrJF
2 replies · 7 reposts · 44 likes
@xieenze_jr
Enze Xie
1 month
Fast-dLLM v2 7B is 3× faster than Qwen2.5-7B while achieving the same performance! 🚀 Report is available: https://t.co/7FjkYXKLYe
@xieenze_jr
Enze Xie
2 months
πŸš€ Fast-dLLM v2: Parallel Block-Diffusion Decoding for LLMs ⚑️ Highlights 🌟 - Blockwise bidirectional context via complementary masks - Hierarchical caches (block + sub-block) - Parallel sub-block decoding + token-shift training Results πŸ“Š - ~2.5Γ— faster vs. standard AR
0 replies · 9 reposts · 25 likes
@songhan_mit
Song Han
1 month
Explore the Deep Compression Video Autoencoder for fast training and inference in video generation:
@hancai_hm
Han Cai
1 month
We release DC-VideoGen, a new post-training framework for accelerating video diffusion models. Key features:
🎬 Supports video generation up to 2160×3840 (4K) resolution on a single H100 GPU
⚡ Delivers 14.8× faster inference than the base model while achieving comparable or …
0 replies · 0 reposts · 17 likes
@hancai_hm
Han Cai
1 month
πŸš€ Jet-Nemotron – Code & pre-trained checkpoints now available! ⚑️ Achieve up to 53.6Γ— higher generation throughput on H100 GPUs with cost-efficient finetuning. πŸ”— GitHub: https://t.co/XGX7MTMm7J πŸ”— Hugging Face: https://t.co/AMEGIq5zOp πŸ”— Paper:
arxiv.org
We present Jet-Nemotron, a new family of hybrid-architecture language models, which matches or exceeds the accuracy of leading full-attention models while significantly improving generation...
4 replies · 39 reposts · 174 likes
@songhan_mit
Song Han
1 month
Explore Deep Compression Generation (DC-Gen), which compresses the number of tokens and accelerates FLUX by 53×:
@hancai_hm
Han Cai
1 month
Changing the autoencoder in latent diffusion models is easier than you think. 🚀
Introducing DC-Gen – a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with …
0 replies · 1 repost · 9 likes
@songhan_mit
Song Han
1 month
Explore our new work, SANA-Video, which generates videos at low cost:
@xieenze_jr
Enze Xie
1 month
πŸš€ SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos πŸ’₯ Key Features 🌟 🧠 Linear DiT everywhere β†’ O(N) complexity on video-scale tokens 🧰 Constant-memory Block KV cache β†’ store cumulative states only (no growing KV) πŸ”„ 🎯 Temporal Mix-FFN + 3D RoPE
1 reply · 1 repost · 26 likes
@songhan_mit
Song Han
1 month
Explore LongLive for interactive, real-time long-video generation:
@yukangchen_
Yukang Chen
1 month
πŸš€ We open-sourced LongLive β€” interactive, real-time long-video generation. πŸ‘₯Generates video in real time as users enter text prompts. ⚑️20.7 FPS on a single H100,⏱️up to 240s per clip. 🎬Fine-tunes SOTA short-video models (e.g., Wan) into long-video generators. 🌍One step
0 replies · 0 reposts · 16 likes
@songhan_mit
Song Han
1 month
Explore our second iteration of Sparse VideoGen: we don't need full attention, only sparse attention. Unlike v1, which applied a rule-based (spatial and temporal) sparsity pattern, v2 applies k-means to cluster similar tokens together, formulates block-sparsity patterns, then …
@HaochengXiUCB
Haocheng Xi
1 month
πŸš€ Introducing Sparse VideoGen2 (SVG2) β€” Pareto-frontier video generation acceleration with semantic-aware sparse attention! πŸ†Spotlight paper accepted by #NeurIPS2025 βœ… Training-free & plug-and-play βœ… Up to 2.5Γ— faster on HunyuanVideo, 1.9Γ— faster on Wan 2.1 βœ… SOTA quality
1 reply · 8 reposts · 117 likes
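A rough sketch of the v2 recipe described above: cluster the keys with k-means, reorder tokens so each cluster is a contiguous block, and compute attention only for query/key cluster pairs whose centroids are similar. The cluster count, threshold rule, and always-keep-own-cluster fallback are illustrative simplifications of SVG2, not its actual kernels:

```python
import numpy as np
from sklearn.cluster import KMeans

def semantic_block_attention(Q, K, V, n_clusters=8, keep=0.3):
    """Toy semantic-aware block-sparse attention: k-means clusters the
    keys, tokens are reordered so each cluster forms a contiguous
    block, and attention runs only on cluster pairs with high
    centroid similarity (plus each cluster with itself)."""
    T, d = Q.shape
    km = KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit(K)
    order = np.argsort(km.labels_)           # contiguous clusters -> blocks
    Qs, Ks, Vs = Q[order], K[order], V[order]
    bounds = np.searchsorted(km.labels_[order], np.arange(n_clusters + 1))

    C = km.cluster_centers_
    score = C @ C.T                          # centroid-pair similarity
    thresh = np.quantile(score, 1.0 - keep)  # keep the top `keep` fraction

    out, norm = np.zeros_like(Vs), np.zeros((T, 1))
    for i in range(n_clusters):
        qi = slice(bounds[i], bounds[i + 1])
        for j in range(n_clusters):
            if i != j and score[i, j] < thresh:
                continue                     # skip low-similarity blocks
            kj = slice(bounds[j], bounds[j + 1])
            A = np.exp(Qs[qi] @ Ks[kj].T / np.sqrt(d))
            out[qi] += A @ Vs[kj]
            norm[qi] += A.sum(axis=1, keepdims=True)
    return (out / norm)[np.argsort(order)]   # normalize, undo reordering

Q, K, V = np.random.default_rng(0).standard_normal((3, 256, 32))
print(semantic_block_attention(Q, K, V).shape)  # (256, 32)
```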
@songhan_mit
Song Han
2 months
Explore our second iteration of fast diffusion-LLM research:
- Between blocks: autoregressive.
- Within a block: parallel decoding with KV cache.
- A post-training technique converts an LLM into a dLLM.
- Same accuracy, but faster.
@xieenze_jr
Enze Xie
2 months
πŸš€ Fast-dLLM v2: Parallel Block-Diffusion Decoding for LLMs ⚑️ Highlights 🌟 - Blockwise bidirectional context via complementary masks - Hierarchical caches (block + sub-block) - Parallel sub-block decoding + token-shift training Results πŸ“Š - ~2.5Γ— faster vs. standard AR
1 reply · 14 reposts · 113 likes
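The decoding pattern in the thread above (autoregressive across blocks, parallel within a block) can be sketched as a scheduling loop. The confidence-threshold unmasking rule and `model` callable below are hypothetical simplifications for illustration, not the Fast-dLLM v2 API:

```python
import numpy as np

MASK = -1          # sentinel for a not-yet-decoded position
BLOCK = 8          # tokens decoded in parallel per block (illustrative)
REFINE_STEPS = 4   # parallel refinement passes per block (illustrative)

def decode_blockwise(model, prompt, n_blocks):
    """Autoregressive ACROSS blocks, parallel WITHIN a block: each
    block starts fully masked, a few parallel passes unmask the
    confident positions, and the finished block is committed so its
    KV cache is reused by later blocks. `model(ctx, block)` is a
    hypothetical callable returning (logits, confidence) per slot."""
    committed = list(prompt)
    for _ in range(n_blocks):
        block = [MASK] * BLOCK
        for _ in range(REFINE_STEPS):
            logits, conf = model(committed, block)      # one forward pass
            for i in range(BLOCK):
                if block[i] == MASK and conf[i] > 0.9:  # unmask confident slots
                    block[i] = int(np.argmax(logits[i]))
        logits, _ = model(committed, block)             # finalize the rest
        block = [int(np.argmax(logits[i])) if b == MASK else b
                 for i, b in enumerate(block)]
        committed += block       # commit; KV cache grows one block at a time
    return committed

def toy_model(ctx, block):
    # Dummy stand-in: random logits over a 100-token vocab + confidences.
    rng = np.random.default_rng(len(ctx) + sum(b for b in block if b != MASK))
    return rng.standard_normal((BLOCK, 100)), rng.random(BLOCK)

print(decode_blockwise(toy_model, prompt=[1, 2, 3], n_blocks=2))
```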
@songhan_mit
Song Han
2 months
Explore Deep Compression Autoencoder (DC-AE) 1.5, with a higher token compression ratio (64×) for faster visual generation:
@hancai_hm
Han Cai
2 months
πŸš€ Excited to announce DC-AE 1.5! With a spatial compression ratio boosted to f64, it accelerates high-res diffusion models while preserving text-to-image quality. Key innovation: channel-wise latent structure for faster convergence with many latent channels. πŸ“ Catch us at
1 reply · 2 reposts · 24 likes
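The payoff of pushing spatial compression from the common f8 to f64 is quadratic in token count. A quick back-of-the-envelope in code, assuming a 2×2 patchifier after the autoencoder (a typical DiT choice; the exact patch size in DC-AE 1.5 may differ):

```python
def diffusion_tokens(res, ae_ratio, patch=2):
    """Tokens a DiT sees for a res x res image, given the autoencoder's
    spatial compression ratio and the patch size (patch=2 is a typical
    DiT choice, assumed here for illustration)."""
    latent = res // ae_ratio          # latent feature map is res/f wide
    return (latent // patch) ** 2     # token count is quadratic in that

for f in (8, 32, 64):
    print(f"f{f}: {diffusion_tokens(1024, f):5d} tokens")
# f8:  4096 tokens
# f32:  256 tokens
# f64:    64 tokens
```

Going from f8 to f64 cuts tokens 64×, and since self-attention cost grows quadratically in token count, this is roughly the lever behind headline accelerations like the 53× figure quoted for DC-Gen above.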
@songhan_mit
Song Han
2 months
Explore Jet-Nemotron, a small and fast LLM:
@hancai_hm
Han Cai
2 months
Developing new LLM architectures is both costly and risky. Our latest project — https://t.co/WhOcpWBozX — offers an effective strategy to address this challenge. Our first result is Jet-Nemotron, a new family of hybrid-architecture language models that outperform state-of-the-art …
0 replies · 3 reposts · 28 likes
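"Hybrid architecture" here means interleaving a few full-attention layers with many linear-attention layers: most of the stack runs in O(N) per token while the retained full-attention layers preserve accuracy on retrieval-heavy tasks. A minimal sketch of why that helps decode cost; the layer count and placement below are made up, since choosing them is exactly what Jet-Nemotron's post-training search decides rather than hand-picking:

```python
# Hypothetical pattern: 3 of 24 layers stay full attention (O(N) per
# decoded token); the rest use linear attention with O(1) state.
FULL_ATTN_LAYERS = {3, 11, 19}

def build_stack(n_layers=24):
    return ["full" if i in FULL_ATTN_LAYERS else "linear"
            for i in range(n_layers)]

def decode_cost(stack, seq_len):
    # Per-token decode cost: a full-attention layer scans the whole KV
    # cache (proportional to seq_len); a linear layer touches O(1) state.
    return sum(seq_len if kind == "full" else 1 for kind in stack)

stack = build_stack()
for n in (1_000, 100_000):
    # Cost grows with sequence length through the 3 full layers only,
    # instead of through all 24 as in a pure full-attention model.
    print(n, decode_cost(stack, n))
```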