Song Han
@songhan_mit
9K Followers · 154 Following · 61 Media · 274 Statuses
Joined March 2019
SANA-Video is open sourced:
github.com
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer - NVlabs/Sana
SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos
Key Features:
- Linear DiT everywhere → O(N) complexity on video-scale tokens
- Constant-memory Block KV cache → store cumulative states only (no growing KV)
- Temporal Mix-FFN + 3D RoPE
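As a rough illustration of the "constant-memory" point (not the released SANA-Video code; the function name, feature map, and shapes below are assumptions), linear attention admits a recurrent form that keeps only a cumulative state instead of a KV cache that grows with sequence length:

```python
import torch

def linear_attention_streaming(q, k, v, eps=1e-6):
    # q, k, v: (T, d). Only the cumulative state S (d x d) and the
    # normalizer z (d,) are stored, so memory stays constant in T,
    # unlike a softmax KV cache that grows with the sequence.
    T, d = q.shape
    phi = lambda x: torch.nn.functional.elu(x) + 1.0  # positive feature map
    S = torch.zeros(d, d)
    z = torch.zeros(d)
    out = torch.empty_like(v)
    for t in range(T):
        kt, vt, qt = phi(k[t]), v[t], phi(q[t])
        S = S + torch.outer(kt, vt)   # accumulate phi(k_t) v_t^T
        z = z + kt                    # accumulate phi(k_t)
        out[t] = (qt @ S) / (qt @ z + eps)
    return out

x = torch.randn(16, 8)
print(linear_attention_streaming(x, x, x).shape)  # torch.Size([16, 8])
```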
At Open Source AI Week we can't wait to learn how the community is using #opensource projects to redefine how AI is developed, scaled, and shared across text, image, audio, video, and multimodal tasks. To help accelerate innovation, we are now a top contributor on
Thanks AK for sharing. Code is available at
github.com
QeRL enables RL for 32B LLMs on a single H100 GPU. - NVlabs/QeRL
Explore the critical role of the "attention sink" for both understanding and generation:
The Convergence of "Understanding × Generation" in Long Video: Attention Sink. We recently open-sourced two works related to long videos: long-video understanding StreamingVLM (https://t.co/o5MFULkjdR) and long-video generation LongLive (https://t.co/OAFQSlnlbg). Both
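For readers unfamiliar with the attention-sink idea, here is a minimal sketch (not the StreamingVLM or LongLive code; `evict`, its arguments, and the shapes are assumptions) of a bounded KV cache that always keeps a few initial "sink" tokens plus a recent window:

```python
import torch

def evict(keys, values, num_sink=4, window=512):
    # keys, values: (T, d). Keep the first `num_sink` tokens (the attention
    # sink) plus the most recent `window` tokens, so the cache stays bounded
    # no matter how long the stream runs.
    T = keys.shape[0]
    if T <= num_sink + window:
        return keys, values
    keep = torch.cat([torch.arange(num_sink), torch.arange(T - window, T)])
    return keys[keep], values[keep]

k = torch.randn(10_000, 64)
v = torch.randn(10_000, 64)
k2, v2 = evict(k, v)
print(k2.shape)  # torch.Size([516, 64]) regardless of stream length
```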
An interesting effect of 4-bit RL is that the quantization noise helps exploration and increases the training reward:
We open-sourced QeRL: Quantization-enhanced Reinforcement Learning!
- 4-bit quantized RL training
- Train a 32B LLM on a single H100 GPU
- 1.7× faster overall training
- Accuracy on par with bfloat16
- Supports the NVFP4 quantization format
Moreover, we show
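A toy sketch of the underlying effect (not the QeRL implementation; the per-tensor 4-bit rounding below is a simplification of low-bit formats such as NVFP4): fake-quantizing the weights perturbs the policy's logits, and that perturbation can act like exploration noise:

```python
import torch

def fake_quant_4bit(w):
    # Symmetric per-tensor 4-bit fake quantization (illustrative only).
    scale = w.abs().max() / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q * scale

torch.manual_seed(0)
w = torch.randn(256, 256)            # toy full-precision "policy" weights
w_q = fake_quant_4bit(w)             # what the 4-bit policy actually uses
x = torch.randn(4, 256)
perturbation = x @ w_q.T - x @ w.T   # logit shift caused by quantization noise
print("mean |logit perturbation|:", perturbation.abs().mean().item())
```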
Explore StreamingVLM for understanding infinite video streams:
Excited to share our new work: StreamingVLM! We tackle a major challenge for Vision-Language Models (VLMs): understanding infinite video streams in real time without latency blowing up or running out of memory. Paper: https://t.co/G0bfwKCdZm Code: https://t.co/HqBoLMcrJF
Fast-dLLM v2 7B is 3× faster than Qwen2.5-7B while achieving the same performance! Report: https://t.co/7FjkYXKLYe
Fast-dLLM v2: Parallel Block-Diffusion Decoding for LLMs
Highlights:
- Blockwise bidirectional context via complementary masks
- Hierarchical caches (block + sub-block)
- Parallel sub-block decoding + token-shift training
Results:
- ~2.5× faster vs. standard AR
Explore the Deep Compression Video Autoencoder for fast training and inference in video generation:
We release DC-VideoGen, a new post-training framework for accelerating video diffusion models. Key features:
- Supports video generation up to 2160×3840 (4K) resolution on a single H100 GPU
- Delivers 14.8× faster inference than the base model while achieving comparable or
Jet-Nemotron: code & pre-trained checkpoints now available! Achieve up to 53.6× higher generation throughput on H100 GPUs with cost-efficient finetuning. GitHub: https://t.co/XGX7MTMm7J Hugging Face: https://t.co/AMEGIq5zOp Paper:
arxiv.org
We present Jet-Nemotron, a new family of hybrid-architecture language models, which matches or exceeds the accuracy of leading full-attention models while significantly improving generation...
Explore Deep Compression Generation (DC-Gen), which compresses the number of tokens and accelerates FLUX by 53×:
Changing the autoencoder in latent diffusion models is easier than you think. Introducing DC-Gen, a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with
Explore our new work, SANA-Video, which generates videos at low cost:
SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos
Key Features:
- Linear DiT everywhere → O(N) complexity on video-scale tokens
- Constant-memory Block KV cache → store cumulative states only (no growing KV)
- Temporal Mix-FFN + 3D RoPE
Explore LongLive for interactive, real-time long-video generation:
We open-sourced LongLive: interactive, real-time long-video generation.
- Generates video in real time as users enter text prompts.
- 20.7 FPS on a single H100, up to 240s per clip.
- Fine-tunes SOTA short-video models (e.g., Wan) into long-video generators.
One step
Explore our second iteration of Sparse VideoGen: we don't need full attention, only sparse attention. Unlike v1, where we applied rule-based (spatial and temporal) sparsity patterns, in v2 we apply k-means to cluster similar tokens together, form block sparsity patterns, then
Introducing Sparse VideoGen2 (SVG2): Pareto-frontier video generation acceleration with semantic-aware sparse attention! Spotlight paper accepted at #NeurIPS2025.
- Training-free & plug-and-play
- Up to 2.5× faster on HunyuanVideo, 1.9× faster on Wan 2.1
- SOTA quality
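A rough sketch of the semantic-aware sparsity idea (not the SVG2 kernels; the function names, cluster counts, and shapes below are assumptions): cluster the key tokens with k-means, then let each query attend only to the keys in its most similar clusters:

```python
import torch

def kmeans(x, k, iters=10):
    # Plain k-means on token features; returns assignments and centroids.
    centroids = x[torch.randperm(x.shape[0])[:k]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centroids).argmin(dim=1)
        for c in range(k):
            members = x[assign == c]
            if members.shape[0] > 0:
                centroids[c] = members.mean(dim=0)
    return assign, centroids

def sparse_attention_mask(q, keys, num_clusters=8, top_clusters=2):
    # Boolean (Tq, Tk) mask; True = attend. Each query attends only to keys
    # belonging to the clusters whose centroids it is most similar to.
    assign, centroids = kmeans(keys, num_clusters)
    sim = q @ centroids.T                           # (Tq, num_clusters)
    chosen = sim.topk(top_clusters, dim=1).indices  # (Tq, top_clusters)
    return (assign[None, :, None] == chosen[:, None, :]).any(dim=-1)

q, keys = torch.randn(64, 32), torch.randn(256, 32)
mask = sparse_attention_mask(q, keys)
print(mask.shape, mask.float().mean().item())  # density well below 1.0
```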
Explore our second iteration of fast diffusion LLM research:
- Between blocks: autoregressive.
- Within a block: parallel decoding with a KV cache.
- A post-training technique converts an LLM into a dLLM.
- Same accuracy, but faster.
Fast-dLLM v2: Parallel Block-Diffusion Decoding for LLMs
Highlights:
- Blockwise bidirectional context via complementary masks
- Hierarchical caches (block + sub-block)
- Parallel sub-block decoding + token-shift training
Results:
- ~2.5× faster vs. standard AR
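A simplified sketch of this decoding pattern (not the Fast-dLLM v2 code; the `model` interface, block size, and refinement schedule are assumptions): autoregressive across blocks, with a few parallel refinement passes over all tokens inside each block:

```python
import torch

def decode(model, prompt_ids, num_blocks=4, block_size=8, refine_steps=2,
           mask_id=0):
    # Autoregressive across blocks; inside each block, every token is
    # (re)predicted in parallel for a few refinement steps.
    seq = prompt_ids
    for _ in range(num_blocks):
        block = torch.full((block_size,), mask_id, dtype=torch.long)
        for _ in range(refine_steps):
            logits = model(torch.cat([seq, block]))      # (len, vocab)
            block = logits[-block_size:].argmax(dim=-1)  # parallel update
        seq = torch.cat([seq, block])                    # commit the block
    return seq

# Toy stand-in "model": random logits over a 100-token vocabulary.
toy_model = lambda ids: torch.randn(ids.shape[0], 100)
print(decode(toy_model, torch.tensor([1, 2, 3])).shape)  # torch.Size([35])
```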
Explore Deep Compression Autoencoder (DC-AE) 1.5, with a higher token compression ratio (64×) for faster visual generation:
Excited to announce DC-AE 1.5! With a spatial compression ratio boosted to f64, it accelerates high-res diffusion models while preserving text-to-image quality. Key innovation: channel-wise latent structure for faster convergence with many latent channels. Catch us at
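Back-of-the-envelope arithmetic on why a higher spatial compression ratio matters (illustrative only; the 1024×1024 resolution and patch size are assumptions): the latent token count shrinks quadratically with the downsampling factor:

```python
def latent_tokens(resolution, downsample, patch=1):
    # Tokens per image after spatial downsampling by `downsample`
    # and optional patchification by `patch`.
    side = resolution // (downsample * patch)
    return side * side

for f in (8, 32, 64):
    print(f"f{f}: {latent_tokens(1024, f)} latent tokens for a 1024x1024 image")
# f8: 16384, f32: 1024, f64: 256 -> fewer tokens means faster diffusion
```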
Explore Jet-Nemotron, a small and fast LLM:
Developing new LLM architectures is both costly and risky. Our latest project (https://t.co/WhOcpWBozX) offers an effective strategy to address this challenge. Our first result is Jet-Nemotron, a new family of hybrid-architecture language models that outperform state-of-the-art