Junsong_Chen
@lawrence_cjs
Followers 203 · Following 79 · Media 9 · Statuses 53
HKU Ph.D., NVIDIA Research Internship
Hong Kong
Joined February 2022
We (@lawrence_cjs, @yuyangzhao_, @shanasaimoe) from the SANA team just posted a blog on the core of Linear Attention: how it achieves infinite context lengths with global awareness but constant memory usage! We explore state accumulation mechanics, the evolution from Softmax to
4
34
179
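A minimal NumPy sketch of the state-accumulation mechanic described above, in the spirit of Katharopoulos et al.'s linear attention; the feature map and function names are illustrative assumptions, not SANA's actual kernel:

```python
import numpy as np

def linear_attention_stream(qs, ks, vs):
    """Causal linear attention processed one token at a time.
    The running state (S, z) has a fixed size, so memory stays
    constant no matter how long the sequence grows."""
    phi = lambda x: np.maximum(x, 0.0) + 1e-6   # simple positive feature map
    d_k, d_v = ks.shape[1], vs.shape[1]
    S = np.zeros((d_k, d_v))   # cumulative sum of phi(k) v^T
    z = np.zeros(d_k)          # cumulative sum of phi(k), for normalization
    outs = []
    for q, k, v in zip(qs, ks, vs):
        S += np.outer(phi(k), v)                         # accumulate state
        z += phi(k)
        outs.append((phi(q) @ S) / (phi(q) @ z + 1e-9))  # query the state
    return np.stack(outs)

# Usage: only S (d_k x d_v) and z (d_k) persist between tokens.
T, d = 1000, 16
out = linear_attention_stream(*(np.random.randn(T, d) for _ in range(3)))
```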
How do Linear Attention and Softmax Attention differ in compute and KV-cache cost for LLMs and long-video generation? Let's start with this blog. https://t.co/Ja5El08muf
0
1
1
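To make the KV-cache contrast concrete, a back-of-the-envelope comparison (model dimensions are illustrative assumptions, not any specific LLM):

```python
def softmax_kv_bytes(seq_len, n_layers, n_heads, head_dim, bytes_per_val=2):
    # Softmax attention caches K and V for every past token: grows with seq_len.
    return 2 * seq_len * n_layers * n_heads * head_dim * bytes_per_val

def linear_state_bytes(n_layers, n_heads, head_dim, bytes_per_val=2):
    # Linear attention keeps one (d x d) state and one d-vector per head: constant.
    return n_layers * n_heads * (head_dim * head_dim + head_dim) * bytes_per_val

print(softmax_kv_bytes(100_000, 32, 32, 128) / 2**30)  # ~48.8 GiB at 100k tokens
print(linear_state_bytes(32, 32, 128) / 2**20)         # ~32.3 MiB at any length
```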
Sora 2 is amazing! But AI video generation inference is still too slow. Try our Deep Compression Autoencoder + Linear Attention! https://t.co/ooNowz8HH7
https://t.co/PU8oUI2hsU
github.com
DC-VideoGen: Efficient Video Generation with Deep Compression Video Autoencoder - dc-ai-projects/DC-VideoGen
1
8
71
Thanks so much @_akhaliq for sharing our recent work. Our homepage is here:
0
0
1
Changing the autoencoder in latent diffusion models is easier than you think. Introducing DC-Gen, a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with
5
38
222
We release DC-VideoGen, a new post-training framework for accelerating video diffusion models. Key features: • Supports video generation up to 2160×3840 (4K) resolution on a single H100 GPU • Delivers 14.8× faster inference than the base model while achieving comparable or
2
28
145
SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos. Key features: • Linear DiT everywhere → O(N) complexity on video-scale tokens • Constant-memory block KV cache → store cumulative states only (no growing KV) • Temporal Mix-FFN + 3D RoPE
3
20
126
Finally: 36 s for a 5 s 720p clip on H100 (4× speedup vs. vanilla attention at 720p); 29 s on RTX 5090 with NVFP4 (2.4× faster). Fixed VRAM regardless of sequence length; strong text-video alignment.
0
0
0
3. Temporal Mix-FFN + 3D RoPE → local fidelity + temporal coherence. 4. AR block training with self-rollout → minute-length generation.
1
0
0
2. Constant-memory block KV cache → cumulative states only (no growing KV); see the sketch below.
1
0
0
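A toy sketch of the constant-memory block update from point 2 above; the feature map and shapes are assumptions for illustration, and SANA-Video's exact formulation may differ:

```python
import numpy as np

def update_block_state(S, z, K_block, V_block):
    """Fold an entire block of keys/values into the fixed-size cumulative
    state instead of appending it to a growing KV cache."""
    phi = lambda x: np.maximum(x, 0.0) + 1e-6
    S = S + phi(K_block).T @ V_block    # (d_k, d_v): independent of video length
    z = z + phi(K_block).sum(axis=0)    # (d_k,)
    return S, z

# After each block, only (S, z) is kept; the block's K/V can be discarded.
d_k, d_v, block_len = 64, 64, 256
S, z = np.zeros((d_k, d_v)), np.zeros(d_k)
S, z = update_block_state(S, z,
                          np.random.randn(block_len, d_k),
                          np.random.randn(block_len, d_v))
```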
Keys: 1. Linear DiT everywhere → O(N) complexity on video-scale tokens.
1
0
0
SANA-Video: Linear Attention + Constant-Memory KV Cache = Fast Long Videos. It's time for a new SANA family member! Links: Paper: https://t.co/snV4bF8jUM | Project Page: https://t.co/9WZIp7ryX6
1
1
3
Explore recent work from our team: LongLive generates minute-length videos and interacts with you in real time at high speed! Very cool project.
We open-sourced LongLive: interactive, real-time long-video generation. • Generates video in real time as users enter text prompts. • 20.7 FPS on a single H100, up to 240 s per clip. • Fine-tunes SOTA short-video models (e.g., Wan) into long-video generators. • One step
0
0
1
Explore Deep Compression Autoencoder (DC-AE) 1.5, with a higher token compression ratio (64×) for faster visual generation:
Excited to announce DC-AE 1.5! With a spatial compression ratio boosted to f64, it accelerates high-res diffusion models while preserving text-to-image quality. Key innovation: a channel-wise latent structure for faster convergence with many latent channels. Catch us at
1
2
24
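Why f64 matters, in rough numbers (a 1024×1024 image, one latent token per spatial cell; illustrative arithmetic, not figures from the paper):

```python
h = w = 1024
for f in (8, 32, 64):   # spatial downsampling factor of the autoencoder
    side = h // f
    print(f"f{f}: {side}x{side} = {side * side} latent tokens")
# f8:  128x128 = 16384 tokens
# f32: 32x32   = 1024 tokens
# f64: 16x16   = 256 tokens -> 64x fewer tokens than f8, and softmax
#       attention cost shrinks roughly quadratically on top of that
```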
The best few-step sampling model across the speed-memory frontier? Introducing SANA-Sprint in collaboration with the great SANA team! Beyond the results, perhaps more importantly, the work is about the recipe behind SANA-Sprint. Code & model will be open. Let's go!
12
26
162
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
10
66
425
Still think consistency models are bad at scale? In fact, sCM can be stably scaled to modern text-to-image diffusion models, greatly improving generation speed and 1-step generation quality!
3
4
55
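The one-step idea in miniature: a consistency model maps pure noise at t = T straight to data. A hedged PyTorch sketch, where `model` is a hypothetical callable f(x_t, t) -> x_0, not SANA-Sprint's actual API:

```python
import torch

@torch.no_grad()
def one_step_sample(model, shape, sigma_max=80.0, device="cpu"):
    # Draw pure noise at the terminal time t = T (EDM-style sigma_max assumed).
    x_T = torch.randn(shape, device=device) * sigma_max
    t = torch.full((shape[0],), sigma_max, device=device)
    # The consistency function maps (x_T, T) directly to a clean sample.
    return model(x_T, t)
```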
Excited for SANA-Sprint. Code and weights will be released very soon along with diffusers. Stay tuned!
0
0
3
Introducing SANA 1.5: model scaling up, then scaling down. Inference-time scaling also works as an automatic end-to-end pipeline.
SANA 1.5: A linear Diffusion Transformer pushes SOTA in text-to-image generation! Key innovations: • Depth-growth training: 1.6B → 4.8B params • Memory-efficient 8-bit optimizer • Flexible model pruning • Inference scaling for better quality. Achieves 0.80 on GenEval!
0
0
2
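A toy sketch of what depth-growth training could look like mechanically; the duplication scheme below is an assumption for illustration, and SANA 1.5's actual initialization may differ:

```python
import copy
import torch.nn as nn

def grow_depth(blocks: nn.ModuleList) -> nn.ModuleList:
    """Deepen a pre-trained transformer by interleaving copies of its
    blocks, so the grown model starts close to the original function
    and is then fine-tuned."""
    grown = []
    for blk in blocks:
        grown.append(blk)                  # keep the pre-trained block
        grown.append(copy.deepcopy(blk))   # insert a copy to be trained
    return nn.ModuleList(grown)
```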