Qinsheng Zhang Profile
Qinsheng Zhang

@qsh_zh

Followers 484 · Following 618 · Media 23 · Statuses 130

Research Scientist at NVIDIA, currently building/debugging world models. Previously Ph.D. @GeorgiaTech; bachelor's @sjtu1896

Joined November 2017
@_akhaliq
AK
20 days
World Simulation with Video Foundation Models for Physical AI
8
71
385
@qsh_zh
Qinsheng Zhang
19 days
Shout out to @zkwthu / built on top of @clu_cheng @DrYangSong's fantastic sCM, @coolboywzy @clu_ch's VSD, @TianweiY's DMD
0
0
5
@qsh_zh
Qinsheng Zhang
19 days
Check out our latest work in diffusion distillation. @zkwthu's findings from 6 months of distillation experiments and our clean training code are OPEN now.
@zkwthu
Kaiwen Zheng
2 months
🚀Try out rCM—the most advanced diffusion distillation! ✅First to scale up sCM/MeanFlow to 10B+ video models ✅Open-sourced FlashAttention-2 JVP kernel & FSDP/CP support ✅High quality & diversity videos in 2~4 steps Paper: https://t.co/xZZK25oIrJ Code: https://t.co/aPAo1MO0JQ
1
3
19
@qsh_zh
Qinsheng Zhang
1 month
ok, another work https://t.co/NI3sCmgero from @JiachenLei in pixel space lol @emiel_hoogeboom SID2
arxiv.org
Latent diffusion models have become the popular choice for scaling up diffusion models for high resolution image synthesis. Compared to pixel-space models that are trained end-to-end, latent...
@JiachenLei
Jiachen Lei
1 month
No more reliance on VAE or DINO! Similar to the motivation of RAE by @sainingxie's team, we propose EPG: SSL pre-training + end-to-end FT = SOTA FID on IN256/512! Works nicely for both DM and CM https://t.co/a2q4hKcBwB (1/n)🧵 Next: Why is training DMs on raw pixels difficult?
0
0
2
@qsh_zh
Qinsheng Zhang
1 month
Interesting result! But wait a second — is this the DALL·E 2 / unCLIP version on ImageNet? 😂 Back in 2022, Imagen got all the attention right after the DALL·E 2 release (three months later?). Maybe the direct pixel-space approach will come back! Maybe it's already here https://t.co/HSjGmXRFYM
@sainingxie
Saining Xie
1 month
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
2
0
16
@qsh_zh
Qinsheng Zhang
2 months
Fun discovery: #NanoBanana can correct typo artifacts in images generated by #Hunyuan3, but struggles to fix its own samples. NanoBanana can still detect all artifacts in both images and make correct edit suggestions.
0
0
6
@zkwthu
Kaiwen Zheng
2 months
1/ Excited to share the latest work with @chenhuay17! We propose DiffusionNFT, a new online diffusion RL paradigm that optimizes directly on the forward diffusion process. Paper: https://t.co/oacDQZua6I Code: https://t.co/4UBx26TOyz
3
13
35
@qsh_zh
Qinsheng Zhang
3 months
Playing with Google’s new image model and comparing it with GPT-4o. It looks like GPT-4o has difficulty retaining original image details—I guess it only uses a semantic image encoder? BTW, GPT’s safety checks are much more rigorous than Google’s😅
0
1
6
@qsh_zh
Qinsheng Zhang
5 months
@AliHassaniJr for 2️⃣ @ostrisai has some interesting findings on the importance of different DiT blocks. https://t.co/RggoskgPcH However, I was a bit surprised when Ali showed that the model struggled to compensate after sparsifying certain layers and training for ~30k H100 hours
ostris.com
Skipping blocks in FLUX.1-dev. Some blocks can be skipped without affecting the output much. Others will destroy the output if omitted.
0
0
1
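The block-skipping finding above can be sketched loosely in code. This is an illustrative stand-in, not the FLUX.1-dev codebase: `SkippableBlocks` and its `skip` argument are names made up here to show what "omitting certain DiT blocks from the forward pass" means for a residual architecture.

```python
class SkippableBlocks:
    """Run a stack of blocks, omitting the indices listed in `skip`.

    Illustrative only: real DiT blocks are residual transformer layers,
    so skipping one is equivalent to treating its residual update as zero.
    """

    def __init__(self, blocks, skip=()):
        self.blocks = list(blocks)
        self.skip = set(skip)

    def __call__(self, x):
        for i, block in enumerate(self.blocks):
            if i in self.skip:
                continue  # the residual stream passes through unchanged
            x = block(x)
        return x
```

The surprise in the tweet is that even with ~30k H100 hours of fine-tuning, the model could not always compensate for the removed blocks, i.e. some blocks carry updates the rest of the stack cannot absorb.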
@qsh_zh
Qinsheng Zhang
5 months
The speedup is over the existing fastest attention impls (FA3 / cuDNN). Lessons learned from collaborating with @AliHassaniJr: 1️⃣ FLOP cuts don't equal speed gains. Despite many academic works focusing on FLOP reduction, turning reduced FLOPs into real end-to-end speedups is
@AliHassaniJr
Ali Hassani
5 months
Cosmos-Predict2 meets NATTEN. We just released variants of Cosmos-Predict2 where we replace most self attentions with neighborhood attention, bringing up to 2.6X end-to-end speedup, with minimal effect on quality! https://t.co/S8qHyfcCTS (1/5)
1
1
6
@qsh_zh
Qinsheng Zhang
5 months
Phenomenal
@_kevinlu
Kevin Lu
5 months
Why you should stop working on RL research and instead work on product // The technology that unlocked the big scaling shift in AI is the internet, not transformers I think it's well known that data is the most important thing in AI, and also that researchers choose not to work
0
0
2
@qsh_zh
Qinsheng Zhang
5 months
Check out DreamGen 🥳, powered by Cosmos-Predict2. Post-training for DreamGen Bench
github.com
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications. - nvidia-cosmos/cosmos-pr...
@jang_yoel
Joel Jang
6 months
Introducing 𝐃𝐫𝐞𝐚𝐦𝐆𝐞𝐧! We got humanoid robots to perform totally new 𝑣𝑒𝑟𝑏𝑠 in new environments through video world models. We believe video world models will solve the data problem in robotics. Bringing the paradigm of scaling human hours to GPU hours. Quick 🧵
0
0
4
@qsh_zh
Qinsheng Zhang
5 months
nice hands-on demo video!
@hanna_mao
Hanzi Mao
5 months
We build Cosmos-Predict2 as a world foundation model for Physical AI builders — fully open and adaptable. Post-train it for specialized tasks or different output types. Available in multiple sizes, resolutions, and frame rates. 📷 Watch the repo walkthrough
0
0
1
@mli0603
Max Li 李赵硕
6 months
You can now pair up Cosmos-Reason1 with Cosmos-Predict2 to generate different possible rollouts and pick the best one. Episodic future thinking from world foundation models through test time scaling! Cosmos-Predict2: https://t.co/1mX3XpGQKz (2/n)
1
2
7
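The rollout-selection idea above reads as best-of-N sampling at test time: a world model proposes several futures and a reasoning model picks the best. A minimal sketch, assuming hypothetical `world_model` and `reward_model` callables rather than the actual Cosmos-Predict2 / Cosmos-Reason1 APIs:

```python
def best_of_n_rollout(world_model, reward_model, state, n=4):
    """Test-time scaling via best-of-N selection (illustrative sketch).

    `world_model(state)` samples one candidate rollout; `reward_model(rollout)`
    scores it. Both are stand-ins for whatever models you pair up.
    """
    rollouts = [world_model(state) for _ in range(n)]
    scores = [reward_model(r) for r in rollouts]
    best = max(range(n), key=scores.__getitem__)
    return rollouts[best], scores[best]
```

More samples buy better rollouts at the cost of more world-model inference, which is the "episodic future thinking through test-time scaling" framing in the tweet.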
@mli0603
Max Li 李赵硕
6 months
Cosmos-Reason1 has exciting updates 💡 Now it understands physical reality — judging videos as real or fake! Check out the resources👇 Paper: https://t.co/TcqqvrhqAD Huggingface: https://t.co/hOLno2IyhW Code: https://t.co/UUg90bmcGW Project page: https://t.co/Dr6ZqnKM8o (1/n)
2
32
101
@HuanLing6
Huan Ling
6 months
We are excited to share Cosmos-Drive-Dreams 🚀 A bold new synthetic data generation (SDG) pipeline powered by world foundation models—designed to synthesize rich, challenging driving scenarios at scale. Models, Code, Dataset, Toolkit are released. Website:
11
44
107
@prithvijitch
Prithvijit
6 months
The WorldModelBench workshop is happening tomorrow (June 12th) at #CVPR2025! We have an exciting series of talks, do attend! Place: Room 108 Time: Morning Session #NVIDIAResearch
@prithvijitch
Prithvijit
9 months
Join us at the WorldModelBench workshop at #CVPR2025 where we'll tackle systematic evaluation of World Models! Focus: benchmarks, metrics, downstream tasks, and safety. Submit papers now:
1
10
19
@qsh_zh
Qinsheng Zhang
6 months
Many core contributors are attending #CVPR2025. Let's discuss the future of world models!
@qsh_zh
Qinsheng Zhang
6 months
🚀 Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly improves upon Predict1 in visual quality, prompt alignment, and motion dynamics—outperforming popular open-source video foundation models. It’s openly
0
0
5