Qinsheng Zhang Profile
Qinsheng Zhang

@qsh_zh

Followers 484 · Following 618 · Media 23 · Statuses 130

Research Scientist at NVIDIA, currently building/debugging world models. Previously Ph.D. @GeorgiaTech; bachelor's @sjtu1896

Joined November 2017
@_akhaliq
AK
20 days
World Simulation with Video Foundation Models for Physical AI
8
71
385
@qsh_zh
Qinsheng Zhang
19 days
Shout out to @zkwthu / built on top of @clu_cheng @DrYangSong's fantastic sCM, @coolboywzy @clu_ch's VSD, @TianweiY's DMD
0
0
5
@qsh_zh
Qinsheng Zhang
19 days
Check out our latest work in diffusion distillation. @zkwthu's findings from 6 months of distillation experiments and our clean training code are OPEN now.
@zkwthu
Kaiwen Zheng
2 months
🚀Try out rCM—the most advanced diffusion distillation! ✅First to scale up sCM/MeanFlow to 10B+ video models ✅Open-sourced FlashAttention-2 JVP kernel & FSDP/CP support ✅High quality & diversity videos in 2~4 steps Paper: https://t.co/xZZK25oIrJ Code: https://t.co/aPAo1MO0JQ
1
3
19
@qsh_zh
Qinsheng Zhang
1 month
ok, another work https://t.co/NI3sCmgero from @JiachenLei in pixel space lol @emiel_hoogeboom SID2
arxiv.org
Latent diffusion models have become the popular choice for scaling up diffusion models for high resolution image synthesis. Compared to pixel-space models that are trained end-to-end, latent...
@JiachenLei
Jiachen Lei
1 month
No more reliance on VAE or DINO! Similar to the motivation of RAE by @sainingxie's team, we propose EPG: SSL pre-training + end-to-end FT = SOTA FID on IN256/512! Works nicely for both DM and CM https://t.co/a2q4hKcBwB (1/n)🧵 Next: Why is training DMs on raw pixels difficult?
0
0
2
@qsh_zh
Qinsheng Zhang
1 month
Interesting result! But wait a second — is this the DALL·E 2 / unCLIP version on ImageNet? 😂 Back in 2022, Imagen got all the attention right after the DALL·E 2 release (three months later?). Maybe the direct pixel-space approach will come back! Maybe it's already here https://t.co/HSjGmXRFYM
@sainingxie
Saining Xie
1 month
three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)
2
0
16
@qsh_zh
Qinsheng Zhang
2 months
Fun discovery: #NanoBanana can correct typo artifacts in images generated by #Hunyuan3, but struggles to fix its own samples. NanoBanana can still detect all artifacts in both images and make correct edit suggestions.
0
0
6
@zkwthu
Kaiwen Zheng
2 months
1/ Excited to share the latest work with @chenhuay17! We propose DiffusionNFT, a new online diffusion RL paradigm that optimizes directly on the forward diffusion process. Paper: https://t.co/oacDQZua6I Code: https://t.co/4UBx26TOyz
3
13
35
@qsh_zh
Qinsheng Zhang
3 months
Playing with Google’s new image model and comparing it with GPT-4o. It looks like GPT-4o has difficulty retaining original image details—I guess it only uses a semantic image encoder? BTW, GPT’s safety checks are much more rigorous than Google’s😅
0
1
6
@qsh_zh
Qinsheng Zhang
5 months
@AliHassaniJr for 2️⃣ @ostrisai has some interesting findings on the importance of different DiT blocks. https://t.co/RggoskgPcH However, I was a bit surprised when Ali showed that the model struggled to compensate after sparsifying certain layers and training for ~30k H100 hours
ostris.com
Skipping blocks in FLUX.1-dev. Some blocks can be skipped without affecting the output much. Others will destroy the output if omitted.
0
0
1
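The block-skipping finding above can be sketched loosely in code. This is an illustrative stand-in, not the FLUX.1-dev codebase: `SkippableBlocks` and its `skip` argument are names made up here to show what "omitting certain DiT blocks from the forward pass" means for a residual architecture.

```python
class SkippableBlocks:
    """Run a stack of blocks, omitting the indices listed in `skip`.

    Illustrative only: real DiT blocks are residual transformer layers,
    so skipping one is equivalent to treating its residual update as zero.
    """

    def __init__(self, blocks, skip=()):
        self.blocks = list(blocks)
        self.skip = set(skip)

    def __call__(self, x):
        for i, block in enumerate(self.blocks):
            if i in self.skip:
                continue  # the residual stream passes through unchanged
            x = block(x)
        return x
```

The surprise in the tweet is that even with ~30k H100 hours of fine-tuning, the model could not always compensate for the removed blocks, i.e. some blocks carry updates the rest of the stack cannot absorb.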
@qsh_zh
Qinsheng Zhang
5 months
The speedup is over the existing fastest attention impls (FA3 / cuDNN). Lessons learned from collaborating with @AliHassaniJr: 1️⃣ FLOP cuts don't equal speed gains. Despite many academic works focusing on FLOP reduction, turning reduced FLOPs into real end-to-end speedups is
@AliHassaniJr
Ali Hassani
5 months
Cosmos-Predict2 meets NATTEN. We just released variants of Cosmos-Predict2 where we replace most self attentions with neighborhood attention, bringing up to 2.6X end-to-end speedup, with minimal effect on quality! https://t.co/S8qHyfcCTS (1/5)
1
1
6
@qsh_zh
Qinsheng Zhang
5 months
Phenomenal
@_kevinlu
Kevin Lu
5 months
Why you should stop working on RL research and instead work on product // The technology that unlocked the big scaling shift in AI is the internet, not transformers I think it's well known that data is the most important thing in AI, and also that researchers choose not to work
0
0
2
@qsh_zh
Qinsheng Zhang
5 months
Check out DreamGen 🥳, powered by Cosmos-Predict2. Post-training for DreamGen Bench
github.com
Cosmos-Predict2 is a collection of general-purpose world foundation models for Physical AI that can be fine-tuned into customized world models for downstream applications. - nvidia-cosmos/cosmos-pr...
@jang_yoel
Joel Jang
6 months
Introducing 𝐃𝐫𝐞𝐚𝐦𝐆𝐞𝐧! We got humanoid robots to perform totally new 𝑣𝑒𝑟𝑏𝑠 in new environments through video world models. We believe video world models will solve the data problem in robotics. Bringing the paradigm of scaling human hours to GPU hours. Quick 🧵
0
0
4
@qsh_zh
Qinsheng Zhang
5 months
nice hands-on demo video!
@hanna_mao
Hanzi Mao
5 months
We build Cosmos-Predict2 as a world foundation model for Physical AI builders — fully open and adaptable. Post-train it for specialized tasks or different output types. Available in multiple sizes, resolutions, and frame rates. 📷 Watch the repo walkthrough
0
0
1
@mli0603
Max Li 李赵硕
6 months
You can now pair up Cosmos-Reason1 with Cosmos-Predict2 to generate different possible rollouts and pick the best one. Episodic future thinking from world foundation models through test time scaling! Cosmos-Predict2: https://t.co/1mX3XpGQKz (2/n)
1
2
7
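The rollout-selection idea above reads as best-of-N sampling at test time: a world model proposes several futures and a reasoning model picks the best. A minimal sketch, assuming hypothetical `world_model` and `reward_model` callables rather than the actual Cosmos-Predict2 / Cosmos-Reason1 APIs:

```python
def best_of_n_rollout(world_model, reward_model, state, n=4):
    """Test-time scaling via best-of-N selection (illustrative sketch).

    `world_model(state)` samples one candidate rollout; `reward_model(rollout)`
    scores it. Both are stand-ins for whatever models you pair up.
    """
    rollouts = [world_model(state) for _ in range(n)]
    scores = [reward_model(r) for r in rollouts]
    best = max(range(n), key=scores.__getitem__)
    return rollouts[best], scores[best]
```

More samples buy better rollouts at the cost of more world-model inference, which is the "episodic future thinking through test-time scaling" framing in the tweet.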
@mli0603
Max Li 李赵硕
6 months
Cosmos-Reason1 has exciting updates 💡 Now it understands physical reality — judging videos as real or fake! Check out the resources👇 Paper: https://t.co/TcqqvrhqAD Huggingface: https://t.co/hOLno2IyhW Code: https://t.co/UUg90bmcGW Project page: https://t.co/Dr6ZqnKM8o (1/n)
2
32
101
@HuanLing6
Huan Ling
6 months
We are excited to share Cosmos-Drive-Dreams 🚀 A bold new synthetic data generation (SDG) pipeline powered by world foundation models—designed to synthesize rich, challenging driving scenarios at scale. Models, Code, Dataset, Toolkit are released. Website:
11
44
107
@prithvijitch
Prithvijit
6 months
The WorldModelBench workshop is happening tomorrow (June 12th) at #CVPR2025! We have an exciting series of talks, do attend! Place: Room 108 Time: Morning Session #NVIDIAResearch
@prithvijitch
Prithvijit
9 months
Join us at the WorldModelBench workshop at #CVPR2025 where we'll tackle systematic evaluation of World Models! Focus: benchmarks, metrics, downstream tasks, and safety. Submit papers now:
1
10
19
@qsh_zh
Qinsheng Zhang
6 months
Many core contributors are attending #CVPR2025. Let's discuss the future of world models!
@qsh_zh
Qinsheng Zhang
6 months
🚀 Introducing Cosmos-Predict2! Our most powerful open video foundation model for Physical AI. Cosmos-Predict2 significantly improves upon Predict1 in visual quality, prompt alignment, and motion dynamics—outperforming popular open-source video foundation models. It’s openly
0
0
5