Qingqing Zhao

@qingqing_zhao_

Followers: 1K · Following: 983 · Media: 11 · Statuses: 72

PhD candidate at Stanford

Palo Alto
Joined January 2018
@qingqing_zhao_
Qingqing Zhao
4 months
Introducing CoT-VLA – Visual Chain-of-Thought reasoning for Robot Foundation Models! 🤖 By leveraging next-frame prediction as a visual chain of thought, CoT-VLA uses the predicted future to guide action generation and unlock large-scale video data for training. #CVPR2025
5
55
296
@qingqing_zhao_
Qingqing Zhao
4 months
RT @boyang_deng: Curious about how cities have changed in the past decade? We use MLLMs to analyse 40 million Street View images to answer…
0
14
0
@qingqing_zhao_
Qingqing Zhao
4 months
RT @HanshengCh: Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow). GMFlow generalizes diffu…
0
32
0
@qingqing_zhao_
Qingqing Zhao
4 months
Two design choices that we found boost overall performance:
1. Parallel decoding: predict all action tokens in parallel using full attention.
2. Action chunking: predict a sequence of action tokens to reach a goal state instead of a single action. (4/n)
Tweet media one
1
0
17
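A minimal sketch of how these two choices could fit together, assuming a generic transformer decoder in PyTorch (module names, shapes, and hyperparameters are illustrative placeholders, not the CoT-VLA implementation):

```python
# Hypothetical sketch: decode a whole action chunk in one forward pass.
# All names and shapes are assumptions for illustration, not CoT-VLA's code.
import torch
import torch.nn as nn

class ParallelActionDecoder(nn.Module):
    def __init__(self, d_model=512, chunk_len=8, action_dim=7):
        super().__init__()
        # Action chunking: one learned query per action step in the chunk.
        self.action_queries = nn.Parameter(torch.randn(chunk_len, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, context):
        # context: (B, T, d_model) encoded observation (+ predicted future frame).
        queries = self.action_queries.unsqueeze(0).expand(context.shape[0], -1, -1)
        # Parallel decoding: no causal mask, so chunk positions attend to each
        # other with full attention and all actions come out in a single pass.
        h = self.decoder(queries, context)
        return self.head(h)  # (B, chunk_len, action_dim)

decoder = ParallelActionDecoder()
actions = decoder(torch.randn(2, 64, 512))  # -> (2, 8, 7): 8-step action chunks
```

The speedup relative to autoregressive decoding comes from replacing chunk_len sequential decoding steps with one forward pass over the whole chunk.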
@qingqing_zhao_
Qingqing Zhao
4 months
Real-world rollout! 🌍🤖 See CoT-VLA in action: real-world rollouts with generated future images guiding action generation. Real-world vs. generated images shown side by side. (3/n)
1
0
9
@qingqing_zhao_
Qingqing Zhao
4 months
Why visual chain-of-thought reasoning?
✅ Access to large-scale video data 📹
✅ Minimal preprocessing
✅ Goal-conditioned action generation / inverse dynamics
During inference, CoT-VLA first predicts a future frame, then generates an action chunk to reach that goal state. (2/n)
Tweet media one
1
0
11
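Sketched as pseudocode, that two-stage inference procedure might look like the following; `predict_future_frame` and `predict_action_chunk` are hypothetical names standing in for the model's two stages, not a released API:

```python
# Hypothetical sketch of visual chain-of-thought inference as described above:
# "think" in pixels by predicting a subgoal frame, then decode the actions
# that reach it. Method names are placeholders, not CoT-VLA's API.

def cot_vla_step(model, observation, instruction, chunk_len=8):
    # Stage 1 - visual chain of thought: predict a future frame as the subgoal.
    future_frame = model.predict_future_frame(observation, instruction)
    # Stage 2 - goal-conditioned action generation (inverse-dynamics flavored):
    # infer the action chunk that moves the scene toward the predicted frame.
    actions = model.predict_action_chunk(observation, future_frame,
                                         chunk_len=chunk_len)
    return future_frame, actions

# Control loop: execute the chunk, observe, and re-plan.
# while not task_done:
#     subgoal, actions = cot_vla_step(model, camera.read(), instruction)
#     for a in actions:
#         robot.apply(a)
```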
@qingqing_zhao_
Qingqing Zhao
5 months
RT @moo_jin_kim: Introducing OFT—an Optimized Fine-Tuning recipe for VLAs! Fine-tuning OpenVLA w/ OFT, we see: - 25-50x faster inference ⚡️…
0
69
0
@qingqing_zhao_
Qingqing Zhao
9 months
kudos to the team!
@liu_mingyu
Ming-Yu Liu
9 months
Looking for SOTA image/video tokenizers to tokenize visual content efficiently for your Physical AI or Gen AI applications? Cosmos tokenizer provides a family of tokenizers with different compression factors for both continuous and discrete tokens.
0
0
6
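For context on what a compression factor buys you (a generic back-of-the-envelope, not the Cosmos tokenizer API): a spatial compression factor of 8 maps each 8×8 pixel patch to one token, shrinking the token grid accordingly.

```python
# Generic illustration of image-tokenizer compression factors
# (illustrative only; not the Cosmos tokenizer API).
def token_grid(height: int, width: int, compression: int) -> tuple[int, int]:
    # Each (compression x compression) pixel patch becomes one token, whether
    # the token is a discrete code or a continuous latent vector.
    return height // compression, width // compression

h, w = token_grid(512, 512, 8)
print(h, w, h * w)   # 64 64 4096 tokens per frame
h, w = token_grid(512, 512, 16)
print(h, w, h * w)   # 32 32 1024 tokens: higher compression, fewer tokens
```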
@qingqing_zhao_
Qingqing Zhao
10 months
Congrats Tsung-Yi, and well deserved 🎉
@TsungYiLinCV
Tsung-Yi Lin
10 months
Honored that COCO received the Koenderink Prize at ECCV 2024. It’s been incredible to witness advancements driven by well-curated data over the past decade. I'm excited for the future of multi-modal understanding and generation—data will remain key, and we’re just getting started.
Tweet media one
0
0
4
@qingqing_zhao_
Qingqing Zhao
11 months
very cool 🔥
@Koven_Yu
Hong-Xing "Koven" Yu
11 months
🔥 Spatial intelligence needs fast, *interactive* 3D world generation 🎮 — introducing WonderWorld: generating 3D scenes interactively following your movement and content requests, and seeing them in <10 seconds! 🧵1/6 Web: arXiv:
0
0
41
@qingqing_zhao_
Qingqing Zhao
1 year
RT @TsungYiLinCV: My first SIGGRAPH at #SIGGRAPH2024! @chenhsuanlin @JiashuXu2 @DonglaiXiang will show 3D scene generation in real time fr…
0
5
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @zluleee: We'll present "Neural Control Variates with Automatic Integration" at the Monte Carlo for PDE session in Mile High 4 at 11:25…
0
8
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @PeizhuoL: Want to #WalkTheDog in the metaverse? In our project at #SIGGRAPH2024 with @blacksquirrel__, Yuting Ye and @OlgaSorkineH, we…
0
13
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @ankurhandos: At #RSS2024, we are excited to share AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries, which…
0
6
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @zipengfu: Introducing Mobility VLA - Google's foundation model for navigation - started as my intern project: - Gemini 1.5 Pro for high-l…
0
27
0
@qingqing_zhao_
Qingqing Zhao
1 year
PhysAvatar has been accepted to #ECCV2024! 🎉 We create virtual avatars with realistic garment dynamics and lighting, using a physics simulator in the loop.
@qingqing_zhao_
Qingqing Zhao
1 year
How do we create realistic models of dressed humans directly from visual data? We introduce PhysAvatar, a framework that estimates the shape, appearance, and physical parameters of dressed human avatars from multi-view videos. Page: (1/6)
7
20
197
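A toy sketch of the physics-in-the-loop idea, assuming a differentiable simulator: roll the simulator forward and fit physical parameters so the rollout matches the observed geometry. The one-mass "simulator" below is a stand-in, not the PhysAvatar pipeline:

```python
# Toy physics-in-the-loop parameter estimation (a stand-in for a cloth
# simulator; not the PhysAvatar codebase).
import torch

def rollout(stiffness, x0=1.0, v0=0.0, steps=20, dt=0.05):
    # A single damped spring stands in for a differentiable cloth simulator.
    x = torch.as_tensor(x0, dtype=torch.float32)
    v = torch.as_tensor(v0, dtype=torch.float32)
    for _ in range(steps):
        a = -stiffness * x - 0.5 * v   # spring force + damping
        v = v + dt * a
        x = x + dt * v
    return x

observed = rollout(torch.tensor(4.0)).detach()  # stand-in "observed" geometry

stiffness = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.Adam([stiffness], lr=0.05)
for _ in range(300):
    loss = (rollout(stiffness) - observed) ** 2  # match the observation
    opt.zero_grad(); loss.backward(); opt.step()
print(stiffness.item())  # parameter fitted so the rollout matches the data
```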
@qingqing_zhao_
Qingqing Zhao
1 year
RT @xuxin_cheng: Introducing Open-𝐓𝐞𝐥𝐞𝐕𝐢𝐬𝐢𝐨𝐧 🤖: We need an intuitive and remote teleoperation interface to collect more robot data. 𝐓𝐞𝐥𝐞𝐕𝐢…
0
227
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @moo_jin_kim: ✨ Introducing 𝐎𝐩𝐞𝐧𝐕𝐋𝐀 — an open-source vision-language-action model for robotics! 👐 - SOTA generalist policy - 7B params…
0
160
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @_ellisbrown: Cambrian-1 🪼 Through a vision-centric lens, we study every aspect of building Multimodal LLMs except the LLMs themselves…
Tweet card: huggingface.co
0
31
0