Qingqing Zhao

@qingqing_zhao_

Followers: 1K · Following: 983 · Media: 11 · Statuses: 72

PhD candidate at Stanford

Palo Alto
Joined January 2018
@qingqing_zhao_
Qingqing Zhao
4 months
Introducing CoT-VLA – Visual Chain-of-Thought reasoning for Robot Foundation Models! 🤖 By leveraging next-frame prediction as a visual chain of thought, CoT-VLA uses the predicted future to guide action generation and unlock large-scale video data for training. #CVPR2025
5
55
296
@qingqing_zhao_
Qingqing Zhao
4 months
RT @boyang_deng: Curious about how cities have changed in the past decade? We use MLLMs to analyse 40 million Street View images to answer…
0
14
0
@qingqing_zhao_
Qingqing Zhao
4 months
RT @HanshengCh: Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow). GMFlow generalizes diffu…
0
32
0
@qingqing_zhao_
Qingqing Zhao
4 months
Two design choices that we found boost overall performance:
1. Parallel decoding: predict all action tokens in parallel using full attention.
2. Action chunking: predict a sequence of action tokens to reach a goal state instead of a single action. (4/n)
Tweet media one
1
0
17
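A minimal sketch of how these two choices could fit together, assuming a generic transformer decoder in PyTorch (module names, shapes, and hyperparameters are illustrative placeholders, not the CoT-VLA implementation):

```python
# Hypothetical sketch: decode a whole action chunk in one forward pass.
# All names and shapes are assumptions for illustration, not CoT-VLA's code.
import torch
import torch.nn as nn

class ParallelActionDecoder(nn.Module):
    def __init__(self, d_model=512, chunk_len=8, action_dim=7):
        super().__init__()
        # Action chunking: one learned query per action step in the chunk.
        self.action_queries = nn.Parameter(torch.randn(chunk_len, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, context):
        # context: (B, T, d_model) encoded observation (+ predicted future frame).
        queries = self.action_queries.unsqueeze(0).expand(context.shape[0], -1, -1)
        # Parallel decoding: no causal mask, so chunk positions attend to each
        # other with full attention and all actions come out in a single pass.
        h = self.decoder(queries, context)
        return self.head(h)  # (B, chunk_len, action_dim)

decoder = ParallelActionDecoder()
actions = decoder(torch.randn(2, 64, 512))  # -> (2, 8, 7): 8-step action chunks
```

The speedup relative to autoregressive decoding comes from replacing chunk_len sequential decoding steps with one forward pass over the whole chunk.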
@qingqing_zhao_
Qingqing Zhao
4 months
Real-world rollout! 🌍🤖 See CoT-VLA in action: real-world rollouts with generated future images guiding action generation. Real-world vs. generated images shown side by side. (3/n)
1
0
9
@qingqing_zhao_
Qingqing Zhao
4 months
Why visual chain-of-thought reasoning?
✅ Access to large-scale video data 📹
✅ Minimal preprocessing
✅ Goal-conditioned action generation / inverse dynamics
During inference, CoT-VLA first predicts a future frame, then generates an action chunk to reach that goal state. (2/n)
Tweet media one
1
0
11
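Sketched as pseudocode, that two-stage inference procedure might look like the following; `predict_future_frame` and `predict_action_chunk` are hypothetical names standing in for the model's two stages, not a released API:

```python
# Hypothetical sketch of visual chain-of-thought inference as described above:
# "think" in pixels by predicting a subgoal frame, then decode the actions
# that reach it. Method names are placeholders, not CoT-VLA's API.

def cot_vla_step(model, observation, instruction, chunk_len=8):
    # Stage 1 - visual chain of thought: predict a future frame as the subgoal.
    future_frame = model.predict_future_frame(observation, instruction)
    # Stage 2 - goal-conditioned action generation (inverse-dynamics flavored):
    # infer the action chunk that moves the scene toward the predicted frame.
    actions = model.predict_action_chunk(observation, future_frame,
                                         chunk_len=chunk_len)
    return future_frame, actions

# Control loop: execute the chunk, observe, and re-plan.
# while not task_done:
#     subgoal, actions = cot_vla_step(model, camera.read(), instruction)
#     for a in actions:
#         robot.apply(a)
```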
@qingqing_zhao_
Qingqing Zhao
5 months
RT @moo_jin_kim: Introducing OFT—an Optimized Fine-Tuning recipe for VLAs! Fine-tuning OpenVLA w/ OFT, we see: - 25-50x faster inference ⚡️…
0
69
0
@qingqing_zhao_
Qingqing Zhao
9 months
kudos to the team!
@liu_mingyu
Ming-Yu Liu
9 months
Looking for SOTA image/video tokenizers to tokenize visual content efficiently for your Physical AI or Gen AI applications? Cosmos tokenizer provides a family of tokenizers with different compression factors for both continuous and discrete tokens.
0
0
6
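For context on what a compression factor buys you (a generic back-of-the-envelope, not the Cosmos tokenizer API): a spatial compression factor of 8 maps each 8×8 pixel patch to one token, shrinking the token grid accordingly.

```python
# Generic illustration of image-tokenizer compression factors
# (illustrative only; not the Cosmos tokenizer API).
def token_grid(height: int, width: int, compression: int) -> tuple[int, int]:
    # Each (compression x compression) pixel patch becomes one token, whether
    # the token is a discrete code or a continuous latent vector.
    return height // compression, width // compression

h, w = token_grid(512, 512, 8)
print(h, w, h * w)   # 64 64 4096 tokens per frame
h, w = token_grid(512, 512, 16)
print(h, w, h * w)   # 32 32 1024 tokens: higher compression, fewer tokens
```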
@qingqing_zhao_
Qingqing Zhao
10 months
Congrats Tsung-Yi, and well deserved 🎉
@TsungYiLinCV
Tsung-Yi Lin
10 months
Honored that COCO received the Koenderink Prize at ECCV 2024. It’s been incredible to witness advancements driven by well-curated data over the past decade. I'm excited for the future of multi-modal understanding and generation—data will remain key, and we’re just getting started.
Tweet media one
0
0
4
@qingqing_zhao_
Qingqing Zhao
11 months
very cool 🔥
@Koven_Yu
Hong-Xing "Koven" Yu
11 months
🔥 Spatial intelligence needs fast, *interactive* 3D world generation 🎮 — introducing WonderWorld: generating 3D scenes interactively following your movement and content requests, and seeing them in <10 seconds! 🧵1/6 Web: arXiv:
0
0
41
@qingqing_zhao_
Qingqing Zhao
1 year
RT @TsungYiLinCV: My first SIGGRAPH at #SIGGRAPH2024! @chenhsuanlin @JiashuXu2 @DonglaiXiang will show 3D scene generation in real time fr…
0
5
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @zluleee: We'll present "Neural Control Variates with Automatic Integration" at the Monte Carlo for PDE session in Mile High 4 at 11:25…
0
8
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @PeizhuoL: Want to #WalkTheDog in the metaverse? In our project at #SIGGRAPH2024 with @blacksquirrel__, Yuting Ye and @OlgaSorkineH, we…
0
13
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @ankurhandos: At #RSS2024, we are excited to share AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries, which…
0
6
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @zipengfu: Introducing Mobility VLA - Google's foundation model for navigation - started as my intern project: - Gemini 1.5 Pro for high-l…
0
27
0
@qingqing_zhao_
Qingqing Zhao
1 year
PhysAvatar has been accepted to #ECCV2024! 🎉 We create virtual avatars with realistic garment dynamics and lighting, using a physics simulator in the loop.
@qingqing_zhao_
Qingqing Zhao
1 year
How do we create realistic models of dressed humans directly from visual data? We introduce PhysAvatar, a framework that estimates the shape, appearance, and physical parameters of dressed human avatars from multi-view videos. Page: (1/6)
7
20
197
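A toy sketch of the physics-in-the-loop idea, assuming a differentiable simulator: roll the simulator forward and fit physical parameters so the rollout matches the observed geometry. The one-mass "simulator" below is a stand-in, not the PhysAvatar pipeline:

```python
# Toy physics-in-the-loop parameter estimation (a stand-in for a cloth
# simulator; not the PhysAvatar codebase).
import torch

def rollout(stiffness, x0=1.0, v0=0.0, steps=20, dt=0.05):
    # A single damped spring stands in for a differentiable cloth simulator.
    x = torch.as_tensor(x0, dtype=torch.float32)
    v = torch.as_tensor(v0, dtype=torch.float32)
    for _ in range(steps):
        a = -stiffness * x - 0.5 * v   # spring force + damping
        v = v + dt * a
        x = x + dt * v
    return x

observed = rollout(torch.tensor(4.0)).detach()  # stand-in "observed" geometry

stiffness = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.Adam([stiffness], lr=0.05)
for _ in range(300):
    loss = (rollout(stiffness) - observed) ** 2  # match the observation
    opt.zero_grad(); loss.backward(); opt.step()
print(stiffness.item())  # parameter fitted so the rollout matches the data
```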
@qingqing_zhao_
Qingqing Zhao
1 year
RT @xuxin_cheng: Introducing Open-𝐓𝐞𝐥𝐞𝐕𝐢𝐬𝐢𝐨𝐧 🤖: We need an intuitive and remote teleoperation interface to collect more robot data. 𝐓𝐞𝐥𝐞𝐕𝐢…
0
227
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @moo_jin_kim: ✨ Introducing 𝐎𝐩𝐞𝐧𝐕𝐋𝐀 — an open-source vision-language-action model for robotics! 👐 - SOTA generalist policy - 7B params…
0
160
0
@qingqing_zhao_
Qingqing Zhao
1 year
RT @_ellisbrown: Cambrian-1 🪼 Through a vision-centric lens, we study every aspect of building Multimodal LLMs except the LLMs themselves…
Tweet card: huggingface.co
0
31
0