
Qingqing Zhao
@qingqing_zhao_
1K Followers · 983 Following · 11 Media · 72 Statuses
PhD candidate at Stanford
Palo Alto
Joined January 2018
Introducing CoT-VLA: Visual Chain-of-Thought Reasoning for Robot Foundation Models! 🤖 By leveraging next-frame prediction as visual chain-of-thought reasoning, CoT-VLA uses future prediction to guide action generation and unlocks large-scale video data for training. #CVPR2025
5 replies · 55 retweets · 296 likes
RT @boyang_deng: Curious about how cities have changed in the past decade? We use MLLMs to analyse 40 million Street View images to answer…
0 replies · 14 retweets · 0 likes
RT @HanshengCh: Excited to share our work: Gaussian Mixture Flow Matching Models (GMFlow). GMFlow generalizes diffu…
0 replies · 32 retweets · 0 likes
Project website: Jointly with @NVIDIAAI and @StanfordAILab. With amazing @Yao__Lu, @moo_jin_kim, @zipengfu, @zhuoyang_zhang, @YechengW73445, @mli0603, @qianli_m, @chelseabfinn, @songhan_mit, @ankurhandos, @liu_mingyu, @DonglaiXiang, @GordonWetzstein, @TsungYiLinCV
0 replies · 0 retweets · 12 likes
RT @moo_jin_kim: Introducing OFT, an Optimized Fine-Tuning recipe for VLAs! Fine-tuning OpenVLA w/ OFT, we see:
- 25-50x faster inference ⚡️…
0 replies · 69 retweets · 0 likes
Congrats Tsung-Yi, well deserved 🎉
Honored that COCO received the Koenderink Prize at ECCV 2024. It's been incredible to witness advancements driven by well-curated data over the past decade. I'm excited for the future of multi-modal understanding and generation: data will remain key, and we're just getting started.
0 replies · 0 retweets · 4 likes
RT @TsungYiLinCV: My first SIGGRAPH at #SIGGRAPH2024! @chenhsuanlin @JiashuXu2 @DonglaiXiang will show 3D scene generation in real time fr…
0 replies · 5 retweets · 0 likes
RT @zluleee: We'll present "Neural Control Variates with Automatic Integration" at the Monte Carlo for PDE session in Mile High 4 at 11:25…
0 replies · 8 retweets · 0 likes
RT @PeizhuoL: Want to #WalkTheDog in the metaverse? In our project at #SIGGRAPH2024 with @blacksquirrel__, Yuting Ye and @OlgaSorkineH, we…
0 replies · 13 retweets · 0 likes
RT @ankurhandos: At #RSS2024, we are excited to share AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries, which…
0 replies · 6 retweets · 0 likes
RT @zipengfu: Introducing Mobility VLA, Google's foundation model for navigation, which started as my intern project:
- Gemini 1.5 Pro for high-l…
0 replies · 27 retweets · 0 likes
PhysAvatar has been accepted into #ECCV2024! 🎉 We create virtual avatars with realistic garment dynamics and lighting, with a physics simulator in the loop.
How do we create realistic models of dressed humans directly from visual data? We introduce PhysAvatar, a framework that estimates the shape, appearance, and physical parameters of dressed human avatars from multi-view videos. Page: (1/6)
7 replies · 20 retweets · 197 likes
RT @xuxin_cheng: Introducing Open-𝐓𝐞𝐥𝐞𝐕𝐢𝐬𝐢𝐨𝐧 🤖: We need an intuitive and remote teleoperation interface to collect more robot data. 𝐓𝐞𝐥𝐞𝐕𝐢…
0 replies · 227 retweets · 0 likes
RT @moo_jin_kim: ✨ Introducing 𝐎𝐩𝐞𝐧𝐕𝐋𝐀, an open-source vision-language-action model for robotics! 👐
- SOTA generalist policy
- 7B params…
0 replies · 160 retweets · 0 likes
RT @_ellisbrown: Cambrian-1 🪼 Through a vision-centric lens, we study every aspect of building Multimodal LLMs except the LLMs themselves…
0 replies · 31 retweets · 0 likes