
Dhruv Shah
@shahdhruv_
Followers: 5K · Following: 6K · Media: 126 · Statuses: 921
professor @Princeton | researcher @GoogleDeepMind
San Francisco, CA
Joined April 2012
Yesterday, we live-demoed a “generalist” VLA, for (I think) the first time ever, to a broad audience @RoboticsSciSys. Bring any object. Ask anything. New environment, new instructions, no fine-tuning. Just impeccable vibes! ✨
7 replies · 29 reposts · 338 likes
Excited to share our new work on making VLAs omnimodal: conditioning on multiple different modalities, one at a time or all at once! This lets us train on more data than any single-modality model, and the result outperforms every such model: more modalities = more data = better models! 🚀
We trained OmniVLA, a robotic foundation model for navigation conditioned on language, goal poses, and images. Initialized with OpenVLA, it leverages Internet-scale knowledge for strong OOD performance. Great collaboration with @CatGlossop, @shahdhruv_, and @svlevine.
4 replies · 23 reposts · 139 likes
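Roughly how any-subset goal conditioning can work (a minimal sketch, not the OmniVLA code; every module name and feature size below is made up for illustration): each goal modality gets its own encoder, and a missing modality falls back to a learned null embedding, so language-only, pose-only, and image-only datasets can all train the same policy.

```python
# Hypothetical sketch of multi-modal goal conditioning in the spirit of
# OmniVLA (not the authors' code). Each modality has its own encoder;
# absent modalities are replaced by a learned "null" token, so any subset
# of (language, goal pose, goal image) can condition one policy.
import torch
import torch.nn as nn

class OmniGoalEncoder(nn.Module):
    def __init__(self, d_model=512):
        super().__init__()
        self.lang_proj = nn.Linear(768, d_model)   # e.g. frozen text-encoder features
        self.pose_proj = nn.Linear(3, d_model)     # (x, y, yaw) goal pose
        self.img_proj = nn.Linear(1024, d_model)   # e.g. frozen ViT features
        # Learned placeholders used when a modality is absent from a sample.
        self.null = nn.ParameterDict({
            k: nn.Parameter(torch.zeros(d_model)) for k in ("lang", "pose", "img")
        })

    def forward(self, lang=None, pose=None, img=None):
        """Each argument is a (B, feat) tensor or None; returns (B, 3, d_model)."""
        assert not (lang is None and pose is None and img is None)
        B = next(t.shape[0] for t in (lang, pose, img) if t is not None)
        feats = []
        for name, x, proj in (("lang", lang, self.lang_proj),
                              ("pose", pose, self.pose_proj),
                              ("img", img, self.img_proj)):
            feats.append(self.null[name].expand(B, -1) if x is None else proj(x))
        return torch.stack(feats, dim=1)  # goal tokens fed to the policy backbone
```

Because the null embeddings are trained, the policy learns what "no goal image given" means rather than treating it as noise, which is what lets the single-modality datasets combine.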
You have to watch this! For years now, I've been looking for signs of nontrivial zero-shot transfer across seen embodiments. When I saw the Alohas unhang tools from a wall, a task trained only on our Frankas, I knew we had it! Gemini Robotics 1.5 is the first VLA to achieve such transfer!!
22 replies · 51 reposts · 336 likes
We are at @corl_conf with robots and another interactive VLA demo! Come to the @GoogleDeepMind booth to check out the Gemini Robotics 1.5 VLA in action on Frankas and Alohas: bring your objects and ask anything 🦾
1 reply · 13 reposts · 77 likes
I'll be speaking at the Eval&Deploy workshop today at @corl_conf at 12:05pm, and will be on a couple of panels (Eval&Deploy at 3pm, RemembeRL at 3:45pm). Come ask some fun/spicy/hard questions!
0 replies · 0 reposts · 27 likes
Check out our tech report for more details and rigorous evaluation. We are hiring! Come to the booth to see us, or our models :) PDF: https://t.co/ltPSKN51rz
0 replies · 1 repost · 14 likes
This result was very surprising! Not only can the model explain its actions and plan future steps, it can actually detect failures and re-plan to be more persistent and clever. Once you start thinking, there's no going back!
1 reply · 1 repost · 7 likes
🤔 With Thinking turned on, our VLA can interleave textual reasoning (substep planning, success detection, error mitigation, etc.) with raw actions, all in the same model! Result: training thoughts and actions end-to-end enables better actions!
1 reply · 1 repost · 2 likes
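The decoding loop this implies is simple to picture. Here is a minimal, runnable toy (my simplification, not the Gemini Robotics 1.5 internals; the token split and the fake policy are assumptions): one shared autoregressive stream where some tokens are text "thoughts" that stay in context and steer later decoding, and others are discretized actions that get executed immediately.

```python
# Toy sketch of interleaved thinking: one token stream, two token kinds.
# Everything below (ACTION_BASE split, fake_policy) is illustrative only.
import random

ACTION_BASE = 10_000          # ids >= this are action-bin tokens (assumption)
EOS = -1

def fake_policy(context):
    """Stand-in for the VLA: randomly interleaves thoughts and actions."""
    if len(context) > 20:
        return EOS
    return random.choice([1, 2, 3, ACTION_BASE + random.randrange(256)])

def run_episode(policy, execute_action):
    context, thoughts = [], []
    while (tok := policy(context)) != EOS:
        context.append(tok)               # thoughts AND actions share context,
        if tok >= ACTION_BASE:            # so reasoning conditions future acts
            execute_action(tok - ACTION_BASE)   # decode bin -> motor command
        else:
            thoughts.append(tok)          # a thought: substep plan, success
                                          # check, error recovery, ...
    return thoughts

run_episode(fake_policy, execute_action=lambda a: None)
```

The key design point is that failure detection and re-planning are not a separate module: a "that didn't work" thought lands in the same context the next action is decoded from.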
This goes beyond simply training on cross-embodiment data: the model learns embodiment-agnostic visuomotor behaviors, so the same pre-trained checkpoint can solve the same task on many robots! Multi-embodiment training + Motion Transfer = improvements across the board and better generalization
1 reply · 1 repost · 6 likes
🔁 This is the most impressive transfer result I've seen: raw images to raw actions, across robots with different cameras, action spaces, and ... We use a novel mechanism called Motion Transfer to learn across pre-training embodiments: no explicit alignment required!
1 reply · 6 reposts · 58 likes
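Motion Transfer itself is described in the tech report, not here. As a generic stand-in for the problem it addresses, this toy sketch shows the baseline alternative it improves on: pooling robots with different action dimensionalities into one shared, normalized action representation so a single policy can consume mixed batches (all names and numbers below are illustrative, and this is explicitly NOT the Motion Transfer mechanism).

```python
# Generic cross-embodiment co-training illustration (NOT Motion Transfer):
# per-embodiment actions are unit-scaled and zero-padded into one shared
# fixed-width vector so mixed-robot batches can train a single policy.
import numpy as np

class SharedActionSpace:
    def __init__(self, dim=32):
        self.dim = dim
        self.stats = {}                   # per-embodiment (mean, std)

    def fit(self, name, actions):
        self.stats[name] = (actions.mean(0), actions.std(0) + 1e-6)

    def encode(self, name, action):
        mean, std = self.stats[name]
        z = (action - mean) / std         # unit-scale the raw action
        out = np.zeros(self.dim)
        out[: z.shape[0]] = z             # zero-pad to the shared width
        return out

space = SharedActionSpace()
space.fit("franka", np.random.randn(1000, 7))    # 7-DoF arm
space.fit("aloha", np.random.randn(1000, 14))    # bimanual, 14-DoF
batch = [space.encode("franka", np.random.randn(7)),
         space.encode("aloha", np.random.randn(14))]
```

The "no explicit alignment required" claim in the tweet is precisely the contrast: padding-and-normalizing schemes like this one need hand-chosen correspondences, and transfer across them is usually weak.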
Excited to share the next gen of Gemini Robotics! I want to highlight two key pre-training advances that fundamentally changed how I think about VLAs.
🔁 Motion Transfer: out-of-the-box transfer of skills across embodiments
🤔 Interleaved Thinking: real-time reasoning about actions
We’re making robots more capable than ever in the physical world. 🤖 Gemini Robotics 1.5 is a levelled up agentic system that can reason better, plan ahead, use digital tools such as @Google Search, interact with humans and much more. Here’s how it works 🧵
2 replies · 13 reposts · 55 likes
A new VLA for navigation that can take in goal images, positions, and language, and exhibits some pretty neat emergent language following!
We trained OmniVLA, a robotic foundation model for navigation conditioned on language, goal poses, and images. Initialized with OpenVLA, it leverages Internet-scale knowledge for strong OOD performance. Great collaboration with @CatGlossop, @shahdhruv_, and @svlevine.
6 replies · 46 reposts · 370 likes
Language following is a tough problem for VLAs: while these models can follow complex language, in practice getting datasets that enable language following is hard. We developed a method to counterfactually and automatically label data to improve language following! 🧵👇
7 replies · 69 reposts · 416 likes
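One simple, runnable reading of "counterfactually and automatically label data" (my sketch of the general idea, not the paper's exact pipeline; `describe_behavior` is a hypothetical stand-in for, say, a VLM captioner): describe what the robot actually did in each logged trajectory and store that description as the instruction, so every trajectory teaches language following even when the original instruction was vague or missing.

```python
# Hedged sketch of counterfactual instruction relabeling.
# describe_behavior: callable(frames) -> str, e.g. a VLM captioner (assumed).
def relabel(dataset, describe_behavior):
    out = []
    for traj in dataset:
        # The "counterfactual" label: the instruction that WOULD have been
        # given if this trajectory had been the intended behavior.
        instruction = describe_behavior(traj["frames"])
        out.append({**traj, "instruction": instruction})
    return out

# Usage with a stand-in captioner:
data = [{"frames": ["f0", "f1"], "instruction": "", "actions": [0, 1]}]
relabeled = relabel(data, describe_behavior=lambda frames: "pick up the red cup")
```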
We took a robot to RSS in LA running our new Gemini Robotics On-Device VLA model. People interacted with it using new objects and instructions in a brand-new environment, and the results were amazing!
3 replies · 18 reposts · 139 likes
Thanks for cheering @GautamSalhotra @Ishika_S_ @RussTedrake @_abraranwar and our favorite Prof. Rudy who is still not on Twitter!
1 reply · 0 reposts · 8 likes
This is the Gemini Robotics On-Device VLA that runs on a single GPU, and you can apply for access today! Shoutout to @ayzwah @debidatta @SudeepDasari @ashwinb96 @xjygr08 @xiao_ted @sippeyxp @jackyliang42 @TonyWentaoYuan @ColinearDevin and everyone else at GDM Robotics for making this happen!
Excited to release Gemini Robotics On-Device and a bunch of goodies today
🍬 on-device VLA that you can run on a GPU
🍬 open-source MuJoCo sim (& benchmark) for bimanual dexterity
🍬 broadening access to these models to academics and developers
https://t.co/mSjXTLuOeu
1 reply · 0 reposts · 15 likes
Join us for a full day of exciting talks and discussions on learning representations for robotic intelligence!
Learned Robot Representations (RoboReps) Workshop @ #RSS2025
📍 Location: SGM 124
📅 Full schedule: https://t.co/25B7CElgjk
1 reply · 8 reposts · 32 likes