Ankit Goyal

@imankitgoyal

Followers 3K · Following 321 · Media 38 · Statuses 199

Foundation Models for Robotics, Nvidia Research, Princeton PhD

Seattle, WA
Joined March 2020
@imankitgoyal
Ankit Goyal
2 months
What's the right architecture for a VLA? VLM + custom action heads (π₀)? VLM with special discrete action tokens (OpenVLA)? Custom design on top of the VLM (OpenVLA-OFT)? Or... VLM with ZERO modifications? Just predict action as text. The results will surprise you. VLA-0:
18
70
533
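The "zero modifications" idea above can be sketched in a few lines: ask the VLM for the action as plain text and parse the integers back into a continuous action. This is an illustrative mock, not the actual NVlabs/vla0 code; `query_vlm` and the binning parameters are assumptions.

```python
# Minimal sketch of "predict action as text": the VLM emits ordinary
# integer tokens, and we decode them into a continuous action vector.

def query_vlm(prompt: str) -> str:
    """Stand-in for a real VLM call; a real system would pass the image
    and task instruction and decode the model's text output."""
    return "512 488 231 500 500 500 1"  # e.g. a 7-DoF action as integers

def parse_action(text: str, bins: int = 1000,
                 low: float = -1.0, high: float = 1.0) -> list[float]:
    """Decode space-separated integer tokens back into continuous values."""
    ints = [int(tok) for tok in text.split()]
    scale = (high - low) / (bins - 1)
    return [low + i * scale for i in ints]

prompt = "Task: pick up the block. Output the next action as integers."
action = parse_action(query_vlm(prompt))
```

Because the output is ordinary text, no new tokens, heads, or vocabulary changes are needed on the VLM side.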
@imankitgoyal
Ankit Goyal
43 minutes
Happy to share that the code for VLA-0 is out now: https://t.co/Vg8wsCSIPQ Given its simplicity, it’s a great starting point to try out VLAs!
github.com
VLA-0: Building State-of-the-Art VLAs with Zero Modification - NVlabs/vla0
@imankitgoyal
Ankit Goyal
18 days
To my friends and family in India: please raise your voice and DEMAND clean air! It is your fundamental right. Think about the youngest member of your family. What have they done to deserve losing years of their life just because they were born in India? Enough of ignorance.
0
0
9
@imankitgoyal
Ankit Goyal
2 months
The launch of the first humanoid for consumers, Neo-X, is truly exciting! Many are claiming this means robot learning is solved and that 1X has leapfrogged everyone else, but the real picture is much more nuanced. From a hardware and platform perspective, it looks incredibly
1
0
14
@imankitgoyal
Ankit Goyal
2 months
Had a great time guest lecturing in @YuXiang_IRVL's course on Vision-Language-Action (VLA) models. Check out the full recording 👇
@YuXiang_IRVL
Yu Xiang
2 months
Are you interested in Vision-Language-Action (VLA) models? We had an excellent guest lecture today by Ankit Goyal @imankitgoyal from NVIDIA on VLAs and their role in robot manipulation 🎥 Watch the recording here 👇 https://t.co/fhl15iKGr8 Slides: https://t.co/9lxXTXtX9r
1
7
53
@OpenDriveLab
OpenDriveLab
2 months
🚀 Join us at #ICCV2025 for a full-day workshop: “Learning to See: Advancing Spatial Understanding for Embodied Intelligence” 🗓️ October 19 • 📍 Room 312 Meet our incredible lineup of speakers: @MattNiessner @jiadeng @pulkitology @KaterinaFragiad @YunzhuLiYZ @imankitgoyal
1
8
38
@imankitgoyal
Ankit Goyal
2 months
Huge thanks to my incredible collaborators: @HugoHadfield1, Xuning Yang, Valts Blukis, Fabio Ramos And the amazing teams at NVIDIA @NVIDIARobotics @NVIDIAAI @NVIDIAEmbedded If you're excited about simple, effective approaches to VLAs: 💻 Code:
github.com
VLA-0: Building State-of-the-Art VLAs with Zero Modification - NVlabs/vla0
2
1
32
@imankitgoyal
Ankit Goyal
2 months
How does such a simple architecture achieve this? It's all in the recipe 🔬 Three key techniques: 1️⃣ Action Decoding Represent actions as integers → Arbitrary resolution without changing vocabulary 2️⃣ Ensemble Prediction Average predictions across timesteps → Temporal
1
0
26
@imankitgoyal
Ankit Goyal
2 months
"Does it work on real robots?" YES. ✅ Tested on SO-101 arm: • Block reorientation • Object pushing • Pick & place Outperforms SmolVLA (+12.5 points). Notably: SmolVLA was pretrained on large-scale SO-100 data. VLA-0 trained from scratch on 100 demonstrations per task.
1
0
19
@imankitgoyal
Ankit Goyal
2 months
Introducing VLA-0 🚀 The entire architecture: Prompt a VLM to output actions as text. That's it. No new components. No change to VLM vocabulary. On LIBERO benchmark: → #1 among non-pretrained methods → Outperforms π₀.5-KI, OpenVLA-OFT, SmolVLA Even beats models pretrained
2
2
32
@imankitgoyal
Ankit Goyal
3 months
Looking for the latest and greatest in robotic policy learning? Check out👇ManiFlow — our new flow/diffusion-based method that combines algorithmic advances like consistency flow matching with architectural innovations such as DiT-X. It achieves very strong results in both sim &
@GeYan_21
Ge Yan
3 months
Introduce ManiFlow 🤖, a visual imitation learning policy for general robot manipulation that is efficient, robust, and generalizable: - 98.3% improvement on 8 real-world tasks, generalizing to novel objects & backgrounds - Applied to diverse embodiments: single-arm, bimanual &
1
2
35
@imankitgoyal
Ankit Goyal
5 months
Senior Research Scientist: https://t.co/sAnqQAtr5O Research Scientist, New College Grad 2025: https://t.co/CtAUlumCKt Learn more about our team's work:
research.nvidia.com
NVIDIA Robotics
0
0
13
@imankitgoyal
Ankit Goyal
5 months
We, at NVIDIA's Seattle Robotics Research team, are hiring. 🤖 We are seeking Senior Research Scientists and New College Graduates (2025) to join us. Some areas of interest include: Vision-Language-Action (VLA) models & bimanual and dexterous manipulation. This is a unique
8
17
331
@imankitgoyal
Ankit Goyal
6 months
4. Flowing from Words to Pixels An insight that seems so simple in hindsight. For conditional generation, instead of starting from noise, why not flow directly from the source to the target distribution? I'll be watching closely if this becomes the norm. Great work by @Qihao Liu et al.
0
0
2
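The source-to-target idea can be written as a small equation sketch (standard linear-interpolant flow matching; the notation here is assumed, not taken from the paper). Instead of drawing $x_0$ from noise, the interpolation starts at a sample from the source distribution:

```latex
% Linear interpolant from source sample to target sample
x_t = (1 - t)\, x_{\mathrm{src}} + t\, x_{\mathrm{tgt}}, \qquad t \in [0, 1]

% The regression target for the learned velocity field is the
% constant displacement between the endpoints:
v^{\star}(x_t, t) = x_{\mathrm{tgt}} - x_{\mathrm{src}}
```

Setting $x_{\mathrm{src}} \sim \mathcal{N}(0, I)$ recovers the usual noise-to-data flow; the insight is that $x_{\mathrm{src}}$ can instead come from the conditioning distribution itself.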
@imankitgoyal
Ankit Goyal
6 months
3. Prompting Depth Anything for 4K Metric Depth It’s a very practical way to get dense and accurate metric depth. It upgrades a monocular depth model for metric accuracy by using data from metric sensors, getting the best of both worlds. Great work by Haotong Lin et al.
1
0
2
@imankitgoyal
Ankit Goyal
6 months
2. Reconstructing Animals and the Wild This work generates complete scenes from natural images, trained with just synthetic Infinigen data. While working on Infinigen, I never thought it could be used so creatively. Fantastic work by Peter @Michael_J_Black @silvia_zuffi
2
0
2
@imankitgoyal
Ankit Goyal
6 months
That’s a wrap for #CVPR2025! Here's a 🧵 of some really cool works 👇 1. Let Humanoids Hike! Great work @ky_lin0305 and Stella Xu. They drove home the point that we can't treat locomotion and navigation as separate. The ultimate test: Can your robot complete a hike on its own?
1
0
14