Gagan Khandate
@GaganKhandate
72 Followers · 59 Following · 2 Media · 34 Statuses
Staff Research Scientist @BostonDynamics, PhD @ColumbiaCompSci, undergrad from @iitmadras
Joined February 2023
Introducing 🌍 Awesome-World-Models, a one-stop GitHub repo for everything there is to know about world models! A new, curated resource list for everyone interested in world models, aiming to be a go-to guide for researchers and developers in the field. 🧵(1/n)
17 replies · 103 reposts · 674 likes
Excited to share SoftMimic -- a new approach for learning compliant humanoid policies that interact gently with the world.
14 replies · 112 reposts · 630 likes
Simulation drives robotics progress, but how do we close the reality gap? Introducing GaussGym: an open-source framework for learning locomotion from pixels with ultra-fast parallelized photorealistic rendering across >4,000 iPhone, GrandTour, ARKit, and Veo scenes! Thread 🧵
11 replies · 65 reposts · 334 likes
I always found it puzzling how language models learn so much from next-token prediction, while video models learn so little from next frame prediction. Maybe it's because LLMs are actually brain scanners in disguise. Idle musings in my new blog post:
52 replies · 175 reposts · 1K likes
Is RL really scalable like other objectives? We found that just scaling up data and compute is *not* enough to enable RL to solve complex tasks. The culprit is the horizon. Paper: https://t.co/KsNZgk782S Thread ↓
11 replies · 149 reposts · 921 likes
What makes a robot hand design better at learning from human demonstrations? Is it being similar in size to a human hand, or matching its degrees of freedom? DexMachina lets us explore this question in simulation — and the results are quite interesting! Check it out 😉
How to learn dexterous manipulation for any robot hand from a single human demonstration? Check out DexMachina, our new RL algorithm that learns long-horizon, bimanual dexterous policies for a variety of dexterous hands, articulated objects, and complex motions.
0 replies · 9 reposts · 105 likes
We came up with a really simple way to train flow-matching (diffusion) policies with offline RL! Flow Q-learning from @seohong_park uses a distillation (reflow-like) scheme to train a flow-matching actor, and it works super well! Check it out: https://t.co/TYYXGuyAgI
5 replies · 54 reposts · 361 likes
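A minimal sketch of the recipe the tweet describes, assuming PyTorch: a behavior-cloning flow policy trained with flow matching, then distilled into a one-step actor that is also pushed toward high-Q actions. The module names (`bc_flow`, `onestep_actor`, `critic`), the Euler sampler, and the weight `alpha` are illustrative assumptions, not the paper's API; critic TD training is omitted.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, hidden = 8, 2, 64

def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, hidden), nn.ReLU(),
                         nn.Linear(hidden, n_out))

bc_flow = mlp(obs_dim + act_dim + 1, act_dim)    # velocity field v(s, x_t, t)
onestep_actor = mlp(obs_dim + act_dim, act_dim)  # distilled policy a = mu(s, z)
critic = mlp(obs_dim + act_dim, 1)               # Q(s, a); TD training omitted

def flow_matching_loss(s, a):
    # Standard conditional flow matching on dataset actions (behavior cloning).
    x0 = torch.randn_like(a)           # noise endpoint
    t = torch.rand(a.shape[0], 1)      # random time in [0, 1]
    xt = (1 - t) * x0 + t * a          # point on the straight-line path
    v_pred = bc_flow(torch.cat([s, xt, t], dim=-1))
    return ((v_pred - (a - x0)) ** 2).mean()

@torch.no_grad()
def integrate_flow(s, z, steps=10):
    # Euler integration of the learned flow, starting from noise z.
    x = z
    for i in range(steps):
        t = torch.full((s.shape[0], 1), i / steps)
        x = x + bc_flow(torch.cat([s, x, t], dim=-1)) / steps
    return x

def actor_loss(s, alpha=1.0):
    # Distill the multi-step flow into a one-step actor ("reflow-like"),
    # while steering the actor toward high-Q actions.
    z = torch.randn(s.shape[0], act_dim)
    a_fast = onestep_actor(torch.cat([s, z], dim=-1))
    distill = ((a_fast - integrate_flow(s, z)) ** 2).mean()
    q = critic(torch.cat([s, a_fast], dim=-1)).mean()
    return alpha * distill - q

# usage on a dummy batch (in practice, separate optimizers update each module)
s, a = torch.randn(32, obs_dim), torch.randn(32, act_dim)
total = flow_matching_loss(s, a) + actor_loss(s)
total.backward()
```

In this scheme the distillation term keeps the one-step actor close to the dataset's action distribution, while the -q term pulls it toward higher-value actions, which is the offline-RL trade-off the tweet alludes to.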
🚀 Meet ToddlerBot 🤖– the adorable, low-cost, open-source humanoid anyone can build, use, and repair! We’re making everything open-source & hope to see more Toddys out there!
Time to democratize humanoid robots! Introducing ToddlerBot, a low-cost ($6K), open-source humanoid for robotics and AI research. Watch two ToddlerBots seamlessly chain their loco-manipulation skills to collaborate in tidying up after a toy session. https://t.co/tIrAUCbzNz
6 replies · 19 reposts · 156 likes
1/9 🚨 New Paper Alert: Cross-Entropy Loss is NOT What You Need! 🚨 We introduce harmonic loss as an alternative to the standard CE loss for training neural networks and LLMs! Harmonic loss achieves 🛠️significantly better interpretability, ⚡faster convergence, and ⏳less grokking!
76 replies · 529 reposts · 4K likes
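A rough sketch of the harmonic-loss idea as the thread describes it, assuming the distance-based formulation where class probabilities scale as 1/d^n for the Euclidean distance d between a representation and a learned class center; the exponent `n`, the `HarmonicClassifier` name, and the shapes here are illustrative assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn as nn

class HarmonicClassifier(nn.Module):
    """Final layer: class scores from Euclidean distance to class centers,
    instead of dot-product logits followed by softmax."""
    def __init__(self, dim, num_classes, n=4.0):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, dim))
        self.n = n  # harmonic exponent (hyperparameter, assumed here)

    def forward(self, x):
        # d[i, c] = ||x_i - w_c||; then p ∝ 1 / d^n, so closer centers
        # get higher probability. Returns log-probabilities.
        d = torch.cdist(x, self.centers).clamp_min(1e-8)
        logp = -self.n * torch.log(d)
        return logp - torch.logsumexp(logp, dim=-1, keepdim=True)

def harmonic_loss(logp, target):
    # Negative log-likelihood over the harmonic log-probabilities.
    return nn.functional.nll_loss(logp, target)

# usage: features from any backbone, dummy batch for illustration
head = HarmonicClassifier(dim=32, num_classes=10)
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
loss = harmonic_loss(head(x), y)
loss.backward()
```

Because each class is represented by an explicit center in feature space, the learned weights are directly inspectable, which is one route to the interpretability gains the thread claims.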
I've been waiting for this for a while. Open source procedural scene generation from NVIDIA. This kind of thing would be really useful for scaling up simulation data for robots.
4 replies · 39 reposts · 388 likes
We RL'ed humanoid robots to Cristiano Ronaldo, LeBron James, and Kobe Bryant! These are neural nets running on real hardware at our GEAR lab. Most robot demos you see online speed videos up. We actually *slow them down* so you can enjoy the fluid motions. I'm excited to announce
128 replies · 466 reposts · 3K likes
🔍This study focuses on overcoming the challenges of #ReinforcementLearning (RL) for motor control policies in complex tasks such as #dexterousmanipulation. 🔗Check it out: https://t.co/a1ht4xotL9
@GaganKhandate @XiaoYangLiu10 @ColumbiaCompSci @CUSEAS @Columbia
0 replies · 2 reposts · 3 likes
Over 50 researchers in the robot learning community joining forces on a mission to scale up robot learning to an unprecedented level 🚀 It’s amazing to see what we can achieve as a team! I made so many new friends in the process and I’m truly grateful for that ❤️
After two years, it is my pleasure to introduce “DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset” DROID is the most diverse robotic interaction dataset ever released, including 385 hours of data collected across 564 diverse scenes in real-world households and offices
0 replies · 1 repost · 12 likes
Introducing ZeroRF, where we reconstruct high-quality radiance fields from *sparse views* an order of magnitude faster than previous methods (30 secs for 320x320), without any pretraining or additional regularization! https://t.co/9XCRqrgX0r
4 replies · 44 reposts · 299 likes
Represent robot policies as trajectories through space. Naturally allows for cross-embodiment transfer and learning from human video!
What state representation should robots have? 🤖 I’m thrilled to present an Any-point Trajectory Model (ATM), which models physical motions from videos without additional assumptions and shows significant positive transfer from cross-embodiment human and robot videos! 🧵👇
0 replies · 4 reposts · 48 likes
Let me clear a *huge* misunderstanding here. The generation of mostly realistic-looking videos from prompts *does not* indicate that a system understands the physical world. Generation is very different from causal prediction from a world model. The space of plausible videos is
192 replies · 742 reposts · 5K likes
Can we collect robot data without any robots? Introducing Universal Manipulation Interface (UMI) An open-source $400 system from @Stanford designed to democratize robot data collection 0 teleop -> autonomously wash dishes (precise), toss (dynamic), and fold clothes (bimanual)
41 replies · 369 reposts · 2K likes
Can we use VLMs out of the box to solve robot control and embodied tasks? Our new work PIVOT shows how this can be done! We used PIVOT to sort food, find you a conference room, and even help you make a cute smiley face out of fruits :) Check it out: https://t.co/68toEa5Ndc
2 replies · 23 reposts · 138 likes
Current works are restricted to short sequences of text and images, limiting their ability to model the world. Presenting Large World Model (LWM): capable of processing long text, images, and videos of over 1M tokens (with *no* "lost in the middle"!) Project:
We are excited to share Large World Model (LWM), a general-purpose 1M context multimodal autoregressive model. It is trained on a large dataset of diverse long videos and books using RingAttention, and can perform language, image, and video understanding and generation.
5 replies · 35 reposts · 218 likes