Abhishek Gupta

@abhishekunique7

10K Followers · 939 Following · 202 Media · 554 Statuses

Assistant Professor at University of Washington. I like robots and reinforcement learning. Previously: post-doc at MIT, PhD at Berkeley

Seattle, WA
Joined February 2012
@abhishekunique7
Abhishek Gupta
1 month
Imitation learning is great, but needs us to have (near) optimal data. We throw away most other data (failures, evaluation data, suboptimal data, undirected play data), even though this data can be really useful and way cheaper! In our new work - RISE, we show a simple way to
7
49
238
@max_simchowitz
Max Simchowitz
11 days
🧐🧐 Why do we pretrain LLMs with log likelihood? Why does action chunking work so well in robotics? Why is EMA so ubiquitous? And could there be a mathematical basis for Moravec’s paradox? 🤖🤖 Come check out our NeurIPS 2025 Tutorial “Foundations of Imitation Learning” with
6
40
273
@abhishekunique7
Abhishek Gupta
12 days
If you're at NeurIPS, don't miss this! :)
@canondetortugas
Dylan Foster 🐢
13 days
Happening this Tuesday 1:30 PST @ NeurIPS: Foundations of Imitation Learning: From Language Modeling to Continuous Control. A tutorial with Adam Block & Max Simchowitz (@max_simchowitz).
0
2
56
@abhishekunique7
Abhishek Gupta
20 days
🐐🐐
@shahdhruv_
Dhruv Shah
21 days
My group @Princeton is hiring! We are looking for strong postdoc and PhD candidates to join our quest for intelligent robots in open-world environments. Read more below and get in touch 🤖🐅🧡 https://t.co/7o35pwPZCz
1
5
50
@abhishekunique7
Abhishek Gupta
25 days
Friends at PI doing some very cool work🎉
@svlevine
Sergey Levine
25 days
We just released results for our newest VLA from Physical Intelligence: π*0.6. This one is trained with RL, and it makes it quite a bit better: often doubles throughput, enables real-world tasks like folding real laundry and making espresso drinks at the office.
1
6
87
@abhishekunique7
Abhishek Gupta
1 month
Congrats to @natashajaques!!
@schmidtsciences
Schmidt Sciences
1 month
We're excited to welcome 28 new AI2050 Fellows! This 4th cohort of researchers are pursuing projects that include building AI scientists, designing trustworthy models, and improving biological and medical research, among other areas. https://t.co/8oY7xdhxvF
1
0
17
@abhishekunique7
Abhishek Gupta
1 month
What should you take away: 1) Use all your data, just throw it in the buffer with a reward of 0. Don’t waste any of your failed data! 2) Even suboptimal data can teach you how to recover back to expert states 3) Stitching can be hard in low coverage settings, some smoothness assumptions can
arxiv.org
Imitation learning has proven effective for training robots to perform complex tasks from expert human demonstrations. However, it remains limited by its reliance on high-quality, task-specific...
2
1
6
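A minimal sketch of what takeaway 1) could look like in code, assuming the SQIL-style labeling described elsewhere in this thread (expert transitions get reward 1, everything else gets reward 0); the field names and structure are illustrative, not the RISE codebase.

```python
def build_rise_style_buffer(expert_trajs, other_trajs):
    """Merge expert and non-expert data into one offline RL buffer.

    Assumption (from the thread): no learned reward model; expert
    transitions are labeled with reward 1, and all other data
    (failures, eval rollouts, play data) is labeled with reward 0.
    """
    buffer = []
    for traj in expert_trajs:
        for (obs, action, next_obs) in traj:
            buffer.append({"obs": obs, "action": action,
                           "next_obs": next_obs, "reward": 1.0})
    for traj in other_trajs:
        for (obs, action, next_obs) in traj:
            buffer.append({"obs": obs, "action": action,
                           "next_obs": next_obs, "reward": 0.0})
    return buffer

# Any standard offline RL algorithm can then be trained on this buffer;
# the sparse 0/1 reward steers the policy back toward expert states
# rather than imitating the non-expert actions directly.
```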
@abhishekunique7
Abhishek Gupta
1 month
Interestingly, this also extends to policy evaluations. We can iteratively keep reincorporating policy evaluation rollouts into RISE to keep improving. Evaluation data is not wasted! (9/10)
1
1
3
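A hedged sketch of that iterative loop, reusing the buffer-building helper from the earlier sketch; `train_offline_rl` and `collect_eval_rollouts` are caller-supplied placeholders (any offline RL trainer and evaluation loop), not the paper's API.

```python
def rise_with_eval_reuse(expert_trajs, other_trajs, env,
                         train_offline_rl, collect_eval_rollouts,
                         num_rounds=3):
    """Fold evaluation rollouts back into the offline buffer each round."""
    policy = None
    for _ in range(num_rounds):
        # Rebuild the 0/1-labeled buffer with everything gathered so far.
        buffer = build_rise_style_buffer(expert_trajs, other_trajs)
        policy = train_offline_rl(buffer)
        # Evaluation episodes are appended to the reward-0 pool, so
        # evaluation data is never thrown away.
        rollouts = collect_eval_rollouts(policy, env)
        other_trajs = list(other_trajs) + list(rollouts)
    return policy
```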
@abhishekunique7
Abhishek Gupta
1 month
Secondly, we find that RISE can make use of failed or suboptimal demonstrations, significantly improving learned policies over standard imitation learning. Moreover, this works across different tasks, including with deformables! (8/10)
1
0
2
@abhishekunique7
Abhishek Gupta
1 month
Let’s see how well this works! First, using RISE allows your imitation learning policies to be a *lot* more robust, using non-optimal data to recover from OOD scenarios. On the flipside, just imitating non-optimal data or standard offline RL is far less effective (as we see
1
0
3
@abhishekunique7
Abhishek Gupta
1 month
The fix is easy! - allow the policy to “borrow” actions from nearby states. In doing so, policies can widen their action distributions in controlled ways, significantly improving their ability to stitch suboptimal data! This can be implemented in simple ways - e.g. enforcing
1
1
3
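The tweet is truncated, so the exact mechanism isn't spelled out here; one plausible reading of "borrowing actions from nearby states" is to also treat actions recorded at a state's nearest neighbors in the dataset as valid targets, which locally widens the policy's action distribution. The sketch below is an illustrative guess under that assumption, using Euclidean distance in a flat state space for simplicity.

```python
import numpy as np

def borrow_actions_from_neighbors(states, actions, k=5):
    """Augment each state's action targets with actions from nearby states.

    states:  array of shape (N, state_dim)
    actions: array of shape (N, action_dim)
    Returns an augmented (state, action) set where each state is paired
    with the actions of its k nearest neighbor states as well.
    """
    states = np.asarray(states)
    actions = np.asarray(actions)
    aug_states, aug_actions = [], []
    for i, s in enumerate(states):
        dists = np.linalg.norm(states - s, axis=1)
        neighbors = np.argsort(dists)[: k + 1]   # includes the state itself
        for j in neighbors:
            aug_states.append(s)                 # keep the query state
            aug_actions.append(actions[j])       # borrow the neighbor's action
    return np.array(aug_states), np.array(aug_actions)
```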
@abhishekunique7
Abhishek Gupta
1 month
Looking into it a bit deeper, Q functions are able to stitch reasonably well, but we find that the marginal action distribution captured by the policy is overly conservative, with little action coverage. This lack of coverage in the policy distribution makes it hard to actually
1
1
9
@abhishekunique7
Abhishek Gupta
1 month
Simple enough, right? Not so different from what @sidgreddy and @svlevine proposed in SQIL. Now, the issue is that plain old offline RL approaches often fail to stitch together disjoint trajectories under relatively sparse data coverage (especially from vision). So while there
1
0
4
@abhishekunique7
Abhishek Gupta
1 month
Our idea in Robust Imitation by Stitching from Experts (RISE) is simple - while you can’t just imitate non-optimal data, you can use it to learn how to *get back to expert states* using offline RL. It’s a particularly simple offline RL problem - no reward models needed; label
1
0
4
@abhishekunique7
Abhishek Gupta
1 month
To start - we know that BC can be fragile. In particular, BC struggles when placed in OOD configurations with sparse data coverage. Now we can of course collect more data - but this requires expensive, high-quality demonstrations. But non-expert data can be plentiful -
2
0
6
@abhishekunique7
Abhishek Gupta
2 months
This is one of my first times working on a non-manipulation project, learned a tremendous deal from the excellent @mateoguaman! Also shout out to our undergraduate collaborators @sidharth_0_r and Daniel Gorbatov for doing a fantastic job on their first projects with us 🎉
0
1
4
@abhishekunique7
Abhishek Gupta
2 months
We set out to develop steerable models for cross-embodiment, open-world navigation in outdoor environments. To this end, @mateoguaman and team developed VAMOS, a VLA model that can absorb diverse, cross-embodiment data for broad generalization, while being able to specialize to
arxiv.org
A fundamental challenge in robot navigation lies in learning policies that generalize across diverse environments while conforming to the unique physical constraints and capabilities of a specific...
@mateoguaman
Mateo Guaman Castro
2 months
How can we create a single navigation policy that works for different robots in diverse environments AND can reach navigation goals with high precision? Happy to share our new paper, "VAMOS: A Hierarchical Vision-Language-Action Model for Capability-Modulated and Steerable
1
2
12
@abhishekunique7
Abhishek Gupta
2 months
With Semantic World Models, we take a step toward using language as the medium for world modeling and planning. Check out our fun demo on the project website where you can probe how an SWM behaves, with actions and questions about the future outcome! A really fun and
0
2
10
@abhishekunique7
Abhishek Gupta
2 months
Our analysis shows SWM inherits VLM priors and correctly attends to objects referred to in the questions. In the examples below, the model is prompted with “Is the red moon touching the blue cube?” This suggests that SWM models focus on *what matters* for decision making (5/6)
1
2
7
@abhishekunique7
Abhishek Gupta
2 months
We empirically evaluate SWM as a modeling/planning framework on two benchmarks: LangTable and OGBench. For each benchmark, we finetune a pretrained PaliGemma-3B VLM on observations, actions, and future QA pairs with standard SFT, and use it to plan for multiple tasks. We find that
1
1
4
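Based only on the description in these tweets (a VLM finetuned on observations, actions, and questions about future outcomes), planning against an SWM-style model might look roughly like the sketch below: score candidate action sequences by how likely the model is to answer "yes" to a goal question, and execute the best one. The prompt format and the `vlm.score` interface are assumptions for illustration, not the paper's actual API.

```python
def plan_with_swm(vlm, obs_image, candidate_action_seqs, goal_question):
    """Pick the candidate action sequence the VLM believes best satisfies the goal.

    `vlm` is assumed to expose a `score(image, text, answer)` method returning
    the log-likelihood of `answer` given the image and text; the real SWM
    interface may differ.
    """
    best_seq, best_score = None, float("-inf")
    for actions in candidate_action_seqs:
        prompt = (
            f"Actions: {actions}\n"
            f"Question about the future: {goal_question}"
        )
        # Likelihood of "yes" serves as the planning objective.
        score = vlm.score(obs_image, prompt, answer="yes")
        if score > best_score:
            best_seq, best_score = actions, score
    return best_seq

# Example goal question from the thread:
# "Is the red moon touching the blue cube?"
```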