Super excited to share Eureka, our "spin" on how to use LLMs to teach low-level dexterity skills!
Eureka is an open-ended reward design agent that can write and evolve superhuman reward functions for a large suite of robots and tasks, including challenging pen spinning tricks!
Can GPT-4 teach a robot hand to do pen spinning tricks better than you can?
I'm excited to announce Eureka, an open-ended agent that designs reward functions for robot dexterity at a superhuman level. It’s like Voyager in the space of a physics simulator API!
Eureka bridges the
Excited to share VIP, a self-supervised visual reward and representation pre-trained on diverse human videos!
VIP’s frozen reward and rep. can solve diverse unseen robot tasks using TrajOpt, online RL, and enables real-world few-shot offline RL!
🧵:
Excited to share our
#ICML2023
paper ✨LIV✨!
Extending VIP, LIV is at once a pre-training, fine-tuning, and (zero-shot!) multi-modal reward method for (real-world!) language-conditioned robotic control.
Project:
Code & Model:
🧵:
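At its core, LIV's zero-shot multi-modal reward comes from embedding frames and the language goal into a shared value space. A minimal sketch of the idea, not the actual LIV API: the hand-made embeddings and the cosine-similarity scoring below are my assumptions for illustration.

```python
import numpy as np

def language_reward(img_emb: np.ndarray, text_emb: np.ndarray) -> float:
    """LIV-style zero-shot reward sketch: score a frame against a task
    description by the similarity of their embeddings in a shared
    vision-language value space (encoders assumed pre-trained and frozen)."""
    a = img_emb / np.linalg.norm(img_emb)
    b = text_emb / np.linalg.norm(text_emb)
    return float(a @ b)  # cosine similarity in [-1, 1]

# Toy usage with hand-made 2-D embeddings: an aligned frame scores higher.
goal_text = np.array([1.0, 0.0])
aligned = language_reward(np.array([1.0, 0.0]), goal_text)
misaligned = language_reward(np.array([0.0, 1.0]), goal_text)
```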
Humbled to share that I was selected as an Apple Scholar in AIML PhD Fellowship! Very grateful to Apple, my advisors
@dineshjayaraman
@obastani
as well as all my mentors and collaborators for their support!
Excited to share my first paper as an "advisor" :D
We show that pre-trained visual representations enable a simple, fast, no-training subgoal decomposition method for long-horizon robotic manipulation!
Paper:
Website:
(🧵1/n)
I am attending
#CORL2023
and presenting two new papers at various workshops!
Excited to make new friends and catch up! Please reach out if you are attending and would like to chat about anything robot learning :)
This is so impressive! I can't imagine the amount of progress we will unlock as a community with low-cost, highly capable robots. Congrats to the Unitree Team!
Introducing the Unitree G1 Humanoid Agent | AI Avatar
Priced from $16K 🤩
Unlock unlimited sports potential (extra-large joint movement angles, 23–34 joints)
Force-controlled dexterous hands that can manipulate all kinds of objects
Driven by imitation & reinforcement learning
#Unitree
#AI
Learning policies in simulation and transferring to the real world (or Sim-To-Real in short) is a promising strategy for robots to learn complex skills. However, humans need to tune the simulator carefully so that the policies work robustly in the real world: this is difficult,
By popular demand, we have now uploaded our pre-trained LIV model to HuggingFace for easier downloads! This was my first time doing it, and the experience was quite smooth
@_akhaliq
We are presenting LIV today at
#ICML2023
!
Exhibit Hall 1,
#827
2:00pm - 3:30pm HST
The future of robotics is multi-modal, and LIV demonstrates how multi-modal value pre-training from diverse human videos can bootstrap language-conditioned robot skill learning.
See you there!
👀 Discover the top 10
#NVIDIAresearch
projects of the year.
✨ From Neuralangelo's high-fidelity neural surface reconstruction to Magic3D's text-to-3D content creation, these projects push the boundaries of innovation in
#AI
.
I am attending
#ICML2023
next week in Hawaii! Excited to make new friends and re-connect with old ones!
Please reach out if you are attending and would like to chat about anything related to research or ML! My particular interests include foundation models, RL, and robotics!
Career update: I am co-founding a new research group called "GEAR" at NVIDIA, with my long-time friend and collaborator Prof.
@yukez
. GEAR stands for Generalist Embodied Agent Research.
We believe in a future where every machine that moves will be autonomous, and robots and
At a technical level, DrEureka, following our prior work Eureka (), uses LLM-guided evolutionary search to generate safety-aware reward functions in code that can be used to train policies in sim. Then, leveraging LLMs’ capability as hypothesis generators,
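The outer loop of that evolutionary search can be sketched in a few lines. This is a simplification, not the actual Eureka/DrEureka code: `propose` stands in for the LLM writing candidate reward functions, and `evaluate` for training a policy in sim and returning its fitness.

```python
def evolve_reward_functions(propose, evaluate, generations=3, pop_size=4):
    """Eureka-style evolutionary search sketch: each generation, the LLM
    proposes reward-function candidates (conditioned on the current best),
    each candidate is scored by training a policy in simulation, and the
    best candidate seeds the next generation's proposals."""
    best_src, best_score = None, float("-inf")
    for _ in range(generations):
        candidates = [propose(best_src) for _ in range(pop_size)]
        gen_score, gen_src = max((evaluate(src), src) for src in candidates)
        if gen_score > best_score:
            best_score, best_src = gen_score, gen_src
    return best_src, best_score

# Toy stand-ins for the LLM and the sim-training loop (pure assumptions):
counter = iter(range(100))
propose = lambda prev: f"def reward(s): return {next(counter)} * s"
evaluate = lambda src: int(src.split("return ")[1].split(" *")[0])
best, score = evolve_reward_functions(propose, evaluate)
```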
Hello! We have
@JasonMa2020
from UPenn giving a talk at this week's robot learning seminar (Thursday 11:30am EST online). Hope to see you all there!
Title: Foundation Reward Models for General Robot Skill Acquisition
#Robotics
#MachineLearning
Thanks
@_akhaliq
!
LIV is now on arXiv:
Check it out if you are interested in the space of (RL-based) vision-language pre-training for robotics! Happy to answer any questions about the paper :)
LIV: Language-Image Representations and Rewards for Robotic Control
paper page:
Language-Image Value (LIV) is a unified pre-training, fine-tuning, and reward learning algorithm for language-conditioned visual manipulation. LIV can perform zero-shot
Check out our
#L4DC
paper on learning a policy-aware dynamics model for reinforcement learning!
The idea is very simple: focus model learning on the current policy’s visitation distribution. We theoretically show why this is desirable and extend the dual RL paradigm to MBRL,
Excited to share our
#L4DC2023
paper that introduces "Transition Occupancy Matching" (TOM)!
TOM learns a dynamics model that keeps up with the improving policy, facilitating continued progress
Paper 📰:
Code 💻:
🧵
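To make the weighting idea behind TOM concrete, here is a toy sketch under my own simplification: transitions are re-weighted by how likely the current policy is to visit them relative to the data-collection policy, so model learning concentrates where the improving policy actually goes.

```python
import numpy as np

def policy_weighted_model_loss(model_errors, policy_logprob, behavior_logprob):
    """Weight each transition's model error by an importance ratio toward the
    current policy's visitation, then average. The log-probabilities are
    placeholders for whatever occupancy estimates a practitioner plugs in."""
    w = np.exp(np.asarray(policy_logprob) - np.asarray(behavior_logprob))
    w /= w.sum()  # normalize the importance weights
    return float(np.sum(w * np.asarray(model_errors)))

# With equal log-probs, this reduces to a plain average of the errors.
loss = policy_weighted_model_loss([1.0, 3.0], [0.0, 0.0], [0.0, 0.0])
```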
I am attending
#NeurIPS2022
next week to present several works on offline RL and pre-training for robotics! Would love to meet people and discuss anything, in particular, RL and robot learning topics! DM me if you are attending and want to chat or grab coffee 😀
Nice talk on the technical approach to intelligent humanoids at 1X!
As usual for
@ericjang11
’s spicy takes, I agree strongly with 60%, ambivalent on 20%, and disagree with 20%.
Highly recommend a watch! 💯
This was a very fun project and we learned a lot about how to use LLMs to enable robot skill learning! There are many challenges and potential future directions. For example, how to combine DrEureka with real-world execution feedback and using vision to provide feedback on reward
Reward learning is a fundamental challenge in RL. In VIP, we address this by pre-training a value function on action-free human videos, and the pre-trained VIP value function can zero-shot transfer to unseen robot tasks!
Find out more about VIP at the Deep RL workshop tomorrow!
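As a rough sketch (my notation, not the paper's code), the zero-shot VIP reward for a new task is just progress toward the goal image in the frozen embedding space:

```python
import numpy as np

def vip_reward(phi, obs, next_obs, goal):
    """VIP-style dense reward sketch: r = d(obs, goal) - d(next_obs, goal)
    in the embedding space of a frozen pre-trained encoder `phi`, so steps
    that move closer to the goal embedding receive positive reward."""
    g = phi(goal)
    dist = lambda o: np.linalg.norm(phi(o) - g)
    return float(dist(obs) - dist(next_obs))

# Toy usage with an identity "encoder": stepping toward the goal is rewarded.
phi = lambda x: np.asarray(x, dtype=float)
r = vip_reward(phi, [0.0, 0.0], [1.0, 0.0], [2.0, 0.0])
```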
Does your sim2real robot falter at critical moments 🤯? Want to help but unsure how, when all you can do is tune rewards in sim 😮💨?
Introducing 𝐓𝐑𝐀𝐍𝐒𝐈𝐂 for manipulation sim2real. Policies learned in sim enable robots to accomplish complex tasks in the real world, such as furniture assembly.
🤿🧵
UVD code is open-source now:
Get your long-horizon demonstrations segmented in a few seconds!
We support various SOTA pre-trained reps (VIP, R3M, LIV, VC-1, ...) as well as policy backbones (MLP, GPT).
We highlight quadruped yoga ball walking. This task is particularly hard because (1) it is a novel task for which the LLM could not have seen human-generated reward functions or DR, and (2) simulation cannot model the deformable surface of an air-inflated ball, making good
I'll be giving a talk at
@GRASPlab
on Wednesday, 3/27 on the robot learning we're doing at
@1x_tech
. If you're a researcher at Penn working on similar things I'd love to visit your labs and see what you're working on as well! Please DM
Giving contributed talks on VIP at the Offline RL, Foundation Model for Decision Making, and Deep RL workshops at
#NeurIPS2022
. Come check out how we can pre-train a value function on passive human data and zero-shot transfer to robotics manipulation!
The DrEureka policy, as you have seen in the uncut videos, is quite robust in the real world. It successfully stays balanced over curbs and terrain changes, and even when the ball is being kicked!
This is my internship project at
@NVIDIAAI
! I had a blast working on it and learned a lot from the experience. I am really grateful to my mentors
@AnimaAnandkumar
@DrJimFan
@yukez
for their guidance on the project!
We even tried to deploy the policy while the yoga ball was being deflated. Here is a ~1-minute uncut video of the robot balancing well until it eventually loses control😅
We (
@yayitsamyzhang
) are presenting VIP at
#ICLR
tomorrow!
Talk: Oral 1 Track 5: Reinforcement Learning
Poster Session: MH1-2-3-4
#118
Looking forward to having discussions with you there!
Thrilled to announce the first annual Reinforcement Learning Conference
@RL_Conference
, which will be held at UMass Amherst August 9-12!
RLC is the first strongly peer-reviewed RL venue with proceedings, and our call for papers is now available: .
The exploration-exploitation tradeoff in
#RL
raises concerns about exploration -- who does it impact and how much? In a recent
#AISTATS
paper, we show how to effectively "spread out" exploration across episodes (individuals) w/ only a small cost to regret:
Beyond yoga ball walking, we also benchmarked DrEureka on Quadruped Locomotion and Dexterous Cube Rotation, two known tasks where there are pre-existing human-designed reward functions and DR configurations. We find DrEureka configurations to match or outperform human-designed
How can robots learn manipulation *just* by watching videos of humans in different unstructured settings?
In our new paper, we develop a framework enabling zero-shot coarse robot manipulation from passive human videos (a 🧵)
w/ Abhinav Gupta,
@shubhtuls
,
@Vikashplus
1/N
One visual representation for a wide variety of manipulation and navigation tasks! Check out our recent work on building a Visual Cortex for Embodied AI.
Contemporary discussion (hype?) about LLMs and “pausing AGI development” seems oblivious to Moravec’s paradox.
As hypothesized since the 80s, the hardest problems in AI involve sensorimotor control, not abstract thought or reasoning.
It
The GRASP SFI series is back for the Fall Semester! Please join us TODAY from 3pm - 4pm EST as Dr. Jim Fan presents "Generalist Agents in Open-Ended Worlds".
For more info and how to join, please visit:
#GRASP
#GRASPLab
#GRASPSFI
How can pre-trained visual representations help solve long-horizon manipulation? 🤔
Introducing Universal Visual Decomposer (UVD), an off-the-shelf method for identifying subgoals from videos - NO extra data, training, cost, or task knowledge required. (🧵1/n)
2. Human interference during deployment
UVD can also enable agents to auto-skip sub-stages preemptively finished by humans and can reset to redo certain stages during deployment. (🧵5/n)
Happening now!
@dineshjayaraman
from
@PennEngineers
is giving a talk at the Embodied Intelligence Seminar
@MIT_CSAIL
on Polyglot Robots: Versatile Goal-Based Task Specification for Robot Learning
Streaming at: , cohosted with
@du_yilun
A visual cortex is the region of the brain that (together with the motor cortex) enables an organism to convert vision into movement.
We present an artificial visual cortex — the module in an AI system that enables an artificial agent to convert camera input into actions.
🧵👇
Pre-trained reprs, e.g., VIP (), can produce smooth embeddings on videos of atomic tasks. Given this, UVD discovers subgoals by recursively detecting phase shifts in the embedding space. This simple idea works well across both human and robot videos! (🧵2/n)
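A minimal sketch of that recursion (my simplification of UVD, assuming per-frame embeddings as input): treat the last frame as the goal, scan backwards while the distance to it shrinks monotonically, and mark the frame where the monotone stretch breaks as the previous subgoal.

```python
import numpy as np

def discover_subgoals(emb):
    """UVD-style subgoal discovery sketch on per-frame embeddings `emb`
    (shape [T, d]): returns indices of discovered subgoal frames."""
    subgoals, end = [], len(emb) - 1
    while end > 0:
        goal, t = emb[end], end - 1
        # walk back while the distance to the current goal keeps shrinking
        while t > 0 and np.linalg.norm(emb[t - 1] - goal) > np.linalg.norm(emb[t] - goal):
            t -= 1
        subgoals.append(end)
        end = t  # recurse on the prefix, with frame t as the next goal
    return sorted(subgoals)

# Toy 1-D "embeddings" with two phases: approach 3, then return to 0.
emb = np.array([[0.0], [1.0], [2.0], [3.0], [2.0], [1.0], [0.0]])
```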
Cool work on RL-based pre-training on observational data!
If you are interested in this line of work, also check out our work VIP (), which directly obtains a visual representation and universal value function from human videos!
"Ask not what representation learning can do for RL, ask what RL can do for representation learning" -- JFK?
In our new paper: a useful way to pre-train representations on video data for decision-making agents is basically to run RL on the video!
Really thrilled that our paper was selected as a best paper finalist at CoRL 2022! If you’re attending, come to our oral on Saturday at 11:20am!
We present Interactive Reward Function (IRF) policies, which interact with the world to provide rewards for training task policies. (1/3)
If you are interested in vision-language reward/representation pre-training for robotics/decision making, also check out concurrent work Voltron() from
@siddkaramcheti
and MineDojo() from
@DrJimFan
and co!
@DrJimFan
@yukez
My Eureka project, done during an internship with the group, was also selected as one of the "Top 10 NVIDIA Research Projects of 2023"! This speaks to the mission-focused nature of the group and how even a junior researcher can make a substantial impact!
@natolambert
In my own VIP work (), we explore how fixed pre-trained models fare against increasing compute budget in downstream trajectory optimization:
The code and pre-trained model are open-sourced:
It is also hosted on
@torchRL
as an encoder option!
It is super easy to load VIP for your new task:
from vip import load_vip
vip = load_vip()  # load the pre-trained VIP model
vip.eval()  # inference mode: use as a frozen reward/representation
Our second paper is UVD:
UVD is a simple, fast, no-training subgoal decomposition method for long-horizon robotic manipulation using PVRs (e.g., VIP, R3M)!
@ZCCZHANG
@YunshuangL
will be giving the workshop spotlight!
What if the form of the visuomotor policy has been the bottleneck for robotic manipulation all along? Diffusion Policy achieves a 46.9% improvement vs. prior SotA on 11 tasks from 4 benchmarks + 4 real-world tasks! (1/7)
Website:
Paper:
VIP enables a simple and practical real-world few-shot offline RL pipeline: just do reward-weighted regression (RWR) with VIP’s reward and the representation! With VIP, offline RL is as simple as BC but far more effective.
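The RWR step itself is tiny. A hedged sketch (the exponential weighting and normalization below are one standard RWR choice, not necessarily the paper's exact recipe): weight each dataset action by exp(VIP return / temperature), then behavior-clone with those weights.

```python
import numpy as np

def rwr_weights(returns, temperature=1.0):
    """Reward-weighted regression sketch: turn per-trajectory (or per-step)
    VIP returns into normalized regression weights; policy training is then
    just weighted behavior cloning."""
    w = np.exp(np.asarray(returns, dtype=float) / temperature)
    return w / w.sum()

# Equal returns give uniform weights; higher returns get more weight.
uniform = rwr_weights([0.0, 0.0])
skewed = rwr_weights([1.0, 0.0])
```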