Career update: After a wonderful year at UIUC, our lab will be moving to the Computer Science Department at Columbia University @ColumbiaCompSci this fall.
My time at UIUC has been incredible, thanks to the support from the entire department, especially Nancy. It was an honor
What do James Bartusek of UC Berkeley, Adam Block of MIT, John Hewitt of Stanford, Aleksander Holynski of Google DeepMind & Berkeley AI Research, Yunzhu Li of UIUC, and Silvia Sellan of U Toronto have in common? They are all joining @ColumbiaCompSci - meet the Super Six!
My group at UIUC CS is hiring in #robotics, #vision, and #learning starting Fall 2023.
The group will focus on robot learning, with topics in
- Intuitive Physics
- Embodied AI
- Multi-Modal Perception
Check out my thesis talk for an overview of my past work.
🎉 Excited to share that we've won the Best Systems Paper Award at #CoRL2023 for our work on RoboCook!
A huge shoutout to the incredible team: @HaochenShi74 (lead), @HarryXu12, Samuel Clarke, and @jiajunwu_cs.
Great to catch up with so many familiar faces at #CoRL2023 today! We have three Orals this year, and two are award finalists!
Nov 7, 8:30-9:30 am (Oral), 2:45-3:30 pm (Poster)
RoboCook ()
- Finalist for Best Systems Paper Award
- Led by @HaochenShi74
I had the pleasure of visiting @CMU_Robotics over the past two days to give a VASC seminar talk and a guest lecture. Thanks @GuanyaShi for being an amazing host! 🙌
The seminar talk was about our recent work on "Foundation Models for Robotic Manipulation": 🤖
Introducing “3D Neural Scene Representations for Visuomotor Control”!
(w/ video!)
We combine implicit neural scene representations with intuitive physics models, enabling visuomotor control of dynamic 3D scenes from out-of-distribution viewpoints. (1/7)
Excited to share our work on "Causal Discovery in Physical Systems from Videos" from my internship at @NVIDIAAI
Paper
Website
Thanks to my amazing collaborators! @animesh_garg, @AnimaAnandkumar, Dieter Fox, Antonio Torralba
1/7
Introducing RoboEXP, a robotic system that explores! 🤖
When a robot enters a kitchen, it must find all the ingredients before preparing the food for you.
The exploration should not be random, and we use **foundation models** to tell the robot “what” and “how” to explore!
How can we make robots adapt to and tackle tasks in unknown environments?
Action-conditioned scene graph building through interactive exploration! 🤖✨ Our RoboEXP system can explore challenging scenarios: drawers, doors, Matryoshka dolls, fabric...
🔗
Introducing RoboCook, our new particle-based world modeling framework for dumpling making, a highly complicated long-horizon manipulation task using 15 tools.
Check out @HaochenShi74's detailed thread. Here, I discuss our exciting journey to date. (1/7)
Do you know how to make a dumpling🥟? Our robot🤖does!
Introducing RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools.
Project website:
Here we show how RoboCook makes a dumpling under external human perturbation. Thread🧵👇
Our new work studies a core question in robot manipulation: which **scene representation** to use? 🤖
Introducing D^3Fields: a 3D, dynamic, and semantic representation powered by foundation models. It supports a vast range of real-world manipulation tasks in a ZERO-SHOT manner!
What should the right representation for robotic manipulation be?
Enter D^3Fields: a 3D, dynamic, and semantic representation using foundation models WITHOUT training for zero-shot generalizable robotic manipulation. Colab is available!
🔗
🧵👇
Thank you @_akhaliq for sharing our work recently presented at #NeurIPS2023!
Visit our project page for more details, demos, and to try it out on Google Colab:
Watch the full video of our robot manipulating letters to form the word "NeurIPS"! 🤖
Model-Based Control with Sparse Neural Dynamics
paper page:
Learning predictive models from observations using deep neural networks (DNNs) is a promising new approach to many real-world planning and control problems. However, common DNNs are too
In the following work led by Danny @DannyDriess, we explore the use of NeRF to learn compositional scene representations for model-based planning with a combination of
(1) implicit object encoders,
(2) graph-structured neural dynamics models,
(3) a latent-space RRT planner. (1/4)
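As a rough illustration of how these pieces could fit together, here is a minimal latent-space RRT sketch. The `dynamics` function is a toy placeholder for the learned graph-structured model, and actions share the latent dimension purely to keep the example short:

```python
import numpy as np

def dynamics(z, a):
    """Toy placeholder for the learned neural dynamics z' = f(z, a)."""
    return z + 0.1 * a

def latent_rrt(z_start, z_goal, n_iters=2000, n_cands=16, tol=0.2, seed=0):
    rng = np.random.default_rng(seed)
    nodes, parents, actions = [z_start], [None], [None]
    for _ in range(n_iters):
        # Goal-biased sampling: occasionally steer straight toward the goal latent.
        z_rand = z_goal if rng.random() < 0.2 else rng.uniform(-1, 1, size=z_start.shape)
        i = int(np.argmin([np.linalg.norm(z - z_rand) for z in nodes]))  # nearest node
        # Extend: roll several random actions through the model, keep the best.
        cands = rng.uniform(-1, 1, size=(n_cands,) + z_start.shape)
        nxt = [dynamics(nodes[i], a) for a in cands]
        j = int(np.argmin([np.linalg.norm(z - z_rand) for z in nxt]))
        nodes.append(nxt[j]); parents.append(i); actions.append(cands[j])
        if np.linalg.norm(nxt[j] - z_goal) < tol:  # close enough in latent space
            plan, k = [], len(nodes) - 1
            while parents[k] is not None:          # walk back to the root
                plan.append(actions[k]); k = parents[k]
            return plan[::-1]                      # action sequence to execute
    return None

plan = latent_rrt(np.zeros(4), 0.5 * np.ones(4))
```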
ImageNet was successful because it was the benchmark for deep learning and computer vision: progress on ImageNet signified progress in CV and DL.
Embodied AI also needs such a benchmark, and B1K is a concrete milestone towards that goal. 🤖
Huge congrats to the team! 🎉
One year ago, we first introduced BEHAVIOR-1K, which we hope will be an important step towards human-centered robotics. After our year-long beta, we're thrilled to announce its full release, which our team just presented at NVIDIA #GTC2024. 1/n
Excited to share our new project, led by the amazing @wenlong_huang, exploring large language and vision models (#LLMs & VLMs) for zero-shot #Robotics manipulation!
What's particularly interesting to me is the demonstrated ability to **specify the goal** for embodied agents. 🧵👇(1/4)
How to harness foundation models for *generalization in the wild* in robot manipulation?
Introducing VoxPoser: use LLM+VLM to label affordances and constraints directly in 3D perceptual space for zero-shot robot manipulation in the real world!
🌐
🧵👇
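To make the value-map idea concrete, here is a toy sketch (not the released code): affordance and constraint cues are composed into one voxel grid, and the best cell gives the next waypoint. The "handle" and "vase" voxels are hard-coded stand-ins for what the LLM/VLM pipeline would actually produce:

```python
import numpy as np

grid_size = 50
# Hypothetical detections; in VoxPoser these come from an LLM writing code
# that queries a VLM, not from hard-coded coordinates.
target = np.array([25, 25, 10])    # e.g., a drawer handle to reach
obstacle = np.array([25, 30, 10])  # e.g., a vase to avoid

xyz = np.stack(np.meshgrid(*[np.arange(grid_size)] * 3, indexing="ij"), axis=-1)
d_target = np.linalg.norm(xyz - target, axis=-1)
d_obstacle = np.linalg.norm(xyz - obstacle, axis=-1)

value = np.exp(-0.1 * d_target)           # affordance: high value near the target
value -= 2.0 * np.exp(-0.2 * d_obstacle)  # constraint: penalty near the obstacle

waypoint = np.unravel_index(np.argmax(value), value.shape)
print("next end-effector waypoint (voxel):", waypoint)
```

A motion planner can then chain such waypoints into a trajectory, replanning as the scene changes.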
Finally the moment. I will be joining the University of Illinois @IllinoisCS as a PhD student this fall! I will be working with Prof. Yunzhu Li @YunzhuLiYZ on exciting topics across computer vision, machine learning, and robotics.
Robots writing “Hello World” in Chinese using granular pieces? 🤖👋
Accepted at #RSS2023, we present a dynamic-resolution model learning framework for object pile manipulation.
Adding to @YXWangBot's thread, I discuss the comparison with humans. (1/3)
How can a robot manipulate object piles with varied granularity and geometry? Check out our paper "Dynamic-Resolution Model Learning for Object Pile Manipulation" at #RSS2023!
Project website:
Here is a "Hello World" example. Thread. 🧵👇
#AI #Robotics
Attending #RSS2022 in NYC!
Check out our work
- RoboCraft on June 28 ()
- NeRF-RL at L-DOD workshop on June 27 ()
I'm also co-organizing the implicit representation workshop on July 1 (). Come and join us!
Fun fact: this project is inspired by the following Tom & Jerry video.
Key takeaways:
- We identify "what" objects require exploration.
- We understand "how" to interact with these objects.
- We "memorize" the details of what we have seen and explored to support downstream
Introducing RoboEXP, a robotic system that explores! 🤖
When a robot enters a kitchen, it must find all the ingredients before preparing the food for you.
The exploration should not be random, and we use **foundation models** to tell the robot “what” and “how” to explore!
Check out a perspective I co-authored with @LuoYiyue for @ScienceMagazine on intelligent textiles.
Intelligent fabrics, which can sense and communicate information scalably and unobtrusively, can fundamentally change how people interact with the world.
Check out our recent work on learning unsupervised keypoints for model-based reinforcement learning! Here is a nice summary of the highlights from @peteflorence
Can robots model the world with keypoints, and learn how to see, predict, and control them into the future?
"Keypoints into the Future: Self-Supervised Correspondence in Model-Based Reinforcement Learning"
@lucas_manuelli, @YunzhuLiYZ, me, @rtedrake (1/n)
I am thrilled to share that my next step is pursuing a PhD at UIUC, where I will have the opportunity to collaborate with Prof. Yunzhu Li and Prof. Shenlong Wang. I am grateful for the support and guidance of my friends, family, and professors throughout the application process.
I would like to share our ICML 2020 paper on "Visual Grounding of Learned Physical Models".
w/ Toru Lin, Kexin Yi, @recursus, @dyamins, @jiajunwu_cs, Josh Tenenbaum, & Antonio Torralba
Project page:
Video:
1/8
Our scalable tactile glove, introduced in a Nature 2019 paper, has been collected by the MIT Museum!!
Joint work with Subra (lead author), Petr, Jun-Yan @junyanz89, Antonio, and Wojciech.
Thanks to Yiyue @LuoYiyue for making a new one specifically for display!
This work is inspired by @lucacarlone1's amazing work on building 3D scene graphs and by @_krishna_murthy's fantastic ConceptGraphs.
We add a critical new treatment: actions. The robot does not merely observe the environment but interacts with it to discover all hidden items.
Join me at #ICRA2023 as I present our latest work on learning structured dynamics models for deformable object manipulation, from manipulating dough and granular objects to crafting dumplings.
Don't miss it and the wealth of knowledge from other fantastic speakers!
#Robotics #AI
Join us this morning at #CVPR2023 as we present the ObjectFolder Benchmark!
Our work integrates multisensory object representations, incorporating vision, touch, and sound, benchmarked around tasks like recognition, reconstruction, and robotic manipulation.
Come chat with us!
To be presented at #CVPR2023 on Thursday morning: “The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects”
Project page:
Paper:
Demo:
Poster session: THU-AM-076
I will give a talk at the #ICLR2021 simDL workshop tomorrow about our recent work on learning-based dynamics modeling for physical inference and control. Come and chat with us!
Welcome to the ICLR 2021 Workshop on Deep Learning for Simulation (simDL), which we are hosting! It will be live on May 7, 8:45am-5pm Pacific Daylight Time. We have 8 invited talks from leading researchers, 3 contributed talks, and a poster session with 51 accepted papers.
I would also like to refer you to related work by @sindy_loewe, @david_madras, Rich Zemel, and @wellingmax, which leverages a shared dynamics model and learns to infer causal graphs from time-series data using Amortized Causal Discovery:
Our work shows that NeRF-learned scene representations better capture the 3D structure of the environment, which turns out to be surprisingly useful for RL in tasks that require 3D reasoning!
Kudos to the lead authors, @DannyDriess and @IngmarSchubert!
My talk at the #ICLR2021 simDL workshop is now available on YouTube:
I discussed why we want to learn simulators from data and how different modeling choices affect (1) the generalization power and (2) their usage in physical inference and model-based control.
Specifically, we employ the "Transporter" technique developed by @tejasdkulkarni et al. () as our perception module, which assigns keypoints over the foreground of the images and consistently tracks the objects over time across different frames. 4/7
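For readers curious about the mechanics: the differentiable readout at the heart of Transporter-style keypoint detectors is a spatial soft-argmax over learned heatmaps. A minimal sketch, assuming the heatmaps come from a CNN trained elsewhere:

```python
import torch

def spatial_soft_argmax(heatmaps):
    """Convert per-keypoint heatmaps (B, K, H, W) into (x, y) coords in [-1, 1].

    This is the differentiable keypoint readout used by Transporter-style
    detectors; the heatmaps themselves would come from a learned CNN.
    """
    b, k, h, w = heatmaps.shape
    probs = torch.softmax(heatmaps.reshape(b, k, -1), dim=-1).reshape(b, k, h, w)
    ys = torch.linspace(-1, 1, h, device=heatmaps.device)
    xs = torch.linspace(-1, 1, w, device=heatmaps.device)
    y = (probs.sum(dim=3) * ys).sum(dim=2)   # expected row coordinate
    x = (probs.sum(dim=2) * xs).sum(dim=2)   # expected column coordinate
    return torch.stack([x, y], dim=-1)        # (B, K, 2)

# Example: 4 keypoints over a 64x64 feature map.
coords = spatial_soft_argmax(torch.randn(2, 4, 64, 64))
print(coords.shape)  # torch.Size([2, 4, 2])
```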
Dynamic-Resolution Model Learning for Object Pile Manipulation
paper page:
Dynamics models learned from visual observations have shown to be effective in various robotic manipulation tasks. One of the key questions for learning such dynamics models is
Real-world data can vary widely across sources (e.g., Amazon warehouses, a fleet of self-driving cars, and photos taken by individual users).
Our new #iclr2023 paper shows how decentralized self-supervised learning can be robust to such heterogeneity in decentralized datasets!
How can we do self-supervised learning on unlabeled data without sharing them? Super excited to share our work Dec-SSL at @ICLR! We study decentralized self-supervised learning and try to understand its robustness and communication efficiency. Video:
The CVPR Tutorial on Graph-Structured Networks tomorrow will feature a line of work on representation learning with GNNs. I will present our work on using GNNs for physical inference and model-based control. Come and join us!
We are hosting a CVPR tutorial on Graph-Structured Networks tomorrow. We will cover Transformers, graph networks, and applications in 3D scene understanding, physical interaction prediction, RL, and control.
Sunday, 9:00am-12:30 pm PDT
CNN featured our recent #cvpr19 paper on cross-modal prediction between vision and touch. Joint work with @junyanz89, Russ Tedrake, and Antonio Torralba.
AI-generated 3D content holds immense potential to revolutionize a broad spectrum of applications. The automated creation of diverse 3D environments is crucial for training robots, serving as a key element in achieving widespread generalization. 🤖
Congratulations, Hao!! 🚀
📢Thrilled to announce sudoAI (@sudoAI_), founded by a group of leading AI talents and me!🚀
We are dedicated to revolutionizing digital & physical realms by crafting interactive AI-generated 3D environments!
Join our 3D Gen AI model waitlist today!
👉
Check out our #CVPR2021 paper on building a tactile carpet for human pose estimation!
Imagine that in a workout, it can:
- recognize the activity
- count num of reps
- (potentially) calculate burned calories!
Poster @ Wed June 23, 10 PM – 12:30 AM EDT
Check out our Nature Electronics paper on Interaction Learning with Conformal Tactile Textiles! Joint work w/ @LuoYiyue (lead author), @pratyusha_PS, @showone20, @kui_wu, Michael Foshey, @lbc1245, Tomas Palacios, Antonio Torralba, Wojciech Matusik
BREAKING: MIT "smart clothes" use special tactile fibers to sense a person’s movement & determine what pose they're in.
Potential applications:
🏀 coaching
♿ rehabilitation
👴🏽 elder care
Paper:
More: (v/ @NatureElectron)
At the #CVPR2023 poster session this AM, we'll present our work on learning object-centric neural scattering functions for dynamic modeling of multi-object scenes, designed for robotic manipulation under extreme lighting.
Come chat with us and check out @stephentian_'s thread for more!
How can we enable robotic manipulation in multi-object scenes with potentially harsh lighting conditions?
At #CVPR2023, we’re presenting our recent work combining object-centric neural scattering functions and learned dynamics models to perform robotic control!
(1/6)
Thanks Jim! Like how humans sense the world, the foundation models for robots should be multimodal.
Check out the ObjectFolder Benchmark, our attempt towards a large-scale, real-world, multimodal object dataset, built for tasks like recognition, reconstruction, and manipulation.
What is a "cup"? To LLMs, it is a word. But to us, it is a full sensory package: the visual appearance, the 3D topology, the ceramic texture of the handle, the sound of it landing on a table.
To gain a far deeper understanding of concepts, the next-gen AI needs to develop
There is huge potential for the use of implicit representations in robotics. Are you interested in learning about and advancing the forefront of this direction? Please consider participating and contributing to our #RSS2022 workshop!
Are you interested in the role of implicit representations within robotics?
Then check out our #RSS2022 workshop on July 1st.
We also solicit 2-3 page extended abstracts as contributions! (1/4)
@cs231n has always been the computer vision and deep learning course I recommend to anyone interested in this area. It introduced me to the field, and I was extremely fortunate to contribute back last year. I'm sure this year will be amazing as well!
It’s that time of the year - first lecture of @cs231n!! It’s the 9th year since @karpathy and I started this journey in 2015, what an incredible decade of AI and computer vision! I'm so excited to meet this new crop of students in CS231n! (Co-instructing with @eadeli this year 😍🤩)
Causal discovery is at the core of human cognition. The interactions within a physical scene causally affect the behavior of the physical system. It is desirable to understand the underlying causal structure and model the functional mechanism directly from images. 2/7
New #CoRL22 paper on long-horizon plasticine manipulation using tools like cutters, pushers, and rollers.
We made both *temporal* and *spatial* abstractions for more effective planning of the skill sequence.
Kudos to all authors, especially Xingyu @Xingyu2017 and Carl @carl_qi98!!
Object-centric representations and hierarchical reasoning are key to generalization. How can we manipulate deformables, where “objectness” changes over time? Our method finds a way and solves challenging real-world dough manipulation tasks! #CoRL2022
Our framework combines Neural Radiance Fields (NeRF) and time contrastive learning with an autoencoding framework, which learns viewpoint-invariant 3D-aware scene representations from 2D visual observations. (4/7)
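As a rough illustration of the time contrastive component, here is a generic triplet-style loss: embeddings of the same time step seen from different viewpoints are pulled together, while embeddings of different time steps are pushed apart. A sketch only, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def time_contrastive_loss(z_anchor, z_pos, z_neg, margin=1.0):
    """Triplet-style time contrastive loss.

    z_anchor / z_pos: embeddings of the SAME time step seen from different
    viewpoints (should be close); z_neg: embedding of a DIFFERENT time step
    (should be far). All tensors are (B, D).
    """
    d_pos = (z_anchor - z_pos).pow(2).sum(dim=1)
    d_neg = (z_anchor - z_neg).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

loss = time_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128))
```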
The reconstructed low-level memory allows us to inspect what's inside the cabinets.
Check out our website for code and examples including drawers, doors, Matryoshka dolls, fabric, etc.
Kudos to @jiang_hanxiao for his fantastic job leading this project!!
Excited to co-organize the workshop on Multi-Agent Interaction and Relational Reasoning at ICCV21! We aim to enable interdisciplinary discussions across areas like multi-agent systems, visual relational reasoning, etc.
Please consider joining and sharing your work at the workshop!
For PhD applicants, please submit your application through .
Select the Computer Science PhD program for Fall 2023, and mention me as one of your Faculty of Interest.
Thanks, and I'm looking forward to your application!
The guest lecture was for @GuanyaShi's course on Robot Learning: .
I summarized our work over the years on "Learning Structured World Models From and For Physical Interactions":
Amazing group of students and enjoyable questions!
Impressive policy rollouts on various dexterous tasks, powered by scalable, in-the-wild hand capture! Incredible engineering & learning techniques put together! Congrats to @chenwang_j and @HaochenShi74.
(What's stopping the robot from continuing to pour water into the teapot?)
Can we use wearable devices to collect robot data without actual robots?
Yes! With a pair of gloves🧤!
Introducing DexCap, a portable hand motion capture system that collects 3D data (point cloud + finger motion) for training robots with dexterous hands
Everything open-sourced
The ability to perform one-shot discovery of the causal structure allows our model to make counterfactual predictions and extrapolate to systems of unseen interaction graphs or graphs of various sizes. 6/7
Our method extracts a structured keypoint-based representation from videos, understands the causal relationships between different constituting components, identifies the hidden confounding variables, and makes predictions into the future. 3/7
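For a flavor of how such causal relationships can be inferred, here is an NRI-style sketch that predicts discrete edge types between keypoints with a Gumbel-softmax; the actual model goes further, e.g., also inferring continuous confounding variables on the edges:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeInference(nn.Module):
    """Infer a discrete interaction graph over K keypoints (NRI-style sketch).

    Each directed edge gets logits over edge types (e.g., "no edge" vs. "spring"),
    sampled with Gumbel-softmax so the pipeline stays differentiable.
    """
    def __init__(self, dim, n_types=2):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, n_types))

    def forward(self, keypoint_feats, tau=0.5):
        # keypoint_feats: (B, K, D) per-keypoint features aggregated over time.
        b, k, d = keypoint_feats.shape
        send = keypoint_feats.unsqueeze(2).expand(b, k, k, d)
        recv = keypoint_feats.unsqueeze(1).expand(b, k, k, d)
        logits = self.mlp(torch.cat([send, recv], dim=-1))    # (B, K, K, n_types)
        return F.gumbel_softmax(logits, tau=tau, hard=True)   # sampled edge types

edges = EdgeInference(dim=16)(torch.randn(4, 5, 16))
print(edges.shape)  # torch.Size([4, 5, 5, 2])
```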
We have just released the code and the video for our #NeurIPS2020 paper on "Causal Discovery in Physical Systems from Videos".
Project:
Code:
Video:
Try it out and let us know if you have any questions!
Our work takes a step forward to model complicated 3D dynamical systems purely from 2D observations for model-based planning, which we hope can inspire future studies of more generalizable vision-based manipulation systems. (7/7)
Today @ #RSS2022, @HaochenShi74 will present our work on learning particle dynamics for manipulating Play-Doh!
The model is learned directly from real data consisting of just **10 minutes** of random interactions. Coupled with MPC, we manipulate Play-Doh into letter-like shapes!
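As a sketch of one common way to couple a learned model with MPC, here is a generic random-shooting planner; `dynamics` is a placeholder for the learned particle-based GNN, and the actual RoboCraft planner may differ in its details:

```python
import numpy as np

def mpc_random_shooting(state, goal, dynamics, horizon=10, n_samples=256,
                        action_dim=3, seed=0):
    """Pick the first action of the best sampled action sequence under the model.

    `dynamics(state, action) -> next_state` stands in for the learned
    particle-based GNN; `state` is a flattened (N*3,) particle array.
    """
    rng = np.random.default_rng(seed)
    seqs = rng.uniform(-1, 1, size=(n_samples, horizon, action_dim))
    costs = np.empty(n_samples)
    for i, seq in enumerate(seqs):
        s = state
        for a in seq:                        # roll the model forward
            s = dynamics(s, a)
        costs[i] = np.linalg.norm(s - goal)  # e.g., distance to the target shape
    return seqs[np.argmin(costs)][0]         # execute the first action, then replan

# Toy usage with a placeholder dynamics function.
toy_dynamics = lambda s, a: s + 0.01 * np.tile(a, len(s) // len(a))
action = mpc_random_shooting(np.zeros(30), np.ones(30), toy_dynamics)
```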
On Tuesday at #RSS22, I will present our paper RoboCraft! The presentation will be in Arledge Lerner Hall between 10:35-10:40am local time! Our poster will be at Arledge Lerner Hall between 4:30-6:00pm. Please come and check it out! (1/n)
Humans have a strong intuitive understanding of the 3D environment around us. The mental model of the physics in our brain applies to objects of different materials and enables us to perform a wide range of manipulation tasks that are beyond the reach of current robots. (3/7)
Please also check out the nice work by @recursus, "Learning Physical Graph Representations from Visual Scenes", which takes a step further by removing the supervision of scene structures: . 8/8
The model does not assume access to the ground truth causal graph, but learns to discover the dependency structures and model the causal mechanisms from images in an unsupervised way, which we hope can facilitate future studies of more generalizable visual reasoning systems. 7/7
This work naturally extends my series on the use of neural fields for world modeling & robotic manipulation.
Intrigued by this direction? Explore more here:
We then extend the model developed by @thomaskipf et al. () to discover the causal structure between the keypoints and identify both the discrete and continuous hidden confounding variables on the directed edges. 5/7
A dynamics model, over the learned representation space, enables visuomotor control for manipulation tasks involving rigid bodies and fluids. When coupled with an auto-decoding framework, it supports goal specification from viewpoints outside the training distribution. (5/7)
Thank you for the note @alihkw_!! We are both inspired by and love your awesome series of work on ConceptFusion and ConceptGraphs.
We could debate which abstraction level to set the graph at, but action-conditioned scene graphs might have to be **the** way to scale things up.
Scene-graphs with actions!! I really think scene graphs are going to be a (the?) fundamental data structure for robotics going forward.
Also, so happy to see ConceptGraphs inspiring new and awesome work like this!
First paper: we combine Koopman operator theory and graph neural networks to enable efficient system identification and control synthesis for compositional systems.
Website:
Video:
(2/3)
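The core Koopman idea fits in a few lines: lift states with a nonlinear embedding and fit linear dynamics in the lifted space by least squares. A sketch with a hand-crafted lifting; the paper instead learns a structured embedding with graph neural networks:

```python
import numpy as np

def lift(x):
    """Hand-crafted lifting g(x); the paper learns a compositional embedding."""
    return np.concatenate([x, x**2, np.sin(x)])

def fit_koopman(states, next_states):
    """Least-squares fit of linear dynamics z' ≈ K z in the lifted space."""
    Z = np.stack([lift(x) for x in states])
    Zp = np.stack([lift(x) for x in next_states])
    W, *_ = np.linalg.lstsq(Z, Zp, rcond=None)  # solves Z @ W ≈ Zp
    return W.T                                   # so z' ≈ K @ z with K = W.T

# Identify a toy system from random transitions.
rng = np.random.default_rng(0)
xs = rng.standard_normal((200, 4))
xn = 0.9 * xs + 0.05 * np.sin(xs)  # unknown "true" dynamics
K = fit_koopman(xs, xn)
print(np.linalg.norm(lift(xn[0]) - K @ lift(xs[0])))  # one-step prediction residual
```

Once the dynamics are (approximately) linear in the lifted space, standard tools such as LQR can synthesize controllers efficiently.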
Interested in long-horizon deformable object manipulation? Check out our #ICLR2022 paper on this problem by combining
(1) a differentiable physics simulator for short-term skill abstraction
(2) a planner to produce intermediate goals and assemble the skills for long-horizon tasks
Robotic manipulation of deformable objects like dough requires long-horizon reasoning over the use of different tools. Our method DiffSkill utilizes a differentiable simulator to learn and compose skills for these challenging tasks. #ICLR2022
Website:
Learning causal graphs that capture physical systems has high potential yet remains challenging!
Check out End-to-End Causal Discovery from Videos
Site:
Paper:
w/ @YunzhuLiYZ, @AnimaAnandkumar, A. Torralba, D. Fox
Specifying goals for drones and cars is simple — give them a destination.
But with household robots, it's more complex — how should a robot interpret commands like "set the table," "sort the trash," or "clean the room"? (2/4)
Nov 7, 8:30-9:30 am (Oral), 2:45-3:30 pm (Poster)
Predicting Object Interactions with Behavior Primitives: An Application in Stowing Tasks
- Project page:
- Finalist for Best Paper/Best Student Paper Awards
- Led by @HaonanChen_
Our robot adopts a holistic approach, considering the entire pile as a whole and accounting for the overall redistribution before focusing on the detailed shape.
In contrast, humans tend to be more sequential, aligning one part of the shape before moving on to the next. (2/3)
@AvivTamar1 @AnimaAnandkumar @animesh_garg Thank you, Aviv! Your work's results look fantastic!
Combining DLPs and causal dynamics prediction will move the frontier forward in cases where the causal/relational mechanisms between components are not directly observable from still images. Super exciting future direction! 🙌
This project naturally extends our CoRL-21 Oral paper () by explicitly accounting for the compositionality/structure of the underlying system, which we show allows much better generalization outside the training distribution. (3/4)
Introducing “3D Neural Scene Representations for Visuomotor Control”!
(w/ video!)
We combine implicit neural scene representations with intuitive physics models, enabling visuomotor control of dynamic 3D scenes from out-of-distribution viewpoints. (1/7)
Second paper: we introduce a diagnostic video dataset for temporal/causal reasoning, and provide a method that joins the ability to recognize objects and model the dynamics and causal relations via a symbolic video representation.
Website:
(3/3)
We have just released the code for our ICML-20 paper on Visual Grounding of Learned Physical Models, together with a stand-alone repo for dynamics prediction.
We have two papers on learning and reasoning about dynamical systems accepted to #ICLR2020 as spotlight presentations!
Come and join the live sessions on Wednesday (April 29th, 13:00-15:00 EDT, and 16:00-18:00 EDT)!
(1/3)
We are excited to release our work on DexPilot, a markerless, glove-free, and vision-based teleoperation system for a dexterous robot hand-arm platform
PDF is here
link to more videos
We tackle this challenge head-on.
By harnessing the commonsense knowledge acquired by the LLMs, we sidestep the need for manually specifying cost functions for each task.
Our method generates objectives automatically, demonstrating impressive zero-shot generalization. (3/4)
More examples of our robot writing “Hello” in Japanese!
Kudos to @YXWangBot for showcasing the power of our particle-based graph dynamics model --- a single model, trained solely in simulation, accomplishes all demonstrated tasks (e.g., gather, redistribute, sort). (3/3)
Nov 8, 11:00-12:00 pm (Oral), 5:15-6:00 pm (Poster)
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
- Project page:
- Led by @wenlong_huang
Please consider submitting a poster! We are bringing together simulation researchers that develop deformable-object simulators, with roboticists that leverage these simulators for real-world robotics applications.
We're organizing an RSS workshop on deformable object simulation + manipulation. If you're working in this area, please consider submitting a poster, with the chance to win an NVIDIA GPU! Abstracts are due on the 20th of June; see the website for more details:
Our recent work on Learning Particle Dynamics for Robot Manipulation () is featured by MIT News (). Also, check out our initial attempt on extending to partially observable scenarios ().
@jiajunwu_cs, @junyanz89
This thread shows the value of combining particle representation and GNNs for (1) dynamics modeling of diverse objects, and (2) application in long-horizon tasks requiring extensive tool use.
Kudos to all my collaborators, especially @HaochenShi74 for his phenomenal work!! (7/7)
Particles as the scene representation are both general and flexible. My research into this area began five years ago with the development of DPI-Nets, built using graph neural networks (GNNs) to simulate rigid bodies, deformable objects, and fluids: . (2/7)
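To close, a compact sketch of the kind of message-passing step such particle simulators are built from; it is illustrative only, and real models like DPI-Nets stack multiple propagation steps and condition on particle and material attributes:

```python
import torch
import torch.nn as nn

class ParticleDynamics(nn.Module):
    """One message-passing step of a DPI-Nets-style particle model (sketch).

    Edges connect particles within a radius; an edge MLP computes pairwise
    messages, and a node MLP turns aggregated messages into per-particle
    accelerations, integrated with explicit Euler.
    """
    def __init__(self, hidden=64):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(6, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
        self.node_mlp = nn.Sequential(nn.Linear(hidden + 3, hidden), nn.ReLU(), nn.Linear(hidden, 3))

    def forward(self, pos, vel, radius=0.1, dt=0.01):
        # pos, vel: (N, 3). Build edges between particles closer than `radius`.
        dist = torch.cdist(pos, pos)
        src, dst = torch.nonzero((dist < radius) & (dist > 0), as_tuple=True)
        msg = self.edge_mlp(torch.cat([pos[src] - pos[dst], vel[src]], dim=-1))
        agg = torch.zeros(pos.shape[0], msg.shape[-1])
        agg.index_add_(0, dst, msg)                    # sum messages per receiver
        acc = self.node_mlp(torch.cat([agg, vel], dim=-1))
        vel_next = vel + dt * acc
        return pos + dt * vel_next, vel_next

model = ParticleDynamics()
pos_next, vel_next = model(torch.rand(100, 3), torch.zeros(100, 3))
```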