#ICLR2024
Spotlight🌟🌟🌟
PULSE: Physics-based Universal humanoid motion Latent SpacE/Representation
Code:
Site:
Paper:
As a motion representation, PULSE is low dimensional (32), high coverage (99.8% of
Now that PHC's code is out...
Introducing PULSE: Physics-based Universal motion Latent SpacE:
📜:
🌐:
All downstream tasks here use the same pretrained latent space. (1/6)
You can now ask your simulated humanoid to perform actions, in REAL-TIME 👇🏻
Powered by the amazing EMDM (
@frankzydou
,
@Alex_wangjingbo
, et al.) and PHC.
EMDM:
PHC:
Simulation: Isaac Gym
🤔 Ever wondered if simulation-based animation/avatar learnings can be applied to a real humanoid in real time?
🤖 Introducing H2O (Human2HumanOid):
- 🧠 An RL-based human-to-humanoid real-time whole-body teleoperation framework
- 💃 Scalable retargeting and training using large
PHC has been accepted by ICCV 2023!
We aim to develop a physics-based humanoid controller capable of imitating ALL of the motions from the AMASS dataset (almost there🧐), recovering from failure states, and using NO external forces, all while supporting real-time use cases!
Simulated humanoid now learns how to handle a basketball🏀🏀🏀!
New work, PhysHOI, led by
@NliGjvJbycSeD6t
, learns dynamic human-object interactions (basketball, grabbing, etc.).
Site🌐:
Paper📄:
Code/Data🧑🏻💻: (coming
Releasing the Universal Humanoid Controller (UHC) that has been the backbone of many of our physics human pose estimation efforts!
Helped by Residual Force Control, UHC can imitate up to 97% of the AMASS dataset using one policy!
Porting the Perpetual Humanoid Controller (PHC) to MuJoCo for motion imitation, 70% done.
I replicated PHC's Isaac Gym state space in MuJoCo; it has now reached ~98% success rate with a single network (no PNN yet).
Code (work in progress):
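For the curious, a minimal sketch of what a root-relative state space looks like with the MuJoCo Python bindings (not the actual port; the model path and the pelvis-is-body-1 assumption are placeholders):

```python
import mujoco
import numpy as np

# Sketch of a PHC-style root-relative observation in MuJoCo: every body's
# position expressed in the pelvis frame. Model path and body indexing are
# illustrative assumptions, not the actual ported code.
model = mujoco.MjModel.from_xml_path("smpl_humanoid.xml")
data = mujoco.MjData(model)
mujoco.mj_forward(model, data)

root_pos = data.xpos[1].copy()    # assumed pelvis world position
root_quat = data.xquat[1].copy()  # assumed pelvis orientation (w, x, y, z)
inv_quat = root_quat * np.array([1.0, -1.0, -1.0, -1.0])  # conjugate = inverse

obs = []
for body_id in range(1, model.nbody):  # body 0 is the world
    local = np.empty(3)
    mujoco.mju_rotVecQuat(local, data.xpos[body_id] - root_pos, inv_quat)
    obs.append(local)
obs = np.concatenate(obs)  # flat root-relative body-position observation
```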
One thing that amazes me about PULSE is that random samples from the latent space can lead to natural motion better than anything I have trained for (e.g. a walking task in PACER), even when I use PULSE's latent space or AMP.
So what is the problem? The task reward?
In preparation for PULSE's code release:
Releasing PHC+, a motion imitation model that has learned ALL of the training data (11,313 AMASS sequences).
Available at PHC's codebase 👇🏻
Webcam demo for PHC is now live. Check it out!
Testing on a 3080. A different GPU might give different results 😃
👇🏻 is a screen recording of real-time test.
Code for PHC has been released!!!
The codebase includes:
- the SMPL humanoid environment for Isaac Gym
- Motion imitation models trained on AMASS
- More demos to come, based on language & video input
Language to humanoid control demo, powered by MDM and PHC is out 👀👀👀
Check it out at PHC's repo:
(EMDM support once the official code comes out). Simulation runs in real time, and this demo is as fast as MDM can generate the motion
Also added a way
Introducing SMPL_Sim, a codebase meant to be a **minimal** example for setting up SMPL humanoids in MuJoCo and Isaac Gym, installable via pip:
It currently supports three simple tasks (reach, speed, and getup); work in progress.
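A hypothetical usage sketch (the real SMPL_Sim API may differ; `HumanoidEnv` and its arguments here are illustrative, gym-style assumptions):

```python
# pip install smpl_sim
import smpl_sim  # assumed package/module name

# Hypothetical API: construct one of the three simple tasks in either backend.
env = smpl_sim.HumanoidEnv(task="getup", sim="mujoco")  # or sim="isaacgym"
obs = env.reset()
for _ in range(1000):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
```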
Tired of designing task rewards?
New work, PhysHOI, enables simulated humanoids to learn diverse basketball skills 🏀 purely from human demonstrations.
Code available now!
Site:
Paper:
Code/Data:
Merry Christmas!!!🎄
To boost your holiday spirit, here is me (trying to) dance while controlling a simulated humanoid with a Quest 2 as the input device (SLAM cameras and headset tracking).
So I was trying to cite ReLU and this is the first thing that pops up: , with 3000+ citations 😂
Folks this is NOT the ReLU paper!
A number of popular, well-known papers have made this mistake, it seems...
(There is even this website "how to cite relu":
Early footage of motion imitation for the H1 humanoid developed in the PHC codebase.
Spoiler: these motions do not transfer to the real robot (as of today; not sure if ever).
While I am at it, also releasing the trained models for VR controller tracking using PHC.
This task is essentially a generalized version of motion imitation, where there are only three 6-DoF points to track (red dots) instead of 24.
Models are released at
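Conceptually, the change from full-body imitation is just shrinking the goal observation; a sketch of the idea (indices and shapes are illustrative assumptions, not the released code):

```python
import numpy as np

# Full-body imitation tracks all 24 SMPL bodies; VR tracking observes just 3
# of them, each a 6-DoF target (position + rotation). Indices are hypothetical.
VR_TRACKED = [15, 20, 21]  # assumed: head, left hand, right hand

def goal_obs(target_pos, target_rot, tracked=VR_TRACKED):
    """target_pos: (24, 3) positions, target_rot: (24, 4) quaternions.
    Returns the goal observation restricted to the tracked bodies."""
    pos = target_pos[tracked].reshape(-1)  # 3 x 3 = 9 values
    rot = target_rot[tracked].reshape(-1)  # 3 x 4 = 12 values
    return np.concatenate([pos, rot])      # full imitation would use all 24
```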
Come and check out Trace & Pace at Poster 134 this afternoon at
#CVPR
!
This work aims to create physically realistic pedestrian trajectories and animation:
@davrempe
will be there to answer all your questions! (I can't be there due to visa reasons🫠)
Happy New Year 🥳🥳🥳
Here I am, back to one of the most important steps in building models: making sure the data is clean😂
👇🏻a kinematic MoCap sequence playback.
Please let me know if you have better ways to find them; right now I have some crude velocity filtering and visually
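The crude velocity filter amounts to something like this sketch (the 10 m/s threshold is an illustrative assumption): flag frames where any joint jumps implausibly fast between consecutive frames.

```python
import numpy as np

# Crude velocity filter for mocap cleaning: teleports and marker swaps show
# up as implausibly high per-joint speeds between consecutive frames.
def flag_glitches(joints, fps=30.0, max_speed=10.0):
    """joints: (T, J, 3) joint positions in meters. Returns indices of
    suspicious frames where peak joint speed exceeds max_speed m/s."""
    vel = np.linalg.norm(np.diff(joints, axis=0), axis=-1) * fps  # (T-1, J)
    return np.where(vel.max(axis=1) > max_speed)[0] + 1  # frame after the jump
```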
@DrJimFan
Gaming led to GPUs, which then led to the AI boom in 2012, and is now providing virtual worlds for AIs to learn in.
Damn, gaming really is one hell of an AI accelerator
Doing some code release over the weekend for Embodied Pose:
Added an in-the-wild demo as well. Works decently well, surprisingly. It captures the global motion relatively well while missing some details (considering it's only trained on synthetic 2D keypoints).
Have I mentioned that PHC can support multi-person interactions? Thanks to Isaac's parallel simulation, PHC can control multiple interacting humanoids out of the box.
This nice fencing sequence is from
@Me_Rawal
's EgoHumans dataset:
Recording the CVPR video knowing I might not be able to go, since I haven't heard back on my Canada visa application (with hotels and flights already booked🥲)
@CVPR
🙏🙏🙏
Excited to share that our paper "Dynamics-Regulated Kinematic Policy for Egocentric Pose Estimation" has been accepted to NeurIPS 2021!!! Thanks to the team
@RHachiuma
@KhrylxYe
@kkitani
#NeurIPS2021
Code and data will be released at:
Really hope I am at ICLR rn 🫠 (sigh in visa hell)
PULSE will be presented at Halle B
#160
, Wed 8 May, 4:30–6:30 p.m. CEST, as a Spotlight Poster🌟
You can also check out the poster here 👇
PULSE has been immensely useful for many tasks we are working on🤫, stay tuned for more
Excited to announce that our work “Trace & Pace” will be presented
@CVPR2023
.
We combine guided trajectory diffusion with a physics-based humanoid controller to enable pedestrian animation that is controllable by a user.
Project page:
1/5
Presenting physics-based 3D human pose and human-object interaction estimation at
#NeurIPS2021
(D3 vision-1) now (11:30 AM - 1:00 PM ET)! Paper and code at:
I will be presenting our new work: “Embodied Scene-aware Human Pose Estimation” at
#NeurIPS2022
Thursday Poster 900.
In this work, we use third person video🎥, proprioception🕺, and scene information🪑 to drive an embodied agent for pose estimation. 1/5
Crazy results🔥
On the other hand, all the mocap cameras and markers keep reminding me that it’s not yet possible to get this to work onboard with egocentric vision & sensors 😢
Long way to go💪
Soccer players have to master a range of dynamic skills, from turning and kicking to chasing a ball. How could robots do the same? ⚽
We trained our AI agents to demonstrate a range of agile behaviors using reinforcement learning.
Here’s how. 🧵
PHC will be presented at
#ICCV2023
📍Location: Room "Foyer Sud" - 101
⏰ Thursday 5th 10:30 AM-12:30 PM
🆔: 1900
Humanoid control for avatars, motion imitation on AMASS, and fall-state recovery during imitation.
@jinkuncao
will be there to present🎉🎉🎉 (no visa for me🥲)
Let's think about humanoid robots beyond carrying boxes. How about having the humanoid come out the door, interact with humans, and even dance?
Introducing Expressive Whole-Body Control for Humanoid Robots:
See how our robot performs rich, diverse,
@HarryXu12
The SMPL/SMPLx humanoid would fit perfectly. With the codebase, I hope to simplify the process of working with them + hands!
Using the smplx humanoid would give access to lots of existing mocap with fingers!
Works in both Isaac Gym and MuJoCo
Found this video from 2021: trying to train a policy to enact the sequences from the GRAB dataset with a humanoid.
Fair to say things unraveled pretty quickly 👀
While working on PULSE, I found out that you CAN train a single MLP to reach very very high imitation success rate on AMASS with the right training procedure.
Basically, no MCP/MoE is needed as long as you train it for long enough...
Introducing “Dynamics-Regulated Kinematic Policy for Egocentric Pose Estimation”!
From just a front-facing video, we control a simulated character to recover physically plausible global pose and human-object interaction: (1/3)
Our work, 3D Human Motion Estimation via Motion Compression and Refinement, has been accepted to ACCV 2020 (Oral)! We focus on extracting stable and natural-looking human motion: Check out our demo: (1/2)
MeTRAbs and its follow-up are *the* best pose/keypoint estimators I have used; they completely blew my mind, and they run in real time. I recommend them to people constantly these days.
(Just hope it can one day be in PyTorch🥲)
En route to
#WACV2023
!
I'll present a paper on extreme multi-dataset learning of 3D human pose estimation when labels have different skeleton formats.
Paper:
Project page:
Happy 2023! Trying to get started on shrinking my enormous paper reading list and improving some tooling:
The new Stage Manager on iPadOS is fantastic for reading & note-taking, allowing apps to occupy the upper and lower two-thirds of the screen and quickly switch between them!
Unitree Introducing | Unitree G1 Humanoid Agent | AI Avatar
Price from $16K 🤩
Unlock unlimited sports potential (extra-large joint movement angles, 23–34 joints)
Force control of dexterous hands, manipulation of all things
Imitation & reinforcement learning driven
#Unitree
#AI
Introducing MuJoCo 3.0: a major new release of our fast, powerful and open source tool for robotics research. 🤖
📈 GPU & TPU acceleration through
#JAX
🖼️ Better simulation of more diverse objects - like clothes, screws, gears and donuts
💡 Find out more:
Introducing Sora, our text-to-video model.
Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions.
Prompt: “Beautiful, snowy
Another day, another chance to recommend Paperpile
@paperpile
as the best paper reading and management tool🤩 Its web importer on Chrome and iOS is especially wicked. The iPad app is also fantastic.
I have wanted to achieve this effect for a long time; now that diffusion models have reached ~0.1 s generation speed 🙀 and the humanoid controller is robust enough to deal with imperfect transitions and noisy input, it has finally become possible.
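The loop implied here looks roughly like the sketch below; `generate_chunk` and `imitate_frame` are illustrative stubs, not the EMDM or PHC APIs:

```python
import numpy as np

# Illustrative stand-ins only: a fast diffusion model yields short kinematic
# chunks, and the physics-based controller tracks them frame by frame,
# smoothing over imperfect chunk transitions and noisy generations.
def generate_chunk(num_frames=30, num_joints=24):
    """Stub for a fast (~0.1 s) diffusion call returning a kinematic chunk."""
    return np.random.randn(num_frames, num_joints, 3)

def imitate_frame(sim_state, target_frame):
    """Stub for one controller + physics step tracking a kinematic target."""
    return 0.9 * sim_state + 0.1 * target_frame  # crude pull toward target

sim_state = np.zeros((24, 3))
for _ in range(10):                # streaming loop: overlap generation
    for frame in generate_chunk():  # with simulation to stay real-time
        sim_state = imitate_frame(sim_state, frame)
```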
The full release of Ego-Exo4D is out! 1.3k hours of first and third-person videos + the world's largest source of egocentric body/hand pose estimates and video segmentation masks. Gaze, trajectories, and point clouds in 99% of data. Access below 👇
Introducing Open-World Mobile Manipulation 🦾🌍
– A full-stack approach for operating articulated objects in open-ended unstructured environments:
Unlocking doors with lever handles/ round knobs/ spring-loaded hinges 🔓🚪
Opening cabinets, drawers, and refrigerators 🗄️
👇
So…… where can I buy a comfortable strap for Vision Pro?
Been waiting for this for a long time (was an intern at Apple in 2018) and would really love to use this for work.
….. but my experience is basically:
At first: wow this is so cool so useful
5 mins later: get this
Trying the ChatGPT Siri shortcut from
Already a pretty good experience even though it can only answer in text (no iOS integration), especially when driving!!
When I asked about Joel and Ellie in
#LastOfUsHBO
, the text to speech could
Professor Mintz was my academic advisor at
@Penn
and each semester I scheduled a session to receive some "tough love": an unfiltered evaluation of my academic decisions.
His advice and wisdom still ring in my ears: "Learn more Math, Zen! More Math!"
RIP Professor Mintz.
Prof. Mintz gave the very first lecture I ever attended at
@Penn
.
He tried teaching quantum computing to incoming freshmen — during orientation! Needless to say many pieces of chalk were broken that day. Couldn’t have asked for a better intro to Penn. RIP to a great professor
Bonus question: could this lead to a foundation model for humanoid control?
PULSE can randomly generate motion from noise and be trained to perform different tasks using a sampler. So.....?
@xbpeng4
Can do! Couldn't really find a spin kick like the one in DeepMimic; how about a "spin and kick" plus some cartwheeling?
you can see the residual force working extra hard on these haha
Using TRAjectory Diffusion Model for Controllable PEdestrians (TRACE) by
@davrempe
as the trajectory planner, we enable a large crowd simulation framework.
@soumithchintala
A very simple one I made
It streams the VIO + video results over websocket and MJPEG to a host machine server (made in Python + aiohttp). I was able to use it to collect (sort of) head-mounted AR data for an egocentric pose estimation project. Like this:
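A sketch of the MJPEG half of such a server in aiohttp (the websocket side feeding `frames` with JPEG bytes is omitted; route and port are illustrative):

```python
import asyncio
from aiohttp import web

# JPEG frames arriving from the phone would be put into this queue by the
# (omitted) websocket handler; the MJPEG endpoint streams them back out.
frames: asyncio.Queue = asyncio.Queue(maxsize=2)

async def mjpeg_handler(request: web.Request) -> web.StreamResponse:
    resp = web.StreamResponse(
        headers={"Content-Type": "multipart/x-mixed-replace; boundary=frame"})
    await resp.prepare(request)
    while True:  # push each JPEG as one part of the multipart stream
        jpg = await frames.get()
        await resp.write(b"--frame\r\nContent-Type: image/jpeg\r\n\r\n"
                         + jpg + b"\r\n")

app = web.Application()
app.router.add_get("/stream", mjpeg_handler)
# web.run_app(app, port=8080)
```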
@DrJimFan
I somehow would still prefer the glasses form factor; it would seem more natural to interact with and not an additional device to carry around. Glasses with a similar UX, even the same laser display on the hand, would be pretty cool.
PHC+ bumps up the model size, refines the hard-negative mining process (the most important part), and trains for a LONG time.
The released models are a little imperfect (e.g. the rotation + keypoint model doesn't fully exhibit the natural walk-back behavior) due to time constraints
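Hard-negative mining in this setting looks roughly like the sketch below (illustrative, not the PHC+ code; `train_fn`/`eval_fn` are assumed stand-ins):

```python
import random

# Repeatedly evaluate on the full dataset and focus training on the sequences
# the policy still fails, mixed with solved ones to avoid forgetting.
def hard_negative_training(policy, dataset, train_fn, eval_fn, rounds=100):
    """train_fn(policy, seqs) runs RL training; eval_fn(policy, seq) -> bool."""
    active = list(dataset)                        # round 0: all sequences
    for _ in range(rounds):
        train_fn(policy, active)
        failed = [s for s in dataset if not eval_fn(policy, s)]
        if not failed:                            # 100% success: done
            break
        replay = random.sample(list(dataset), k=min(len(failed), len(dataset)))
        active = failed + replay                  # hard negatives + replay mix
    return policy
```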
3. The motor skills from the latent space should extrapolate to unseen scenarios.
Here we show that a policy trained using PULSE can handle complex terrain traversal with human-like behavior, using only a trajectory-following reward (no additional adversarial reward) (4/6)
The key features of PULSE:
1. Once the latent space is learned, randomly sampled latents create stable and human-like behaviors (instead of random jitters), enabling better downstream exploration.
Here we visualize training for the "reach" and "move forward with speed" tasks. (2/6)
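The downstream setup amounts to something like this sketch (layer sizes and dimensions are illustrative assumptions, not the released architecture): a frozen pretrained decoder maps (proprioception, 32-dim latent) to joint actions, and only the small task policy on top is trained with the task reward.

```python
import torch
import torch.nn as nn

LATENT_DIM, OBS_DIM, ACT_DIM = 32, 358, 69  # illustrative dimensions

# Stand-in for the pretrained decoder: frozen, shared by all downstream tasks.
decoder = nn.Sequential(nn.Linear(OBS_DIM + LATENT_DIM, 512), nn.SiLU(),
                        nn.Linear(512, ACT_DIM))
decoder.requires_grad_(False)

# Task policy: trained per task, acts entirely in the 32-dim latent space.
task_policy = nn.Sequential(nn.Linear(OBS_DIM, 512), nn.SiLU(),
                            nn.Linear(512, LATENT_DIM))

obs = torch.randn(1, OBS_DIM)                  # proprioception
z = task_policy(obs)                           # task-specific latent
action = decoder(torch.cat([obs, z], dim=-1))  # human-like low-level action
```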
PhD application season yet again... I wrote an SoP how-to guide / template for anyone who doesn't know where to start. I'm continuously adding new example quotes from the SoPs I review. Hope it helps!
Career update: I am co-founding a new research group called "GEAR" at NVIDIA, with my long-time friend and collaborator Prof.
@yukez
. GEAR stands for Generalist Embodied Agent Research.
We believe in a future where every machine that moves will be autonomous, and robots and