Siheng Zhao
@SihengZhao
Followers: 658 | Following: 810 | Media: 16 | Statuses: 88
CS PhD student @USC | intern @Amazon FAR (Frontier AI & Robotics) | Amazon PhD Fellow
Los Angeles, CA
Joined October 2020
ResMimic: a two-stage residual framework that unleashes the power of a pre-trained general motion tracking policy. It enables expressive whole-body loco-manipulation with payloads up to 5.5 kg without task-specific design, generalizes across poses, and exhibits reactive behavior.
🚀 The real-to-sim code from our CoRL 2025 paper, RoLA, is now open-sourced at https://t.co/VWt2bZhNr5!
(1/n) Ever wondered if a single in-the-wild image could generate photorealistic robotic demonstrations? 🖼️ 🔥Excited to share our #CoRL2025 paper, Robot Learning from Any Images (RoLA), a framework that transforms any in-the-wild image into an interactive, physics-enabled
We release OpenReal2Sim, an open-source toolbox for real-to-sim reconstruction and robot simulation. A key difference from prior work is our focus on building an interactive digital twin from in-the-wild data — even Internet images or generated videos. Try it out: Interactive
Check out our new paper on a whole-body, mocap-free humanoid teleoperation system to scale up data collection!
Excited to introduce TWIST2, our next-generation humanoid data collection system. TWIST2 is portable (use anywhere, no MoCap), scalable (100+ demos in 15 mins), and holistic (unlock major whole-body human skills). Fully open-sourced: https://t.co/fAlyD77DEt
ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning https://t.co/76kcynpThj GMT + residual policy for stable, interactive loco-manipulation via whole-body control (WBC)
Amazon’s training humanoids to carry boxes. I’m racking my brain over why they’d do that. ResMimic enables precise, expressive humanoid loco-manipulation, bridging gaps in general motion tracking (GMT) policies, which lack object awareness. Amazon FAR led the work with a
Check out Siheng’s Amazon internship project! While full-body motion generation has made great progress, whole-body manipulation remains challenging because it requires coordinated robot–object interaction. Our approach tackles this through a two-stage framework: a general
We finally made it work on aerial manipulation! Lightbulb mounting, fruit pick-and-place, and peg-in-hole, all done by visuomotor policies trained only on UMI data. Back in the day, when we successfully teleoperated those tasks in the Flying-Hand paper, we knew we would turn it into the
✈️🤖 What if an embodiment-agnostic visuomotor policy could adapt to diverse robot embodiments at inference with no fine-tuning? Introducing UMI-on-Air, a framework that brings embodiment-aware guidance to diffusion policies for precise, contact-rich aerial manipulation.
Robot Learning from Any Images https://t.co/kvK1z9gOgV Generating physical robotic environments from images
ResMimic learns a whole-body loco-manipulation policy on top of a general motion tracking policy. Key ideas: (i) pre-train general motion tracking; (ii) post-train a task-specific residual policy with (a) an object tracking reward, (b) a contact reward, and (c) virtual object force
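The two key ideas above (a frozen, pre-trained GMT base policy plus a post-trained residual) can be sketched as an additive action composition. This is a minimal sketch under stated assumptions, not ResMimic's implementation: the names `residual_action`, `base_policy`, `residual_policy`, and `residual_scale`, and the choice to feed object state and the base action into the residual, are all hypothetical.

```python
import numpy as np
from typing import Callable

# Hypothetical type: a policy maps an observation vector to an action vector.
Policy = Callable[[np.ndarray], np.ndarray]


def residual_action(
    base_policy: Policy,
    residual_policy: Policy,
    robot_obs: np.ndarray,
    object_obs: np.ndarray,
    residual_scale: float = 1.0,
) -> np.ndarray:
    """Combine a frozen base action with a learned correction (illustrative only)."""
    # Stage 1 (frozen): the pre-trained GMT policy sees only robot/motion observations.
    base = base_policy(robot_obs)
    # Stage 2 (trainable): the residual also sees object state and the base action (assumed inputs).
    delta = residual_policy(np.concatenate([robot_obs, object_obs, base]))
    # Final command: base action plus a scaled residual correction.
    return base + residual_scale * delta


# Toy stand-in policies (hypothetical, for illustration only).
def toy_base(obs: np.ndarray) -> np.ndarray:
    return 0.5 * obs[:3]


def toy_residual(x: np.ndarray) -> np.ndarray:
    return np.full(3, 0.1)


action = residual_action(toy_base, toy_residual, np.array([1.0, 2.0, 3.0]), np.zeros(2))
```

Keeping the base policy frozen and learning only an additive correction lets post-training start from the GMT behavior rather than from scratch; `residual_scale` is a hypothetical knob for bounding how far the correction can move the base action.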
People are getting so good at manipulating things with these awful hard-cast plastic hands. Really impressive stuff.
A problem with general motion trackers is that they cannot do (forceful) manipulation, such as lifting a heavy chair. This is natural, because they are not trained with objects. In ResMimic, we introduce a pretraining/post-training paradigm. Just finetuning motion trackers with a
8/ 🙌 Acknowledgement: This was an exciting project that I worked on during my internship at Amazon FAR, with amazing collaboration from @ZeYanjie, and insightful advice from @yuewang314, C. Karen Liu, @pabbeel, @GuanyaShi, and @rocky_duan.
7/ Related and interesting work: There has been a lot of exciting recent work on humanoid loco-manipulation and general humanoid-object interaction.
- HDMI (https://t.co/UoRoRjQ0rD) by @ElijahGalahad from @LeCARLab learns general humanoid-object interaction from human videos.
-
6/ Ablation on Virtual Force Curriculum: The virtual object controller improves training stability by applying curriculum-based virtual forces that guide the object along its reference trajectory. Reference motions often include imperfections (e.g., hand–object penetrations),
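The virtual object controller described above can be sketched as an assist force on the object that pulls it toward its reference trajectory and fades out as training progresses. The PD form, the gains `kp`/`kd`, the linear decay schedule, and the function names are assumptions for illustration; the tweet does not specify ResMimic's exact controller or curriculum.

```python
import numpy as np


def virtual_force_scale(step: int, decay_steps: int) -> float:
    """Hypothetical linear curriculum: full assist at step 0, none from decay_steps on."""
    progress = min(max(step / decay_steps, 0.0), 1.0)
    return 1.0 - progress


def virtual_object_force(
    obj_pos: np.ndarray,
    obj_vel: np.ndarray,
    ref_pos: np.ndarray,
    ref_vel: np.ndarray,
    step: int,
    decay_steps: int,
    kp: float = 100.0,
    kd: float = 10.0,
) -> np.ndarray:
    """PD-style assist force toward the reference object state, scaled by the curriculum."""
    pd_force = kp * (ref_pos - obj_pos) + kd * (ref_vel - obj_vel)
    return virtual_force_scale(step, decay_steps) * pd_force


# Toy numbers: object at rest, 10 cm short of its reference position along x.
obj_pos, ref_pos = np.zeros(3), np.array([0.1, 0.0, 0.0])
still = np.zeros(3)
early = virtual_object_force(obj_pos, still, ref_pos, still, step=0, decay_steps=1000)
late = virtual_object_force(obj_pos, still, ref_pos, still, step=1000, decay_steps=1000)
```

In this sketch the assist reaches zero at `decay_steps`, so late in training the policy has to move the object along its reference without help, while early on the force can compensate for imperfect references such as hand–object penetrations.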
5/ Ablation on Contact Tracking Reward: The contact reward guides the policy to adopt whole-body strategies. Without it, the humanoid depends mainly on wrist and hand motions, which may work in IsaacGym but fail to generalize to MuJoCo and the real world. With it, coordinated
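A contact tracking reward of the kind described above can be sketched by comparing which robot links touch the object against the contacts the reference motion implies. The binary thresholding, the exponential shaping, the link set, and the parameters `force_threshold` and `sigma` are assumptions for illustration, not ResMimic's published reward.

```python
import numpy as np


def contact_tracking_reward(
    contact_forces: np.ndarray,
    ref_contacts: np.ndarray,
    force_threshold: float = 1.0,
    sigma: float = 1.0,
) -> float:
    """Reward in (0, 1]: equals 1 when every link's contact state matches the reference.

    contact_forces: per-link contact force magnitudes between robot and object.
    ref_contacts: per-link 0/1 flags for whether the reference motion expects contact.
    """
    # Treat a link as "in contact" when its force exceeds the (hypothetical) threshold.
    actual = (np.asarray(contact_forces) > force_threshold).astype(float)
    # Count links whose contact state disagrees with the reference.
    mismatches = np.abs(actual - np.asarray(ref_contacts, dtype=float)).sum()
    return float(np.exp(-mismatches / sigma))


# Toy links: [left hand, right hand, torso]; the reference expects all three in contact.
ref = np.array([1, 1, 1])
hands_only = contact_tracking_reward(np.array([20.0, 20.0, 0.0]), ref)
whole_body = contact_tracking_reward(np.array([20.0, 20.0, 15.0]), ref)
```

Under this shaping, a hands-only grasp scores lower than one that also brings the torso into contact when the reference expects it, which mirrors the point above that the contact reward steers the policy away from wrist-and-hand-only strategies.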
4/ Real-world Comparison: We evaluate ResMimic against several baselines in real-world settings:
- Pre-trained base policy only: mimics human motion superficially but has not been trained for effective object interaction.
- Training from scratch: fails due to poor sim-to-real
3/ 🏋️‍♂️ Heavy Payload Mastery: Although the Unitree G1’s wrist payload limit is ~2.5 kg, ResMimic can handle objects of up to 5.5 kg with stable whole-body coordination.
2/ 🌐 Project Website: https://t.co/ezbWj6k1Qc The full pipeline consists of three stages: (i) Pretrain a general motion tracking (GMT) policy on large-scale human motion data. While ResMimic is a general framework that can incorporate any GMT policy as the base policy, we