Siheng Zhao

@SihengZhao

Followers: 658
Following: 810
Media: 16
Statuses: 88

CS PhD student @USC | intern @Amazon FAR (Frontier AI & Robotics) | Amazon PhD Fellow

Los Angeles, CA
Joined October 2020
@SihengZhao
Siheng Zhao
2 months
ResMimic: a two-stage residual framework that unleashes the power of a pre-trained general motion tracking policy. It enables expressive whole-body loco-manipulation with payloads up to 5.5 kg without task-specific design, generalizes across poses, and exhibits reactive behavior.
10
73
297
@SihengZhao
Siheng Zhao
7 days
🚀 The real-to-sim code from our CoRL 2025 paper, RoLA, is now open-sourced at https://t.co/VWt2bZhNr5!
@SihengZhao
Siheng Zhao
2 months
(1/n) Ever wondered if a single in-the-wild image could generate photorealistic robotic demonstrations? 🖼️ 🔥Excited to share our #CoRL2025 paper, Robot Learning from Any Images (RoLA), a framework that transforms any in-the-wild image into an interactive, physics-enabled
1
9
63
@PointsCoder
Jiageng Mao
8 days
We release OpenReal2Sim, an open-source toolbox for real-to-sim reconstruction and robot simulation. A key difference from prior work is our focus on building an interactive digital twin from in-the-wild data — even Internet images or generated videos. Try it out: Interactive
2
35
177
@SihengZhao
Siheng Zhao
18 days
Check out our new paper on a whole-body, mocap-free humanoid teleoperation system to scale up data collection!
@ZeYanjie
Yanjie Ze
18 days
Excited to introduce TWIST2, our next-generation humanoid data collection system. TWIST2 is portable (use anywhere, no MoCap), scalable (100+ demos in 15 mins), and holistic (unlock major whole-body human skills). Fully open-sourced: https://t.co/fAlyD77DEt
0
1
23
@ZeYanjie
Yanjie Ze
18 days
Excited to introduce TWIST2, our next-generation humanoid data collection system. TWIST2 is portable (use anywhere, no MoCap), scalable (100+ demos in 15 mins), and holistic (unlock major whole-body human skills). Fully open-sourced: https://t.co/fAlyD77DEt
21
108
459
@RoboReading
C's Robotics Paper Notes
30 days
ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning https://t.co/76kcynpThj GMT + residual policy for stable interactive loco-mani wbc
0
11
47
@TheHumanoidHub
The Humanoid Hub
2 months
Amazon’s training humanoids to carry boxes. I’m racking my brain over why they’d do that. ResMimic enables precise, expressive humanoid loco-manipulation, bridging gaps in general motion tracking (GMT) policies, which lack object awareness. Amazon FAR led the work with a
48
89
516
@yuewang314
Yue Wang
2 months
Check out Siheng’s Amazon internship project! While full-body motion generation has made great progress, whole-body manipulation remains challenging because it requires coordinated robot–object interaction. Our approach tackles this through a two-stage framework: a general
@SihengZhao
Siheng Zhao
2 months
ResMimic: a two-stage residual framework that unleashes the power of a pre-trained general motion tracking policy. It enables expressive whole-body loco-manipulation with payloads up to 5.5 kg without task-specific design, generalizes across poses, and exhibits reactive behavior.
4
7
54
@Xiaofeng2Guo
Xiaofeng Guo
2 months
We finally made it on Aerial Manipulation! Lightbulb mounting, fruit pick-and-place, and peg-in-hole, all by the visuomotor policies trained on only UMI data. Back in the day, when we successfully teleoped those tasks in the Flying-Hand paper, we knew we would turn it into the
@hgupt3
Harsh Gupta
2 months
✈️🤖 What if an embodiment-agnostic visuomotor policy could adapt to diverse robot embodiments at inference with no fine-tuning? Introducing UMI-on-Air, a framework that brings embodiment-aware guidance to diffusion policies for precise, contact-rich aerial manipulation.
0
8
48
@RoboReading
C's Robotics Paper Notes
2 months
Robot Learning from Any Images https://t.co/kvK1z9gOgV Generating physical robotic environments from images
0
1
11
@pabbeel
Pieter Abbeel
2 months
ResMimic: learns a whole-body loco-manipulation policy on top of a general motion tracking policy. Key ideas: (i) pre-train general motion tracking (ii) post-train task-specific residual policy with: (a) object tracking reward (b) contact reward (c) virtual object force
@SihengZhao
Siheng Zhao
2 months
ResMimic: a two-stage residual framework that unleashes the power of a pre-trained general motion tracking policy. It enables expressive whole-body loco-manipulation with payloads up to 5.5 kg without task-specific design, generalizes across poses, and exhibits reactive behavior.
5
27
206
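The "base policy plus residual policy" idea in the post above can be sketched in a few lines. This is only an illustration under stated assumptions, not the ResMimic implementation: the additive combination, the `residual_scale` parameter, the observation split, and every function name here are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Policy = Callable[[Sequence[float]], Vector]


def compose_residual_action(
    base_policy: Policy,
    residual_policy: Policy,
    motion_obs: Sequence[float],
    object_obs: Sequence[float],
    residual_scale: float = 1.0,
) -> Vector:
    """Add a task-specific correction to a frozen base policy's action.

    Assumption: the base policy only sees motion-tracking observations,
    while the residual policy also sees object observations.
    """
    base_action = base_policy(motion_obs)
    # The residual policy gets the extra object information the base lacks.
    residual = residual_policy(list(motion_obs) + list(object_obs))
    if len(base_action) != len(residual):
        raise ValueError("base and residual actions must have the same size")
    return [b + residual_scale * r for b, r in zip(base_action, residual)]


# Toy stand-ins for trained networks (hypothetical).
def toy_base(obs: Sequence[float]) -> Vector:
    return [1.0, 2.0]


def toy_residual(obs: Sequence[float]) -> Vector:
    return [0.5, -0.5]


action = compose_residual_action(toy_base, toy_residual, [0.0], [0.0])
```

With `residual_scale=0.0` the output reduces to the base policy's action, which is one common reason to start residual training from small corrections.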
@chris_j_paxton
Chris Paxton
2 months
People are getting so good at manipulating things with these awful hard-cast plastic hands. Really impressive stuff.
@SihengZhao
Siheng Zhao
2 months
ResMimic: a two-stage residual framework that unleashes the power of a pre-trained general motion tracking policy. It enables expressive whole-body loco-manipulation with payloads up to 5.5 kg without task-specific design, generalizes across poses, and exhibits reactive behavior.
6
4
97
@ZeYanjie
Yanjie Ze
2 months
A problem with general motion trackers is that they cannot do (forceful) manipulation, such as lifting a heavy chair. This is natural because they are not trained with objects. In ResMimic, we introduce a pretraining/post-training paradigm. Just finetuning motion trackers with a
@SihengZhao
Siheng Zhao
2 months
ResMimic: a two-stage residual framework that unleashes the power of a pre-trained general motion tracking policy. It enables expressive whole-body loco-manipulation with payloads up to 5.5 kg without task-specific design, generalizes across poses, and exhibits reactive behavior.
1
7
81
@SihengZhao
Siheng Zhao
2 months
8/ 🙌 Acknowledgement: This was an exciting project that I worked on during my internship at Amazon FAR, with amazing collaboration from @ZeYanjie, and insightful advice from @yuewang314, C. Karen Liu, @pabbeel, @GuanyaShi, and @rocky_duan.
0
0
11
@SihengZhao
Siheng Zhao
2 months
7/ Related and interesting work: There has been a lot of exciting recent work on humanoid loco-manipulation and general humanoid-object interaction. - HDMI ( https://t.co/UoRoRjQ0rD) by @ElijahGalahad from @LeCARLab, learns general humanoid-object interaction from human videos. -
1
0
11
@SihengZhao
Siheng Zhao
2 months
6/ Ablation on Virtual Force Curriculum: The virtual object controller improves training stability by applying curriculum-based virtual forces that guide the object along its reference trajectory. Reference motions often include imperfections (e.g., hand–object penetrations),
1
0
9
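A curriculum-based virtual force like the one described in 6/ might look roughly like the sketch below. The post does not give the controller form or the schedule, so the spring-damper force, the linear decay of the gain over training, and all names and values are assumptions made purely for illustration.

```python
from typing import List, Sequence


def curriculum_gain(step: int, total_steps: int, initial_gain: float) -> float:
    """Linearly decay a gain from initial_gain to 0 (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial_gain * (1.0 - frac)


def virtual_force(
    obj_pos: Sequence[float],
    ref_pos: Sequence[float],
    obj_vel: Sequence[float],
    ref_vel: Sequence[float],
    kp: float,
    kd: float,
) -> List[float]:
    """Spring-damper force pulling the object toward its reference (assumed form)."""
    return [
        kp * (rp - p) + kd * (rv - v)
        for p, rp, v, rv in zip(obj_pos, ref_pos, obj_vel, ref_vel)
    ]


# Early in training the helper force is strong; late in training it vanishes,
# so the policy has to move the object on its own.
kp_early = curriculum_gain(step=0, total_steps=100, initial_gain=50.0)
kp_late = curriculum_gain(step=100, total_steps=100, initial_gain=50.0)
force = virtual_force(
    [0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0], [0.0, 0.0, 0.0],
    kp=10.0, kd=1.0,
)
```

In a simulator, a force like this would be applied to the object body at each step as an external force; the exact mechanism depends on the simulator and is not specified in the thread.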
@SihengZhao
Siheng Zhao
2 months
5/ Ablation on Contact Tracking Reward: The contact reward guides the policy to adopt whole-body strategies. Without it, the humanoid depends mainly on wrist and hand motions, which may work in IsaacGym but fail to generalize to MuJoCo and the real world. With it, coordinated
1
0
11
@SihengZhao
Siheng Zhao
2 months
4/ Real-world Comparison: We evaluate ResMimic against several baselines in real-world settings: - Pre-trained base policy only: mimics human motion superficially but has not been trained for effective object interaction. - Training from scratch: fails due to poor sim-to-real
1
0
11
@SihengZhao
Siheng Zhao
2 months
3/ 🏋️‍♂️ Heavy Payload Mastery: Although the Unitree G1’s wrist payload limit is ~2.5 kg, ResMimic can handle up to 5.5 kg objects with stable whole-body coordination.
1
0
13
@SihengZhao
Siheng Zhao
2 months
2/ 🌐 Project Website: https://t.co/ezbWj6k1Qc The full pipeline consists of three stages: (i) Pretrain a general motion tracking (GMT) policy on large-scale human motion data. While ResMimic is a general framework that can incorporate any GMT policy as the base policy, we
1
4
20