Siheng Zhao @SihengZhao X Profile

Siheng Zhao

@SihengZhao

Followers

658

Following

810

Media

16

Statuses

88

CS PhD student @USC | intern @Amazon FAR (Frontier AI & Robotics) | Amazon PhD Fellow

https://t.co/A4lDjV7im0

Los Angeles, CA

Joined October 2020

Siheng Zhao

@SihengZhao

2 months

ResMimic: a two-stage residual framework that unleashes the power of pre-trained general motion tracking policy. Enable expressive whole-body loco-manipulation with payloads up to 5.5kg without task-specific design, generalize across poses, and exhibit reactive behavior.

10

73

297

Siheng Zhao

@SihengZhao

7 days

🚀 The real-to-sim code from our CoRL 2025 paper, RoLA, is now open-sourced at https://t.co/VWt2bZhNr5!

Siheng Zhao

@SihengZhao

2 months

(1/n) Ever wondered if a single in-the-wild image could generate photorealistic robotic demonstrations? 🖼️ 🔥Excited to share our #CoRL2025 paper, Robot Learning from Any Images (RoLA), a framework that transforms any in-the-wild image into an interactive, physics-enabled

1

9

63

Jiageng Mao

@PointsCoder

8 days

We release OpenReal2Sim, an open-source toolbox for real-to-sim reconstruction and robot simulation. A key difference from prior work is our focus on building an interactive digital twin from in-the-wild data — even Internet images or generated videos. Try it out: Interactive

2

35

177

Siheng Zhao

@SihengZhao

18 days

Check out our new paper on a whole-body, mocap-free humanoid teleoperation system to scale up data collection!

Yanjie Ze

@ZeYanjie

18 days

Excited to introduce TWIST2, our next-generation humanoid data collection system. TWIST2 is portable (use anywhere, no MoCap), scalable (100+ demos in 15 mins), and holistic (unlock major whole-body human skills). Fully open-sourced: https://t.co/fAlyD77DEt

0

1

23

Yanjie Ze

@ZeYanjie

18 days

Excited to introduce TWIST2, our next-generation humanoid data collection system. TWIST2 is portable (use anywhere, no MoCap), scalable (100+ demos in 15 mins), and holistic (unlock major whole-body human skills). Fully open-sourced: https://t.co/fAlyD77DEt

21

108

459

C's Robotics Paper Notes

@RoboReading

30 days

ResMimic: From General Motion Tracking to Humanoid Whole-body Loco-Manipulation via Residual Learning https://t.co/76kcynpThj GMT + residual policy for stable interactive loco-mani wbc

0

11

47

The Humanoid Hub

@TheHumanoidHub

2 months

Amazon’s training humanoids to carry boxes. I’m racking my brain over why they’d do that. ResMimic enables precise, expressive humanoid loco-manipulation, bridging gaps in general motion tracking (GMT) policies, which lack object awareness. Amazon FAR led the work with a

48

89

516

Yue Wang

@yuewang314

2 months

Check out Siheng’s Amazon internship project! While full-body motion generation has made great progress, whole-body manipulation remains challenging because it requires coordinated robot–object interaction. Our approach tackles this through a two-stage framework: a general

Siheng Zhao

@SihengZhao

2 months

ResMimic: a two-stage residual framework that unleashes the power of pre-trained general motion tracking policy. Enable expressive whole-body loco-manipulation with payloads up to 5.5kg without task-specific design, generalize across poses, and exhibit reactive behavior.

4

7

54

Xiaofeng Guo

@Xiaofeng2Guo

2 months

We finally made it on Aerial Manipulation! Lightbulb mounting, fruit pick-and-place, and peg-in-hole, all by the visuomotor policies trained on only UMI data. Back in the day, when we successfully teleoped those tasks in the Flying-Hand paper, we knew we would turn it into the

Harsh Gupta

@hgupt3

2 months

✈️🤖 What if an embodiment-agnostic visuomotor policy could adapt to diverse robot embodiments at inference with no fine-tuning? Introducing UMI-on-Air, a framework that brings embodiment-aware guidance to diffusion policies for precise, contact-rich aerial manipulation.

0

8

48

C's Robotics Paper Notes

@RoboReading

2 months

Robot Learning from Any Images https://t.co/kvK1z9gOgV Generating physical robotic environments from images

0

1

11

Pieter Abbeel

@pabbeel

2 months

ResMimic: learns a whole-body loco-manipulation policy on top of a general motion tracking policy. Key ideas: (i) pre-train general motion tracking (ii) post-train task-specific residual policy with: (a) object tracking reward (b) contact reward (c) virtual object force

Siheng Zhao

@SihengZhao

2 months

ResMimic: a two-stage residual framework that unleashes the power of pre-trained general motion tracking policy. Enable expressive whole-body loco-manipulation with payloads up to 5.5kg without task-specific design, generalize across poses, and exhibit reactive behavior.

5

27

206

Chris Paxton

@chris_j_paxton

2 months

People are getting so good at manipulating things with these awful hard-cast plastic hands. Really impressive stuff.

Siheng Zhao

@SihengZhao

2 months

ResMimic: a two-stage residual framework that unleashes the power of pre-trained general motion tracking policy. Enable expressive whole-body loco-manipulation with payloads up to 5.5kg without task-specific design, generalize across poses, and exhibit reactive behavior.

6

4

97

Yanjie Ze

@ZeYanjie

2 months

A problem with general motion trackers is that they cannot do (forceful) manipulation, such as lifting a heavy chair. This is natural because they are not trained with objects. In ResMimic, we introduce a pre-training/post-training paradigm. Just finetuning motion trackers with a

Siheng Zhao

@SihengZhao

2 months

ResMimic: a two-stage residual framework that unleashes the power of pre-trained general motion tracking policy. Enable expressive whole-body loco-manipulation with payloads up to 5.5kg without task-specific design, generalize across poses, and exhibit reactive behavior.

1

7

81

Siheng Zhao

@SihengZhao

2 months

8/ 🙌 Acknowledgement: This was an exciting project that I worked on during my internship at Amazon FAR, with amazing collaboration from @ZeYanjie, and insightful advice from @yuewang314, C. Karen Liu, @pabbeel, @GuanyaShi, and @rocky_duan.

0

11

Siheng Zhao

@SihengZhao

2 months

7/ Related and interesting work: There is a lot of exciting work recently in humanoid loco-manipulation, general humanoid-object interaction. - HDMI ( https://t.co/UoRoRjQ0rD) by @ElijahGalahad from @LeCARLab, learns general humanoid-object interaction from human videos. -

1

0

11

Siheng Zhao

@SihengZhao

2 months

6/ Ablation on Virtual Force Curriculum: The virtual object controller improves training stability by applying curriculum-based virtual forces that guide the object along its reference trajectory. Reference motions often include imperfections (e.g., hand–object penetrations),

1

0

9

Siheng Zhao

@SihengZhao

2 months

5/ Ablation on Contact Tracking Reward: The contact reward guides the policy to adopt whole-body strategies. Without it, the humanoid depends mainly on wrist and hand motions, which may work in IsaacGym but fail to generalize to MuJoCo and the real world. With it, coordinated

1

0

11

Siheng Zhao

@SihengZhao

2 months

4/ Real-world Comparison: We evaluate ResMimic against several baselines in real-world settings: - Pre-trained base policy only: mimics human motion superficially but has not been trained for effective object interaction. - Training from scratch: fails due to poor sim-to-real

1

0

11

Siheng Zhao

@SihengZhao

2 months

3/ 🏋️‍♂️ Heavy Payload Mastery: Although the Unitree G1’s wrist payload limit is ~2.5 kg, ResMimic can handle up to 5.5 kg objects with stable whole-body coordination.

1

0

13

Siheng Zhao

@SihengZhao

2 months

2/ 🌐 Project Website: https://t.co/ezbWj6k1Qc The full pipeline consists of three stages: (i) Pretrain a general motion tracking (GMT) policy on large-scale human motion data. While ResMimic is a general framework that can incorporate any GMT policy as the base policy, we

1

4

20