Yi Ru (Helen) Wang @YiruHelenWang X Profile

Yi Ru (Helen) Wang

@YiruHelenWang

Followers

466

Following

246

Media

28

Statuses

67

Ph.D. Student @uwcse @uw_robotics | Chair @ieee_ras_sac | NSERC-PGSD Fellow @nserc_crsng | BASc from @UofT EngSci | Robot learning & perception

Joined October 2012

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

🚨Tired of binary pass/fail metrics that miss the bigger picture? 🤖Introducing #RoboEval: an open benchmark that shows *how* robot manipulation policies behave and *why* they fail, not just *if* they succeed. 🧵1/n 🔗 📄

6

35

190

Yi Ru (Helen) Wang

@YiruHelenWang

30 days

RT @Jianfei_AI: In this generation of embodied intelligence, success isn't just about whether the task is completed. We care how it's done:…

0

2

0

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

RT @JieWang_ZJUI: I love the idea of 'Behavior Metric'! Like we did in #RoboArena, we are trying to record more fine-grained, general and u…

0

1

0

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

RT @DJiafei: How do we accelerate progress in robotics? As @JitendraMalikCV has noted, computer vision leapt ahead once robust, openly-sha…

0

2

0

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

RT @DJiafei: In robotics, task success/failure has long been the default metric. But is binary success enough to capture a policy’s true ca…

0

10

0

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

RT @KyleMorgenstein: RL for robotics has struggled to take advantage of compute scale for hyper param sweeps because we don’t have stable r…

0

5

0

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

RT @OfirOzeri: One of the most impotent indications to the converges coming into robotics is more and more benchmarking coming in hot. They…

0

1

0

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

RT @_tonytao_: Having reliable signals to compare different robot policies is sooo important. The more informative the metrics, the better…

0

1

0

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

👏Thanks to all my collaborators and advisors @CarterUng35146, Grant Tannert, @DJiafei, Josephine Li, Amy Le, Rishabh Oswal, Markus Grotz, @wpumacay7567, @nild0000, @RanjayKrishna, @fox_dieter17849, @siddhss5. Y'all rock! 🎉

0

6

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

✨TL;DR: RoboEval shifts evaluation from "did it succeed?" to "how did it perform?" We hope this unlocks deeper insights into robotic policy behavior. Website and Paper: 🌐 📄 🧵10/n #robotics #ai #ml #benchmark #roboeval

1

10

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

RoboEval is built for extensibility:
📦 Modular MJCF task assets
⚙️ Easy-to-add variations (obstacles, lighting, rotations)
🧩 Supports imitation learning and RL
✅ All metrics are automatically logged per rollout
🧵9/n #benchmark #simulation #mujoco #imitationlearning #ml

1

0

6

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

🤔 Does your policy fumble with the left arm? Miss the grasp? Wobble mid-lift? 🔍 RoboEval pinpoints bottlenecks with stage-level analysis and structured outcome metrics. 🧵8/n #robotics #ai #machinelearning #benchmarking

6

1

8
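One way to picture the stage-level analysis mentioned in the 8/n post above is a histogram of where rollouts stop making progress. The sketch below is only an illustration: the stage names, the `failure_stage_histogram` helper, and the "number of stages completed" input format are all hypothetical and are not taken from RoboEval.

```python
from collections import Counter

# Hypothetical stage names for a single pick-and-place style task.
STAGES = ["reach", "grasp", "lift", "place"]

def failure_stage_histogram(stages_completed):
    """Count, per rollout, the first stage that was NOT completed.

    stages_completed: one integer per rollout, the number of stages
    finished in order (0 .. len(STAGES)). Rollouts that finish every
    stage are counted under "success".
    """
    counts = Counter()
    for completed in stages_completed:
        if not 0 <= completed <= len(STAGES):
            raise ValueError(f"invalid stage count: {completed}")
        label = "success" if completed == len(STAGES) else STAGES[completed]
        counts[label] += 1
    return counts

# Five rollouts: one success, two stuck at grasp, one at reach, one at lift.
histogram = failure_stage_histogram([4, 1, 1, 0, 2])
print(histogram.most_common())
```

A policy with a low overall success rate but most failures concentrated in one stage points at a specific bottleneck, which a single pass/fail number would hide.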

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

‼️We also observe that success rates can saturate (either all pass or all fail), at which point they tell you nothing. But behavioral metrics stay informative, especially for diagnosing trajectory quality & failure distribution. 🧵7/n #ai #robotics #machinelearning #benchmarking

2

1

7

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

🔍 Key finding: Behavioral metrics correlate with binary success in 59.4% of task-metric pairs, suggesting execution quality often aligns with outcomes. This makes a strong case for going beyond ✅/❌. 🧵6/n

1

0

7
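To make the idea in the 6/n post above concrete, the sketch below correlates a per-rollout behavioral metric with binary success for one task-metric pair. It uses a plain Pearson correlation against 0/1 outcomes (the point-biserial correlation); this choice, the `pearson` helper, and the sample numbers are assumptions for illustration, since the posts do not say which statistic or threshold produced the 59.4% figure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    With ys restricted to 0/1 this is the point-biserial correlation.
    Returns NaN when either sequence has zero variance, e.g. when
    success is saturated (all rollouts pass or all fail).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

# Made-up data: a smoothness score per rollout and whether it succeeded.
smoothness = [1.0, 2.0, 3.0, 4.0]
success = [0, 0, 1, 1]
print(pearson(smoothness, success))

# Saturated outcomes carry no signal, so the correlation is undefined.
print(pearson(smoothness, [1, 1, 1, 1]))
```

The second call also illustrates the saturation point from the 7/n post: once every rollout passes or fails, the outcome has no variance, while the behavioral metric itself can still differ between policies.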

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

Evaluated popular policies:
✅ ACT
✅ Diffusion Policy
✅ Behavior Cloning
✅ OpenVLA
While some achieve similar success rates, they may differ in coordination, smoothness, and stability, as our analysis reveals. 🧵5/n #evaluation #robotics #machinelearning

2

1

7

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

Our initial set of tasks brings 8 tasks × 3-5 variations each, supported by:
🤲 3,000 expert VR demos
🧪 Progress tracking, coordination & spatial diagnostics, and trajectory metrics
➡️ All in simulation for full reproducibility.
🧵4/n #benchmark

1

0

7

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

RoboEval goes beyond success/failure:
🛠 Coordination metrics (velocity + height sync)
📉 Trajectory metrics (jerk, path length, and more)
✋ Spatial precision metrics (grasp stability, collision monitoring)
📊 Stagewise progression for each task
🧵3/n #robotics #ai

4

2

11
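As a rough illustration of two trajectory metrics named in the 3/n post above (jerk and path length), the sketch below estimates both from sampled 3-D end-effector positions using finite differences. The function names, the fixed timestep `dt`, and the mean-squared-jerk definition are assumptions for illustration; the posts do not state how RoboEval computes these quantities.

```python
import math

def path_length(points):
    """Sum of straight-line distances between consecutive position samples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))

def mean_squared_jerk(points, dt):
    """Mean squared norm of jerk, estimated with a third finite difference.

    points: sequence of (x, y, z) positions sampled every dt seconds.
    Lower values indicate a smoother trajectory.
    """
    if len(points) < 4:
        raise ValueError("need at least 4 samples to estimate jerk")
    total = 0.0
    windows = 0
    for p0, p1, p2, p3 in zip(points, points[1:], points[2:], points[3:]):
        # Third forward difference per axis: (p3 - 3*p2 + 3*p1 - p0) / dt^3
        jerk = [(d - 3 * c + 3 * b - a) / dt ** 3
                for a, b, c, d in zip(p0, p1, p2, p3)]
        total += sum(j * j for j in jerk)
        windows += 1
    return total / windows

# Constant-velocity motion along x: nonzero path length, zero jerk.
straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(path_length(straight), mean_squared_jerk(straight, dt=1.0))
```

Two policies can both complete a task while one takes a longer or jerkier path, which is the kind of difference these metrics are meant to surface.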

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

Most benchmarks stop at ✅/❌, but manipulation is multi-stage and multi-modal. 🤖 RoboEval adds fine-grained metrics and progress tracking to reveal how policies succeed or why they fail. Echoes Russ Tedrake’s call for rigorous, scalable evaluation. 🧵2/n

1

0

10

Yi Ru (Helen) Wang

@YiruHelenWang

1 month

RT @yunchuzh: How should a robot perceive the world? What kind of visual representation leads to robust visuomotor policy learning for robo…

0

26

0

Yi Ru (Helen) Wang

@YiruHelenWang

2 months

Thrilled to be in Atlanta for #ICRA2025—as a member of the organizing committee for the first time! Join us Tuesday for two @ieeeras @ieee_ras_sac events: Lunch with Leaders (12:15–1:45pm) & Student Social Hour (7–9pm). Always happy to chat research or life—come say hi or DM me!

0

2

7