Yi Ru (Helen) Wang Profile
Yi Ru (Helen) Wang

@YiruHelenWang

Followers
466
Following
246
Media
28
Statuses
67

Ph.D. Student @uwcse @uw_robotics | Chair @ieee_ras_sac | NSERC-PGSD Fellow @nserc_crsng | BASc from @UofT EngSci | Robot learning & perception

Joined October 2012
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
🚨 Tired of binary pass/fail metrics that miss the bigger picture? 🤖 Introducing #RoboEval, an open benchmark that shows *how* robot manipulation policies behave and *why* they fail, not just *if* they succeed. 🧵1/n. 🔗 📄
6
35
190
@YiruHelenWang
Yi Ru (Helen) Wang
30 days
RT @Jianfei_AI: In this generation of embodied intelligence, success isn't just about whether the task is completed. We care how it's done:…
0
2
0
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
RT @JieWang_ZJUI: I love the idea of 'Behavior Metric'! Like we did in #RoboArena, we are trying to record more fine-grained, general and u…
0
1
0
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
RT @DJiafei: How do we accelerate progress in robotics? As @JitendraMalikCV has noted, computer vision leapt ahead once robust, openly-sha…
0
2
0
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
RT @DJiafei: In robotics, task success/failure has long been the default metric. But is binary success enough to capture a policy’s true ca…
0
10
0
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
RT @KyleMorgenstein: RL for robotics has struggled to take advantage of compute scale for hyper param sweeps because we don’t have stable r…
0
5
0
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
RT @OfirOzeri: One of the most important indications of the convergence coming to robotics is more and more benchmarking coming in hot. They…
0
1
0
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
RT @_tonytao_: Having reliable signals to compare different robot policies is sooo important. The more informative the metrics, the better…
0
1
0
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
👏 Thanks to all my collaborators and advisors @CarterUng35146, Grant Tannert, @DJiafei, Josephine Li, Amy Le, Rishabh Oswal, Markus Grotz, @wpumacay7567, @nild0000, @RanjayKrishna, @fox_dieter17849, @siddhss5. Y'all rock! 🎉
0
0
6
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
✨ TL;DR: RoboEval shifts evaluation from "did it succeed?" to "how did it perform?". We hope this unlocks deeper insights into robotic policy behavior. Website and Paper: 🌐 📄 🧵10/n. #robotics #ai #ml #benchmark #roboeval
1
1
10
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
RoboEval is built for extensibility: 📦 Modular MJCF task assets. ⚙️ Easy-to-add variations (obstacles, lighting, rotations). 🧩 Supports imitation and RL. ✅ All metrics are automatically logged per rollout. 🧵9/n. #benchmark #simulation #mujoco #imitationlearning #ml
1
0
6
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
🤔 Does your policy fumble with the left arm? Miss the grasp? Wobble mid-lift? 🔍 RoboEval pinpoints bottlenecks with stage-level analysis and structured outcome metrics. 🧵8/n. #robotics #ai #machinelearning #benchmarking
6
1
8
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
‼️ We also observe that success rates can saturate (either all pass or all fail), and a saturated success rate tells you nothing. Behavioral metrics stay informative, especially for diagnosing trajectory quality & failure distribution. 🧵7/n. #ai #robotics #machinelearning #benchmarking
2
1
7
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
🔍 Key finding: Behavioral metrics correlate with binary success in 59.4% of task-metric pairs, suggesting execution quality often aligns with outcomes. This makes a strong case for going beyond ✅/❌. 🧵6/n
1
0
7
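A minimal sketch of how a "fraction of correlated task-metric pairs" figure could be computed. The thread does not say which correlation statistic or cutoff was used, so everything here is an assumption: the sketch uses a plain Pearson correlation against 0/1 success labels (the point-biserial case), a hypothetical threshold of 0.5, and made-up toy numbers. The names `pearson` and `fraction_correlated` are illustrative and are not part of RoboEval.

```python
import math

def pearson(xs, ys):
    """Pearson correlation; returns None when either input has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Saturated success (all pass or all fail) leaves the correlation undefined.
        return None
    return cov / math.sqrt(var_x * var_y)

def fraction_correlated(pairs, threshold=0.5):
    """Share of (task, metric) pairs whose |r| reaches the assumed threshold."""
    hits = 0
    for metric_values, successes in pairs.values():
        r = pearson(metric_values, successes)
        if r is not None and abs(r) >= threshold:
            hits += 1
    return hits / len(pairs)

# Toy data: per-rollout metric values paired with 0/1 success labels.
toy_pairs = {
    ("lift", "jerk"): ([1.0, 2.0, 8.0, 9.0], [1, 1, 0, 0]),
    ("lift", "path_length"): ([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 0]),
}
print(fraction_correlated(toy_pairs))
```

With saturated labels, `pearson` returns `None` and the pair is counted as uncorrelated, which is one way to handle a success rate that carries no signal.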
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
Evaluated popular policies: ✅ ACT. ✅ Diffusion Policy. ✅ Behavior Cloning. ✅ OpenVLA. While some achieve similar success rates, they may differ in coordination, smoothness, and stability, as our analysis reveals. 🧵5/n. #evaluation #robotics #machinelearning
2
1
7
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
Our initial release includes 8 tasks × 3-5 variations each, supported by: 🤲 3,000 expert VR demos. 🧪 Progress tracking, coordination & spatial diagnostics, and trajectory metrics. ➡️ All in simulation for full reproducibility. 🧵4/n. #benchmark
1
0
7
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
RoboEval goes beyond success/failure: 🛠 Coordination metrics (velocity + height sync). 📉 Trajectory metrics (jerk, path length, and more). ✋ Spatial precision metrics (grasp stability, collision monitoring). 📊 Stagewise progression for each task. 🧵3/n. #robotics #ai
4
2
11
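To make the trajectory metrics named in the post above concrete, here is a rough sketch of path length and a jerk-based smoothness score for an end-effector trajectory sampled at a fixed timestep. The post does not give RoboEval's exact definitions, so these formulas are assumptions: path length as the summed distance between consecutive waypoints, and jerk as the third finite difference of position. The function names and sample data are hypothetical.

```python
import math

def path_length(positions):
    """Sum of straight-line distances between consecutive waypoints."""
    return sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))

def mean_squared_jerk(positions, dt):
    """Mean squared magnitude of the third finite difference of position."""
    squared = []
    windows = zip(positions, positions[1:], positions[2:], positions[3:])
    for p0, p1, p2, p3 in windows:
        # Third-order forward difference, computed per coordinate axis.
        jerk = [(d - 3 * c + 3 * b - a) / dt ** 3
                for a, b, c, d in zip(p0, p1, p2, p3)]
        squared.append(sum(j * j for j in jerk))
    # Fewer than four samples gives no jerk estimate; report 0.0 in that case.
    return sum(squared) / len(squared) if squared else 0.0

# Constant-velocity motion along x: smooth, so the jerk score is zero.
straight = [(float(t), 0.0, 0.0) for t in range(5)]
print(path_length(straight), mean_squared_jerk(straight, dt=1.0))
```

Two rollouts that both succeed can still differ on these numbers, which is the kind of difference a binary success flag hides.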
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
Most benchmarks stop at ✅/❌, but manipulation is multi-stage and multi-modal. 🤖 RoboEval adds fine-grained metrics and progress tracking to reveal how policies succeed or why they fail. Echoes Russ Tedrake’s call for rigorous, scalable evaluation. 🧵2/n.
1
0
10
@YiruHelenWang
Yi Ru (Helen) Wang
1 month
RT @yunchuzh: How should a robot perceive the world? What kind of visual representation leads to robust visuomotor policy learning for robo…
0
26
0
@YiruHelenWang
Yi Ru (Helen) Wang
2 months
Thrilled to be in Atlanta for #ICRA2025—as a member of the organizing committee for the first time! Join us Tuesday for two @ieeeras @ieee_ras_sac events: Lunch with Leaders (12:15–1:45pm) & Student Social Hour (7–9pm). Always happy to chat research or life—come say hi or DM me!
0
2
7