Karl Pertsch @KarlPertsch X Profile

Karl Pertsch

@KarlPertsch

Followers: 3K · Following: 538 · Media: 98 · Statuses: 382

Robot Foundation Models @ UC Berkeley & Stanford & @physical_int | Postdoc w/ Sergey Levine & Chelsea Finn | Prev. Intern @ Google Brain, Meta AI | PhD @ USC.

Joined July 2015

Karl Pertsch

@KarlPertsch

17 days

We’re releasing RoboArena today! 🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging. Submit your policies today and join the leaderboard! :) 🧵

Karl Pertsch

@KarlPertsch

12 days

I'll give a talk about benchmarking generalist policies today at RSS (4:30p, RTH 526, in the benchmarking workshop)!
I will discuss sim eval, auto eval, and distributed real-world eval (i.e., RoboArena). Swing by :)

Karl Pertsch

@KarlPertsch

17 days

RT @RobobertoMM: It was time to improve our evaluations in robot learning! We introduce a methodology based on anonymous A/B testing: faire…

Karl Pertsch

@KarlPertsch

17 days

RT @abhishekunique7: Check out some of our new work on distributed robot evaluation led by @KarlPertsch, @pranav_atreya and @tonyh_lee! Hop…

Karl Pertsch

@KarlPertsch

17 days

RT @tonyh_lee: 🚀 We just launched RoboArena, a real-world evaluation platform for robot policies!
Think Chatbot Arena, but for robotics. …

Karl Pertsch

@KarlPertsch

17 days

RT @pranav_atreya: In robotics, benchmarks are rarely shared. New eval setups are created for each new project, a stark difference from eval…

Karl Pertsch

@KarlPertsch

17 days

@cyrusneary @edward_s_hu @ShivinDass @JieWang_ZJUI @chelseabfinn @svlevine.

Karl Pertsch

@KarlPertsch

17 days

Thanks to my co-leads @pranav_atreya @tonyh_lee!
And thanks to the many collaborators from across the robotics community who agreed to help out with evals!
@moo_jin_kim @prodarhan @dineshjayaraman @RobobertoMM @GlenBerseth @abhishekunique7 @YoungwoonLee @percyliang

Karl Pertsch

@KarlPertsch

17 days

You can join RoboArena both by submitting policies and by contributing evals! Check our website for more details:
Paper:
Join RoboArena and show off your best policies! :) 7/

Karl Pertsch

@KarlPertsch

17 days

RoboArena is based on the DROID platform. We provide all the resources you need to start training state-of-the-art policies: open-source data, policy training code for DROID VLAs (now added to openpi), and DROID sim evals for debugging! 6/

Karl Pertsch

@KarlPertsch

17 days

All eval episodes + scores are publicly accessible on our website, and we create leaderboards and LLM-generated reports for each policy, with episode “citations” as evidence for strengths and weaknesses. Check it out! 5/

Karl Pertsch

@KarlPertsch

17 days

Like in Chatbot Arena, we can aggregate many such pairwise evals to compute a global policy ranking. As a result, RoboArena evals are comprehensive (many tasks + scenes) and trustworthy (no cherry-picking of tasks; evaluators don’t know which policies they evaluate). 4/
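For readers curious how pairwise A/B votes turn into one ranking: Chatbot Arena-style leaderboards typically fit a Bradley-Terry model, where P(i preferred over j) = p_i / (p_i + p_j). Below is a minimal sketch, assuming a plain Bradley-Terry fit with the classic MM (minorization-maximization) updates and ties counted as half a win for each side. RoboArena's actual ranking model may differ; the function name, policy names, and match outcomes here are hypothetical.

```python
from collections import defaultdict


def bradley_terry(matches, n_iters=1000, tol=1e-12):
    """Fit Bradley-Terry strengths to pairwise A/B outcomes via MM updates.

    matches: iterable of (policy_a, policy_b, score_a) with policy_a != policy_b,
      where score_a = 1.0 if A was preferred, 0.0 if B was, 0.5 for a tie.
    Returns {policy: strength}, normalized so strengths sum to 1.
    """
    wins = defaultdict(float)   # (fractional) wins per policy
    games = defaultdict(float)  # number of comparisons per unordered pair
    for a, b, score_a in matches:
        wins[a] += score_a
        wins[b] += 1.0 - score_a
        games[frozenset((a, b))] += 1.0

    policies = sorted(wins)
    strength = {p: 1.0 / len(policies) for p in policies}
    for _ in range(n_iters):
        # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j), all i updated at once
        new = {}
        for i in policies:
            denom = sum(
                n / (strength[i] + strength[j])
                for pair, n in games.items() if i in pair
                for j in pair - {i}
            )
            new[i] = wins[i] / denom
        total = sum(new.values())
        new = {p: s / total for p, s in new.items()}  # fix the arbitrary scale
        converged = max(abs(new[p] - strength[p]) for p in policies) < tol
        strength = new
        if converged:
            break
    return strength


# Hypothetical eval outcomes: (policy_a, policy_b, score for policy_a)
matches = [
    ("policy_A", "policy_B", 1.0),
    ("policy_A", "policy_B", 1.0),
    ("policy_A", "policy_B", 0.0),
    ("policy_B", "policy_C", 1.0),
    ("policy_B", "policy_C", 1.0),
    ("policy_B", "policy_C", 0.0),
    ("policy_A", "policy_C", 1.0),
    ("policy_C", "policy_A", 0.0),
]
strength = bradley_terry(matches)
ranking = sorted(strength, key=strength.get, reverse=True)
print(ranking)  # → ['policy_A', 'policy_B', 'policy_C']
```

Unlike online Elo updates, this batch fit does not depend on the order in which evals arrive, which suits a crowdsourced setting where comparisons come in from many sites at once.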

Karl Pertsch

@KarlPertsch

17 days

We crowdsource evals across many institutions, and evaluators can freely choose to test *any task* in *any scene*. Only requirement: each eval runs two policies back-to-back, and the evaluator reports which policy performed better. 3/

Karl Pertsch

@KarlPertsch

17 days

Everyone knows robot benchmarking is hard: it’s tedious & challenging to reproduce. Prior benchmarks try to standardize (objects, viewpoints, …), but it’s a losing battle, esp. for generalist policies that need eval in many tasks/scenes. RoboArena takes a different approach… 2/

Karl Pertsch

@KarlPertsch

24 days

RT @LerrelPinto: Final note: It is easier to work on robotics now than any point in the past.

Karl Pertsch

@KarlPertsch

27 days

RT @kvablack: In LLM land, a slow model is annoying. In robotics, a slow model can be disastrous! Visible pauses at best, dangerously jerky…

Karl Pertsch

@KarlPertsch

27 days

RT @polkirichenko: Join us at #CVPR2025 Demographic Diversity in Computer Vision workshop tomorrow!
📅 Wednesday, June 11, 9am-6pm
📍 room 21…

Karl Pertsch

@KarlPertsch

1 month

Check out Danny's paper on a single-stage VLA recipe that trains fast, has fast inference, and follows language commands well. ⚡️⚡️⚡️
The key: combine FAST tokens + a flow-matching expert, and make sure those pesky diffusion gradients don't mess up your beautiful VLM backbone! :)

Danny Driess

@DannyDriess

1 month

How to build vision-language-action models that train fast, run fast & generalize? In our new paper, we formalize & analyze the approach of our π-0.5 model & further improve it with a single stage recipe. Blog: Paper:

Karl Pertsch

@KarlPertsch

2 months

Here's the link to our original Embodied CoT work:
Also, lots of other works have since shown that grounded reasoning can help generalization (e.g., Gemini Robotics, HAMSTER). I think we have still only scratched the surface of these approaches!

Karl Pertsch

@KarlPertsch

1 year

Excited to release our work on Embodied Chain-of-Thought Reasoning today!
We can boost the performance of vision-language-action models like OpenVLA by a large margin without any additional robot training data!
The key: simply think before you act! 1/

Karl Pertsch

@KarlPertsch

2 months

Our embodied CoT work (ECoT) showed that policies generalize better when allowed to reason step-by-step, at the expense of slower inference. Will's new work investigates *why* ECoT policies work better & develops "ECoT-Lite" recipes that run much faster & still generalize well! 👇

Will Chen

@verityw_

2 months

Embodied chain-of-thought reasoning (ECoT) is a powerful way to improve robot generalization & performance. But why is this the case, and how can that inform the design of learned robot policies?
We investigate these questions in our latest work! 1/6