Karl Pertsch Profile
Karl Pertsch

@KarlPertsch

Followers
3K
Following
538
Media
98
Statuses
382

Robot Foundation Models @ UC Berkeley & Stanford & @physical_int | Postdoc w/ Sergey Levine & Chelsea Finn | Prev. Intern @ Google Brain, Meta AI | PhD @ USC.

Joined July 2015
@KarlPertsch
Karl Pertsch
17 days
We’re releasing RoboArena today! 🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵
13
80
405
@KarlPertsch
Karl Pertsch
12 days
I'll give a talk about benchmarking generalist policies today at RSS (4:30p, RTH 526, in the benchmarking workshop)! I will discuss sim eval, auto eval, and distributed real-world eval (i.e., RoboArena) -- swing by :)
3
12
193
@KarlPertsch
Karl Pertsch
17 days
RT @RobobertoMM: It was time to improve our evaluations in robot learning! We introduce a methodology based on anonymous A/B testing: faire…
0
3
0
@KarlPertsch
Karl Pertsch
17 days
RT @abhishekunique7: Check out some of our new work on distributed robot evaluation led by @KarlPertsch, @pranav_atreya and @tonyh_lee! Hop…
0
6
0
@KarlPertsch
Karl Pertsch
17 days
RT @tonyh_lee: 🚀 We just launched RoboArena, a real-world evaluation platform for robot policies! Think Chatbot Arena, but for robotics.…
0
15
0
@KarlPertsch
Karl Pertsch
17 days
RT @pranav_atreya: In robotics, benchmarks are rarely shared. New eval setups are created for each new project, a stark difference from eval…
0
21
0
@KarlPertsch
Karl Pertsch
17 days
Thanks to my co-leads @pranav_atreya @tonyh_lee! And thanks to the many collaborators from across the robotics community who agreed to help out with evals! @moo_jin_kim @prodarhan @dineshjayaraman @RobobertoMM @GlenBerseth @abhishekunique7 @YoungwoonLee @percyliang
1
0
6
@KarlPertsch
Karl Pertsch
17 days
You can join RoboArena both by submitting policies and by contributing evals! Check our website for more details: Paper: Join RoboArena and show off your best policies! :) 7/
1
1
8
@KarlPertsch
Karl Pertsch
17 days
RoboArena is based on the DROID platform. We provide all the resources you need to start training state-of-the-art policies: open-source data, policy training code for DROID VLAs (now added to openpi), and DROID sim evals for debugging! 6/
1
1
5
@KarlPertsch
Karl Pertsch
17 days
All eval episodes + scores are publicly accessible on our website + we create leaderboards and LLM-generated reports for each policy, with episode “citations” as evidence for strengths and weaknesses. Check it out! 5/
1
0
4
@KarlPertsch
Karl Pertsch
17 days
Like in Chatbot Arena, we can aggregate many such pairwise evals to compute a global policy ranking. As a result, RoboArena evals are comprehensive (many tasks + scenes) and trustworthy (no cherry-picking of tasks, evaluators don’t know which policies they evaluate). 4/
1
0
4
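The aggregation step described in 4/ can be illustrated with a small sketch. The thread does not say which ranking model RoboArena uses, so this example assumes a plain Bradley-Terry model fitted with the standard iterative (minorization-maximization) update; the function name `bradley_terry` and the policy names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def bradley_terry(comparisons, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs.

    Assumption: every pairwise eval has a clear winner (no ties).
    Returns a dict mapping policy name -> strength, normalized to sum to 1.
    """
    wins = defaultdict(float)         # total wins per policy
    pair_counts = defaultdict(float)  # number of evals per unordered pair
    policies = set()
    for winner, loser in comparisons:
        wins[winner] += 1.0
        pair_counts[frozenset((winner, loser))] += 1.0
        policies.update((winner, loser))

    # Start from uniform strengths.
    strength = {p: 1.0 / len(policies) for p in policies}
    for _ in range(iters):
        updated = {}
        for p in policies:
            denom = 0.0
            for q in policies:
                if q == p:
                    continue
                n = pair_counts.get(frozenset((p, q)), 0.0)
                if n:
                    denom += n / (strength[p] + strength[q])
            # Policies that were never compared keep their current strength.
            updated[p] = wins[p] / denom if denom else strength[p]
        total = sum(updated.values())
        strength = {p: s / total for p, s in updated.items()}
    return strength


# Hypothetical A/B outcomes: each tuple is (preferred policy, other policy).
comparisons = (
    [("policy_a", "policy_b")] * 3
    + [("policy_b", "policy_a")]
    + [("policy_b", "policy_c")] * 3
    + [("policy_c", "policy_b")]
    + [("policy_a", "policy_c")] * 2
)
strengths = bradley_terry(comparisons)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

Because only relative preferences enter the fit, evaluators can pick different tasks and scenes for each pair and the outcomes can still be pooled into one ranking, which is the property the thread highlights.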
@KarlPertsch
Karl Pertsch
17 days
We crowdsource evals across many institutions, and evaluators can freely choose to test *any task* in *any scene*. Only requirement: each eval runs two policies back-to-back, and provides feedback on which policy performed better. 3/
1
0
6
@KarlPertsch
Karl Pertsch
17 days
Everyone knows robot benchmarking is hard: it’s tedious & challenging to reproduce. Prior benchmarks try to standardize (objects, viewpoints, …) but it’s a losing battle, esp. for generalist policies that need eval in many tasks/scenes. RoboArena takes a different approach… 2/
1
0
5
@KarlPertsch
Karl Pertsch
24 days
RT @LerrelPinto: Final note: It is easier to work on robotics now than any point in the past.
0
5
0
@KarlPertsch
Karl Pertsch
27 days
RT @kvablack: In LLM land, a slow model is annoying. In robotics, a slow model can be disastrous! Visible pauses at best, dangerously jerky…
0
55
0
@KarlPertsch
Karl Pertsch
27 days
RT @polkirichenko: Join us at #CVPR2025 Demographic Diversity in Computer Vision workshop tomorrow! 📅 Wednesday, June 11, 9am-6pm 📍 room 21…
0
20
0
@KarlPertsch
Karl Pertsch
1 month
Check out Danny's paper on a single-stage VLA recipe that trains fast, has fast inference, and follows language commands well. ⚡️⚡️⚡️ The key: combine FAST tokens + flow-matching expert, and make sure those pesky diffusion gradients don't mess up your beautiful VLM backbone! :)
@DannyDriess
Danny Driess
1 month
How to build vision-language-action models that train fast, run fast & generalize? In our new paper, we formalize & analyze the approach of our π-0.5 model & further improve it with a single stage recipe. Blog: Paper:
0
2
34
@KarlPertsch
Karl Pertsch
2 months
Here's the link to our original Embodied CoT work: Also, lots of other works have since shown that grounded reasoning can help generalization (e.g., Gemini Robotics, HAMSTER) -- I think we have still only scratched the surface on these approaches!
@KarlPertsch
Karl Pertsch
1 year
Excited to release our work on Embodied Chain-of-Thought Reasoning today! We can boost performance of vision-language-action models like OpenVLA by a large margin without any additional robot training data! The key: simply think before you act! 1/
0
0
3
@KarlPertsch
Karl Pertsch
2 months
Our embodied CoT work (ECoT) showed that policies generalize better when allowed to reason step-by-step, at the expense of slower inference. Will's new work investigates *why* ECoT policies work better & develops "ECoT-Lite" recipes that run much faster & still generalize well! 👇
@verityw_
Will Chen
2 months
Embodied chain-of-thought reasoning (ECoT) is a powerful way to improve robot generalization & performance. But why is this the case, and how can that inform the design of learned robot policies? We investigate these questions in our latest work! 1/6
1
1
18