
Kevin Black (@kvablack) · phd @berkeley_ai, research @physical_int
Joined March 2018 · 2K followers · 697 following · 28 media · 105 statuses
π0.5 is a step-change improvement over π0. I just watched our robot fold 25 napkins in a row in 44 mins, including disentangling multiple napkins accidentally pulled into the workspace. Knowledge insulation ftw. Congrats @physical_int and thank you for open-sourcing your work.
5 replies · 15 reposts · 248 likes
We've added π0.5 to the openpi repo: pi05-base, pi05-droid, pi05-libero. Also added PyTorch training code!🔥 Instructions and code here: https://t.co/EOhNYfpq9B This is an updated version of the model we showed cleaning kitchens and bedrooms in April:
7 replies · 5 reposts · 161 likes
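For readers who want to try it, here is a minimal loading sketch that follows the π₀ policy-loading pattern from the openpi README; the `pi05_droid` config name, checkpoint path, and the `make_droid_example` helper below are assumptions, so check the linked instructions for the exact names:

```python
from openpi.training import config as _config
from openpi.policies import policy_config
from openpi.shared import download

# Assumed ids; see the openpi instructions for the exact pi05 names.
cfg = _config.get_config("pi05_droid")
ckpt_dir = download.maybe_download("gs://openpi-assets/checkpoints/pi05_droid")
policy = policy_config.create_trained_policy(cfg, ckpt_dir)

# Hypothetical helper: builds an observation dict (camera images, robot
# state, text prompt) in whatever format the chosen config expects.
example = make_droid_example()
action_chunk = policy.infer(example)["actions"]
```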
Implemented @physical_int’s Real‑Time Chunking (RTC) on @huggingface’s SmolVLA in the @LeRobotHF repo! It noticeably reduces jerky motion compared with basic merge strategies during async inference!🧵1/
9 replies · 22 reposts · 194 likes
That’s all for now! This project was a long time coming; be sure to check out the full blog post and paper here:
pi.website
Physical Intelligence is bringing general-purpose AI into the physical world.
0 replies · 1 repost · 29 likes
Finally, there’s a subtle issue with non-real-time inference that’s easy to overlook: distribution shift. Pauses for inference are not in the training data! We found that RTC was not only faster, but also more precise and consistent than our old synchronous strategy.
2 replies · 1 repost · 22 likes
To prepare for this future, we added up to +200ms of artificial latency to π0.5 (>300ms total), and the speed and performance of RTC were totally unaffected!
1 reply · 0 reposts · 16 likes
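A toy version of that stress test, holding the inference result back to mimic a slow network hop; `policy.infer` and the constants are hypothetical stand-ins:

```python
import time

EXTRA_LATENCY_S = 0.2  # up to +200 ms injected on top of the model's own latency

def infer_with_artificial_latency(policy, obs, extra=EXTRA_LATENCY_S):
    """Delay the response to simulate a remote inference server. With RTC,
    more latency just means more frozen prefix actions, so execution stays
    smooth. (policy.infer is a hypothetical stand-in.)"""
    chunk = policy.infer(obs)
    time.sleep(extra)  # simulated round trip to a centralized server
    return chunk
```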
Model size is not the only contributor to latency. Personally, I’m betting that the VLAs that solve physical intelligence will not be able to fit in onboard robot computers. That means we will need centralized inference servers, and we will have network latency.
2 replies · 0 reposts · 20 likes
Importantly, this requires no training-time changes! It’s applicable to any diffusion- or flow-based policy at inference time. With RTC, we get smooth real-time execution.
1 reply · 0 reposts · 23 likes
Our solution, real-time chunking (RTC), combines action chunking with inpainting — the actions within the inference delay are frozen, while the rest are “inpainted” in a way that’s consistent with the previous plan.
1 reply · 3 reposts · 36 likes
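A minimal sketch of the inpainting idea for a flow-matching policy with a linear path (x_t = (1 - t) * noise + t * actions) and Euler integration: the first d actions, which will execute during the inference delay, are pinned to the previous plan at every denoising step, so the rest of the chunk is generated to agree with them. The full method also soft-weights the remaining overlap with guidance; `velocity_fn` and its signature are hypothetical stand-ins, not the openpi API:

```python
import numpy as np

def rtc_inpaint(velocity_fn, obs, prev_plan, d, horizon, action_dim, n_steps=10):
    """Generate a new chunk whose first `d` actions match the previous plan.

    velocity_fn(obs, x, t) -> dx/dt is a flow-matching policy (hypothetical
    signature). prev_plan is the remainder of the previous chunk, aligned so
    prev_plan[:d] are the actions that execute during the inference delay.
    """
    rng = np.random.default_rng(0)
    x = rng.standard_normal((horizon, action_dim))  # t = 0: pure noise
    eps = x[:d].copy()                              # fixed noise for the frozen prefix
    for k in range(n_steps):
        t = k / n_steps
        # Inpainting constraint: pin the prefix to the interpolant of the
        # committed actions so the suffix is denoised to be consistent.
        x[:d] = (1.0 - t) * eps + t * prev_plan[:d]
        x = x + velocity_fn(obs, x, t) / n_steps    # Euler step toward t = 1
    x[:d] = prev_plan[:d]                           # exact freeze at t = 1
    return x
```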
For smooth execution, we need to always produce the next action as soon as it’s needed. This is called a “real-time constraint”. With high-latency models, this requires concurrency: generating new actions while executing old ones. But naive concurrency does not work.
1 reply · 0 reposts · 16 likes
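To make the real-time constraint concrete, here is a sketch of the naive concurrency it rules out: a worker thread generates the next chunk while the control loop executes the current one, and the discontinuity at the chunk handoff is the source of the jerky motion. `robot`, `policy`, and the timing constants are hypothetical stand-ins:

```python
import queue
import threading
import time

DT = 0.02  # hypothetical 50 Hz control period

def naive_async_control(robot, policy):
    """Generate the next chunk while executing the current one. Smooth until
    the handoff, where the new chunk can disagree with the old plan."""
    latest_obs = {"obs": robot.observe()}
    next_chunk = queue.Queue(maxsize=1)

    def worker():
        while True:
            # Slow inference runs concurrently with the execution loop below.
            next_chunk.put(policy.infer(latest_obs["obs"]))

    threading.Thread(target=worker, daemon=True).start()
    chunk, i = next_chunk.get(), 0  # block once for the first chunk
    while True:
        start = time.monotonic()
        robot.act(chunk[i])
        i += 1
        latest_obs["obs"] = robot.observe()
        if i == len(chunk):
            # Handoff: the fresh chunk was conditioned on an old observation
            # and ignores the previous plan; this is where the jerk happens.
            chunk, i = next_chunk.get(), 0
        time.sleep(max(0.0, DT - (time.monotonic() - start)))
```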
In LLM land, a slow model is annoying. In robotics, a slow model can be disastrous! Visible pauses at best, dangerously jerky motions at worst. But large VLAs are slow by nature. What can we do about this? An in-depth 🧵:
12 replies · 63 reposts · 488 likes
I mean, technically the model is optimized... by the XLA compiler, not by a human!
[from arxiv.org: Recent vision-language-action models (VLAs) build upon pretrained vision-language models and leverage diverse robot datasets to demonstrate strong task execution, language following ability, and...]
1 reply · 1 repost · 12 likes
This caption is a bit funny to me because we've put precisely zero effort into optimizing our model implementation. Thanks JAX!
3 replies · 5 reposts · 117 likes
We got a robot to clean up homes that were never seen in its training data! Our new model, π-0.5, aims to tackle open-world generalization. We took our robot into homes that were not in the training data and asked it to clean kitchens and bedrooms. More below⤵️
54 replies · 261 reposts · 2K likes
We are excited to share new experiments with AgiBot @AgiBot_zhiyuan on multi-task, multi-embodiment VLAs! With one model that can perform many tasks with both two-finger grippers and multi-fingered hands, we take another step toward one model for all robots and tasks.
11 replies · 65 reposts · 391 likes
Many of you asked for code & weights for π₀, and we are happy to announce that we are releasing π₀ and pre-trained checkpoints in our new openpi repository! We tested the model on a few public robots, and we include code for you to fine-tune it yourself.
37 replies · 215 reposts · 1K likes
My favorite slide that I made for my talk last weekend -- a very silly thought experiment in which we compare language datasets to robotics datasets (in the most shallow way possible). Yes, it is to scale; I learned that the maximum shape size in Keynote is 20,000 pts
5 replies · 4 reposts · 90 likes
Here's a link to the recording for anyone who's interested! https://t.co/VmMcDOCBWA
[quoted tweet] If you're at #CoRL2024, come check out my talk at the X-Embodiment workshop at 1:30pm! Thanks to @KarlPertsch for inviting me to speak!
3 replies · 17 reposts · 182 likes
If you're at #CoRL2024, come check out my talk at the X-Embodiment workshop at 1:30pm! Thanks to @KarlPertsch for inviting me to speak!
3 replies · 10 reposts · 150 likes