Physical Intelligence Profile

Physical Intelligence (@physical_int)
Followers: 33K · Following: 51 · Media: 59 · Statuses: 87

Physical Intelligence (Pi), bringing AI into the physical world.

San Francisco, CA · Joined March 2024
@physical_int · 12 hours
All videos are autonomous. We also tested training "from scratch" (from a VLM initialization), but this failed on all tasks, indicating that fine-tuning our models is essential for success. For more, check out our blog post:
Link card (pi.website): By fine-tuning our latest model, we were able to solve a series of very difficult manipulation challenge tasks.
@physical_int · 12 hours
Event 5 🥇: the gold medal task is to wash a frying pan in the sink using soap and water. Both sides. We also tackled silver (cleaning the fingers) and bronze (wiping the counter), in our blog post. The pan is hard to clean, but the robot rose to the challenge.
@physical_int · 12 hours
And here is our attempt at the gold. The Olympics requires using the fingers, but we had to use a sharper tool. That's a disqualification for us, but the result is still really interesting.
@physical_int · 12 hours
Event 4 🥈: we tried both gold and silver: peeling an orange and using a dog poop bag. For gold, we had to "bend the rules" and use a tool, so we believe we only take silver. The dog bag is really hard, not least because it blinds the wrist camera when in use:
@physical_int · 12 hours
And the bronze: using Windex to clean a window, in this case one of our phone booths. Just don't get stuck inside while it's cleaning!
@physical_int · 12 hours
We also did the silver medal task: making a peanut butter sandwich. Very long horizon (open the jar, spread the peanut butter, cut the sandwich into elegant triangles, and close the jar), with lots of force control and deformable objects.
@physical_int · 12 hours
Event 3 🥇: this was our favorite! Tool use: we did all three tasks. The gold medal task was to use a key to unlock a lock. Very fine manipulation: fitting the key into the lock and turning it with enough force to unlock.
@physical_int · 12 hours
Event 2 🥈: the gold-medal task is to hang an inside-out dress shirt, after turning it right-side-out, which our robot can't do physically (the gripper is too wide to fit inside the sleeves). So we tackled the silver-medal task, which is to turn a sock inside-out.
@physical_int · 12 hours
Event 1 🥇: we fine-tune π0.6 for the "gold medal" task, going through a self-closing lever-handle door. This is hard because the robot has to keep the door open as it goes through it.
@physical_int · 12 hours
Benjie Holson proposed "Robot Olympics" - 5 "events" with gold/silver/bronze medal tasks like washing a pan: https://t.co/6pdRrNt7GH These are not tasks we made up ourselves. They illustrate "Moravec's paradox" - everyday tasks we find easy that current robots just can't do.
@physical_int · 12 hours
We got our robots to wash pans, clean windows, make peanut butter sandwiches, and more! Fine-tuning our latest model enables all of these tasks, and this has interesting implications for robotics, Moravec's paradox, and the future of large models in embodied AI. More below!
@physical_int · 6 days
Check out the blog post and full research paper for more details and experiments, including studies into high level vs low level transfer, comparisons to robot data, and quantifying the utility of wrist cameras. https://t.co/hLZdccTR6m
Link card (pi.website): Exploring how transfer from human videos to robotic tasks emerges in robotic foundation models as they scale.
@physical_int · 6 days
This also shows up in the representations learned by the model. We plot the model’s representations of human and robot images. As pre-training is scaled up, the representation of humans and robots become more aligned: to a scaled-up model, human videos "look" like robot demos.
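The alignment described above can be quantified in a simple way: embed human and robot frames with the model's vision encoder, then compare the two sets of embeddings. The sketch below is purely illustrative and assumes nothing about Physical Intelligence's actual analysis; the toy embedding vectors stand in for real encoder outputs, and cosine similarity of the mean embeddings stands in for whatever alignment metric was used.

```python
# Minimal sketch: how aligned are human-video and robot-demo embeddings?
# The "embeddings" here are hand-written placeholders, not real encoder outputs.
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mean_embedding(batch):
    # Average a list of embedding vectors component-wise.
    dim = len(batch[0])
    return [sum(x[i] for x in batch) / len(batch) for i in range(dim)]

# Toy embeddings of human egocentric frames and robot camera frames.
human = [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]]
robot = [[0.7, 0.3, 0.2], [0.85, 0.15, 0.25]]

# A value near 1.0 would indicate the two modalities "look" alike to the model.
alignment = cosine(mean_embedding(human), mean_embedding(robot))
print(round(alignment, 3))
```

Tracking a score like this across pre-training checkpoints is one way to visualize the alignment trend the tweet describes.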
@physical_int · 6 days
We were surprised, and wanted to understand why. What about π0.5 enabled emergent human-robot transfer? We ran an experiment to test if it only appears above a certain scale. Turns out human transfer scales with the amount & diversity of robot data in VLA pre-training!
@physical_int · 6 days
If we use our full pre-trained π0.5 model, simply fine-tuning with human video data can double the performance on tasks that are depicted in the human videos!
@physical_int · 6 days
We set out with the goal of understanding what it would take to make human data useful for VLAs like π0.5. We record egocentric human data with wearable cameras, and then include it in a co-training recipe with hand poses serving as actions.
@physical_int · 6 days
We discovered an emergent property of VLAs like π0/π0.5/π0.6: as we scale up pre-training, the model learns to align human videos and robot data! This gives us a simple way to leverage human videos. Once π0.5 knows how to control robots, it can naturally learn from human video.
@physical_int · 1 month
To learn more, see more videos, and read a full research paper about Recap and π*0.6, see our blog post here:
Link card (pi.website): A method for training our generalist policies with RL to improve success rate and throughput on real-world tasks.
@physical_int · 1 month
Quantitatively, training π*0.6 with RL can more than double throughput (number of successful task executions per hour) on the hardest tasks and cut the number of failures by as much as a factor of two.
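The two metrics named above are easy to pin down concretely: throughput is successful task executions per hour, and the failure count is simply how many episodes did not succeed. The sketch below uses invented episode logs (duration in seconds, success flag) to show a hypothetical before/after comparison; the numbers are illustrative, not Physical Intelligence's results.

```python
# Minimal sketch of the two metrics: throughput (successes per hour)
# and failure rate. Episode logs are made-up (duration_seconds, succeeded).
def throughput_per_hour(episodes):
    hours = sum(d for d, _ in episodes) / 3600.0
    successes = sum(1 for _, ok in episodes if ok)
    return successes / hours

def failure_rate(episodes):
    return sum(1 for _, ok in episodes if not ok) / len(episodes)

# Hypothetical logs: the RL-trained policy is both faster and more reliable.
baseline = [(300, True), (300, False), (300, True), (300, False)]
with_rl = [(150, True), (150, True), (150, True), (150, False)]

print(throughput_per_hour(baseline))  # 2 successes in 1200 s -> 6.0 per hour
print(throughput_per_hour(with_rl))   # 3 successes in 600 s -> 18.0 per hour
```

Note that throughput improves both when more episodes succeed and when each episode finishes faster, which is why a speed-plus-reliability gain can more than double it.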
@physical_int · 1 month
We also trained π*0.6 to assemble boxes. Here is an hour of box building, with about two and a half minutes per box.