Danny Driess
@DannyDriess
Followers: 4K · Following: 200 · Media: 42 · Statuses: 152
Research Scientist @physical_int. Formerly Google DeepMind
Joined August 2021
How to build vision-language-action models that train fast, run fast & generalize? In our new paper, we formalize & analyze the approach of our π-0.5 model & further improve it with a single stage recipe. Blog: https://t.co/IihKEmmxSB Paper: https://t.co/JfEU7pcoZk
The idea behind significantly improving performance on hard real-world tasks is to train a value function, condition the model on advantages computed from that value function, and run an iterative improvement loop in which the model learns from its own data.
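Roughly, that recipe can be pictured as below. This is a minimal, self-contained sketch with toy linear stand-ins; the function names and the binary conditioning token are illustrative assumptions, not the π*0.6 implementation.

```python
import jax
import jax.numpy as jnp

def value_loss(w, obs, returns):
    # Fit a value estimate V(s) to observed returns (here just a linear model).
    return jnp.mean((obs @ w - returns) ** 2)

def advantage_token(w, obs, returns):
    # Advantage = observed return minus the value estimate. The policy is
    # conditioned on a discretized version of it; at deployment we always
    # request the "high-advantage" behavior.
    adv = returns - obs @ w
    return (adv > 0.0).astype(jnp.int32)

# One round of the improvement loop: roll out the current policy, fit the
# value function on the collected returns, relabel the data with advantage
# tokens, and retrain the policy conditioned on those tokens.
key = jax.random.PRNGKey(0)
obs = jax.random.normal(key, (128, 8))       # stand-in for episode features
returns = jax.random.normal(key, (128,))     # stand-in for episode returns
w = jnp.zeros(8)
for _ in range(200):
    w = w - 0.1 * jax.grad(value_loss)(w, obs, returns)
tokens = advantage_token(w, obs, returns)    # conditioning input for the policy
```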
The base model powering π*0.6 is trained with Knowledge Insulation
Our model can now learn from its own experience with RL! Our new π*0.6 model can more than double throughput over a base model trained without RL, and can perform real-world tasks: making espresso drinks, folding diverse laundry, and assembling boxes. More in the thread below.
More info about knowledge insulation here: https://t.co/h0DiPRRLge Code + checkpoints: github.com/Physical-Intelligence/openpi
We open-sourced pi-05 today. All checkpoints that we release have been trained with Knowledge Insulation
We've added pi-05 to the openpi repo: pi05-base, pi05-droid, pi05-libero. Also added PyTorch training code!🔥 Instructions and code here: https://t.co/EOhNYfpq9B This is an updated version of the model we showed cleaning kitchens and bedrooms in April:
TRI's latest Large Behavior Model (LBM) paper landed on arxiv last night! Check out our project website: https://t.co/n0qmDRivRH One of our main goals for this paper was to put out a very careful and thorough study on the topic to help people understand the state of the
Had a blast on the Unsupervised Learning Podcast with @hausman_k! We covered the past, present, and future of robot learning 🤖 Big thanks to @jacobeffron for being a fantastic host!
New Unsupervised Learning with @hausman_k & @DannyDriess (@physical_int) on building generalist robotics foundation models and:
- What’s next in AI x robotics
- Biggest outstanding questions
- How they 10x’d model training speed
- Open sourcing π-0
- Breakthroughs
It was a really fun project with the amazing team at @physical_int, including @brian_ichter, Jost Tobias Springenberg, @liliyu_lili, Adrian Li-Bell, @KarlPertsch, @allenzren, @HomerWalke, @QuanVng, @lucy_x_shi, @slevine
Our paper includes many ablations & details about various modeling choices. Check it out :)
One might think that another way to achieve “knowledge insulation” would be to simply freeze the backbone. This, however, does not work, as our ablations show, indicating that a base VLM does not contain sufficient representations for robot motions.
We call this procedure “knowledge insulation” and the resulting model π-0.5 + KI. Here are some videos of the model controlling a mobile manipulator in unseen homes.
The model follows language instructions much better, achieves high performance, and has fast inference. We also train the model on web data at the same time, which further increases generalization.
It turns out that this is the best of both worlds: the model trains really fast (7.5 times faster than π-0, as fast as π-0-FAST), while keeping the advantages of π-0’s action expert at inference time.
Our insight is to stop the gradient from the action expert and instead train the VLM backbone on discretized FAST actions to learn representations. This way, we “insulate” the knowledge of the pre-trained VLM while still adapting the backbone to robotics.
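As a rough picture of where the gradients flow, here is a toy sketch under simplified assumptions: a linear stand-in for the VLM backbone, a token head for the discretized FAST actions, and plain MSE regression standing in for the flow-matching expert. Only the placement of stop_gradient reflects the idea above; nothing here is the openpi implementation.

```python
import jax
import jax.numpy as jnp

def loss(params, batch):
    # Stand-in "VLM backbone" producing features.
    feats = jnp.tanh(batch["obs"] @ params["backbone"])

    # Discretized FAST-style action tokens train the backbone via a
    # next-token (cross-entropy) loss.
    logits = feats @ params["fast_head"]
    logp = jax.nn.log_softmax(logits)
    fast_loss = -jnp.mean(jnp.take_along_axis(logp, batch["tokens"][:, None], axis=-1))

    # The continuous action expert only sees stop_gradient(feats): its loss
    # trains the expert but cannot push gradients into the backbone.
    insulated = jax.lax.stop_gradient(feats)
    expert_loss = jnp.mean((insulated @ params["expert"] - batch["actions"]) ** 2)
    return fast_loss + expert_loss

key = jax.random.PRNGKey(0)
params = {
    "backbone": 0.1 * jax.random.normal(key, (8, 16)),
    "fast_head": 0.1 * jax.random.normal(key, (16, 32)),
    "expert": 0.1 * jax.random.normal(key, (16, 4)),
}
batch = {
    "obs": jnp.ones((2, 8)),
    "tokens": jnp.array([3, 7]),
    "actions": jnp.zeros((2, 4)),
}
grads = jax.grad(loss)(params, batch)
# grads["backbone"] comes only from the FAST token loss, by construction.
```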
While flow-matching action experts are great at inference time, since they produce continuous actions with low latency, their gradients are a poor training signal. As a result, the model trains slowly and struggles to follow language instructions.
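For context, a flow-matching action expert optimizes an objective roughly like the sketch below (a common linear-interpolation formulation; predict_velocity and the dummy usage are hypothetical, not π-0’s code).

```python
import jax
import jax.numpy as jnp

def flow_matching_loss(params, predict_velocity, actions, key):
    k1, k2 = jax.random.split(key)
    noise = jax.random.normal(k1, actions.shape)
    t = jax.random.uniform(k2, (actions.shape[0], 1))
    x_t = (1.0 - t) * noise + t * actions    # point on the noise -> action path
    target = actions - noise                 # velocity of that path
    return jnp.mean((predict_velocity(params, x_t, t) - target) ** 2)

# Dummy usage with a linear "expert" (purely illustrative).
dummy_expert = lambda p, x, t: x @ p
print(flow_matching_loss(jnp.eye(4), dummy_expert, jnp.ones((2, 4)), jax.random.PRNGKey(0)))
```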
Check out our new work, where we dissect various aspects of chain-of-thought at both training and inference time for robotics! Awesome work led by @verityw_
Embodied chain-of-thought reasoning (ECoT) is a powerful way to improve robot generalization & performance. But why is this the case, and how can that inform the design of learned robot policies? We investigate these questions in our latest work! https://t.co/QTPrXgxPrG 1/6
We auto-encode point tracks to automatically evaluate motion realism in generative video models. By focusing directly on motion, our new metric (TRAJAN) correlates much better with human judgments of these models than appearance-based metrics do.
Humans can tell the difference between a realistic generated video and an unrealistic one – can models? Excited to share TRAJAN: the world’s first point TRAJectory AutoeNcoder for evaluating motion realism in generated and corrupted videos. 🌐 https://t.co/ytEmuAPcYa 🧵
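In toy form, the idea is to score motion by how well point trajectories are reconstructed by a track autoencoder; the linear encoder/decoder below are placeholders, not the TRAJAN architecture.

```python
import jax.numpy as jnp

def motion_score(enc, dec, tracks):
    # tracks: (num_points, num_frames * 2) flattened (x, y) trajectories
    # extracted from a video with an off-the-shelf point tracker.
    recon = (tracks @ enc) @ dec
    return -jnp.mean((recon - tracks) ** 2)   # higher = more realistic motion

tracks = jnp.ones((64, 32))                   # 64 tracks over 16 frames
enc = jnp.zeros((32, 8))                      # untrained placeholder weights
dec = jnp.zeros((8, 32))
print(motion_score(enc, dec, tracks))
```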
Scaling data diversity, transfer between data sources, and a good training recipe were the main ingredients that allowed robots to generalize to new homes!
We got a robot to clean up homes that were never seen in its training data! Our new model, π-0.5, aims to tackle open-world generalization. We took our robot into homes that were not in the training data and asked it to clean kitchens and bedrooms. More below⤵️
More insights: π-0.5 is trained to break tasks down into subtasks, before producing actual robot actions. It turns out that adding the subtask prediction data is useful, even if you query the model with the overall task directly.
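In code, that inference pattern looks roughly like the sketch below; predict_subtask and predict_actions are hypothetical names standing in for the model's language-decoding and action-generation steps.

```python
def act(model, observation, task_prompt):
    # First decode a short-horizon subtask in language...
    subtask = model.predict_subtask(observation, task_prompt)   # e.g. "pick up the pillow"
    # ...then generate an action chunk conditioned on that subtask.
    return model.predict_actions(observation, subtask)
```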