Danny Driess Profile
Danny Driess

@DannyDriess

Followers: 4K · Following: 198 · Media: 39 · Statuses: 147

Research Scientist @physical_int. Formerly Google DeepMind

Joined August 2021
@DannyDriess
Danny Driess
3 months
How to build vision-language-action models that train fast, run fast & generalize? In our new paper, we formalize & analyze the approach of our π-0.5 model & further improve it with a single-stage recipe. Blog: Paper:
6 replies · 24 reposts · 223 likes
@DannyDriess
Danny Driess
2 months
RT @RussTedrake: TRI's latest Large Behavior Model (LBM) paper landed on arXiv last night! Check out our project website:
0 replies · 107 reposts · 0 likes
@DannyDriess
Danny Driess
2 months
Had a blast on the Unsupervised Learning Podcast with @hausman_k! We covered the past, present, and future of robot learning 🤖. Big thanks to @jacobeffron for being a fantastic host!
@jacobeffron
Jacob Effron
2 months
New Unsupervised Learning with @hausman_k & @DannyDriess (@physical_int) on building generalist robotics foundation models:
- What’s next in AI x robotics
- Biggest outstanding questions
- How they 10x’d model training speed
- Open sourcing π-0
- Breakthroughs
1 reply · 0 reposts · 29 likes
@DannyDriess
Danny Driess
3 months
It was a really fun project with the amazing team at @physical_int, including @brian_ichter, Jost Tobias Springenberg, @liliyu_lili, Adrian Li-Bell, @KarlPertsch, @allenzren, @HomerWalke, @QuanVng, @lucy_x_shi, @slevine.
0 replies · 0 reposts · 3 likes
@DannyDriess
Danny Driess
3 months
Our paper includes many ablations & details about various modeling choices. Check it out :)
1 reply · 0 reposts · 2 likes
@DannyDriess
Danny Driess
3 months
One might think that another way of “knowledge insulation” would be to just freeze the backbone. This, however, does not work, as shown below, indicating that a base VLM does not contain sufficient representations for robot motions.
[Image]
1 reply · 0 reposts · 4 likes
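For contrast with the stop-gradient recipe described further down the thread, here is a minimal PyTorch sketch of what this frozen-backbone ablation amounts to; the names are illustrative, not the actual π-0.5 codebase.

```python
import torch.nn as nn

def freeze_backbone(backbone: nn.Module) -> None:
    """Ablation baseline: freeze the VLM backbone entirely. No loss can
    adapt its representations, which (per the ablation above) leaves it
    without sufficient features for robot motions."""
    for p in backbone.parameters():
        p.requires_grad_(False)

# Knowledge insulation is the selective alternative: the backbone stays
# trainable through the discrete-action loss, and only the action
# expert's gradient path is cut (see the stop-gradient sketch below).
```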
@DannyDriess
Danny Driess
3 months
We call this procedure “knowledge insulation” and the resulting model π-0.5 + KI. Here are some videos of the model controlling a mobile manipulator in unseen homes.
1 reply · 1 repost · 3 likes
@DannyDriess
Danny Driess
3 months
The model follows language instructions much better, achieves high performance, and runs fast at inference. We also co-train the model on web data, which further increases generalization.
[Image]
1 reply · 0 reposts · 3 likes
@DannyDriess
Danny Driess
3 months
It turns out that this is the best of both worlds: the model trains really fast (7.5 times faster than π-0, as fast as π-0-FAST) while keeping the advantages of π-0’s action expert at inference time.
[Image]
1 reply · 0 reposts · 4 likes
@DannyDriess
Danny Driess
3 months
Our insight is to stop the gradient from the action expert and instead train the VLM backbone with discretized FAST actions to learn representations. This way, we “insulate” the knowledge of the pre-trained VLM, but still adapt the backbone to robotics.
[Image]
1 reply · 0 reposts · 3 likes
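A minimal PyTorch sketch of this training signal, assuming a rectified-flow action expert; all module names, shapes, and the loss weighting are hypothetical, not the actual π-0.5 + KI implementation.

```python
import torch
import torch.nn.functional as F

def knowledge_insulation_loss(backbone, action_expert, batch):
    """Sketch of the 'knowledge insulation' idea: the VLM backbone gets
    gradients only from discrete FAST action tokens, while the
    flow-matching action expert trains on detached features."""
    # Backbone encodes images + language and predicts FAST action tokens.
    feats, fast_logits = backbone(batch["images"], batch["text"])

    # 1) Representation learning: cross-entropy on discretized actions.
    ce_loss = F.cross_entropy(
        fast_logits.flatten(0, 1), batch["fast_action_tokens"].flatten()
    )

    # 2) Continuous control: rectified-flow matching on *detached*
    #    features, so the expert's noisy gradients never reach the backbone.
    actions = batch["actions"]                       # (B, horizon, dim)
    t = torch.rand(actions.shape[0], 1, 1)           # flow time in [0, 1)
    noise = torch.randn_like(actions)
    x_t = (1 - t) * noise + t * actions              # linear interpolation path
    target_v = actions - noise                       # target velocity field
    pred_v = action_expert(feats.detach(), x_t, t)   # stop-gradient here
    fm_loss = F.mse_loss(pred_v, target_v)

    return ce_loss + fm_loss
```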
@DannyDriess
Danny Driess
3 months
While flow-matching action experts are great at inference time, producing continuous actions quickly, their gradients are a poor signal for training. Consequently, the model trains slowly and struggles to follow language instructions.
[Image]
1 reply · 0 reposts · 5 likes
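To see why such experts are still attractive at inference time, here is a hedged sketch of the sampling loop, reusing the rectified-flow parameterization from the sketch above (again with hypothetical names): a handful of Euler steps turn noise into a continuous action chunk.

```python
import torch

@torch.no_grad()
def sample_actions(action_expert, feats, horizon, dim, steps=10):
    """Euler integration of the learned velocity field: start from noise
    and take a few small steps to obtain a continuous action chunk.
    Few steps suffice, which keeps inference fast."""
    x = torch.randn(1, horizon, dim)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((1, 1, 1), i * dt)
        x = x + dt * action_expert(feats, x, t)
    return x
```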
@DannyDriess
Danny Driess
3 months
Check out our new work where we dissect various aspects of chain-of-thought (at both training and inference time) for robotics! Awesome work led by @verityw_.
@verityw_
Will Chen
3 months
Embodied chain-of-thought reasoning (ECoT) is a powerful way to improve robot generalization & performance. But why is this the case, and how can that inform the design of learned robot policies? We investigate these questions in our latest work! 1/6
0 replies · 0 reposts · 13 likes
@DannyDriess
Danny Driess
4 months
We auto-encode point tracks to automatically evaluate motion realism in generative video models. By inherently focusing on motion, our new metric (TRAJAN) correlates much better with human judgments of these models than appearance-based metrics.
[Image]
@KelseyRAllen
Kelsey Allen
4 months
Humans can tell the difference between a realistic generated video and an unrealistic one – can models? Excited to share TRAJAN: the world’s first point TRAJectory AutoeNcoder for evaluating motion realism in generated and corrupted videos. 🌐 🧵
0 replies · 0 reposts · 7 likes
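To make the idea concrete, here is a toy PyTorch sketch of scoring motion realism with a track autoencoder; the actual TRAJAN architecture, training setup, and scoring differ, so treat every name and dimension here as invented for illustration.

```python
import torch
import torch.nn as nn

class TrackAutoencoder(nn.Module):
    """Toy stand-in for a point-track autoencoder in the spirit of
    TRAJAN; trained to reconstruct (x, y) point tracks from real videos."""
    def __init__(self, horizon: int, hidden: int = 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(horizon * 2, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 16))
        self.dec = nn.Sequential(nn.Linear(16, hidden), nn.ReLU(),
                                 nn.Linear(hidden, horizon * 2))

    def forward(self, tracks):                  # tracks: (N, horizon, 2)
        z = self.enc(tracks.flatten(1))         # compress each track
        return self.dec(z).view_as(tracks)      # reconstruct it

@torch.no_grad()
def motion_realism_score(model, tracks):
    """Tracks from a generated video that the autoencoder (trained on
    real motion) reconstructs poorly get a low realism score."""
    err = (model(tracks) - tracks).pow(2).mean()
    return (-err).exp().item()  # map error to (0, 1]; higher = more realistic
```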
@DannyDriess
Danny Driess
4 months
Scaling data diversity, transfer between data sources, and a good training recipe were the main ingredients that allow robots to generalize to new homes!
@physical_int
Physical Intelligence
4 months
We got a robot to clean up homes that were never seen in its training data! Our new model, π-0.5, aims to tackle open-world generalization. We took our robot into homes that were not in the training data and asked it to clean kitchens and bedrooms. More below⤵️
0 replies · 0 reposts · 40 likes
@DannyDriess
Danny Driess
4 months
More insights: π-0.5 is trained to break tasks down into subtasks before producing actual robot actions. It turns out that adding the subtask-prediction data is useful, even if you query the model with the overall task directly.
[Image]
0 replies · 0 reposts · 34 likes
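A hedged sketch of what such a two-stage rollout could look like; `predict_subtask` and `predict_actions` are hypothetical method names, not the actual π-0.5 interface.

```python
def act_with_subtasks(model, obs, task: str):
    """Two-stage inference: verbalize a subtask first, then condition
    action generation on it. The ablation above suggests training with
    subtask data helps even when prompting with the overall task."""
    # Hypothetical API: predict a short language subtask for the scene...
    subtask = model.predict_subtask(obs, prompt=task)  # e.g. "pick up the pillow"
    # ...then produce low-level actions conditioned on that subtask.
    actions = model.predict_actions(obs, prompt=subtask)
    return subtask, actions
```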
@DannyDriess
Danny Driess
4 months
In particular, the diverse in-the-wild robot data helps a lot, even though it comes from static robots while we evaluate on mobile manipulator tasks. Check out more in the blog post and paper:
[Link card: pi.website — Our latest generalist policy, π0.5, extends π0 and enables open-world generalization. Our new model can control a mobile manipulator to clean up an entirely new kitchen or bedroom.]
0 replies · 0 reposts · 4 likes
@DannyDriess
Danny Driess
4 months
To achieve this, we enabled π-0.5 to be trained on many different data sources, from multi-environment mobile manipulator data to static robot data in-the-wild and in the lab, plus more classical vision-language data. These additional data sources help the model significantly!
[Image]
1 reply · 0 reposts · 4 likes
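One common way to co-train on such heterogeneous sources is fixed-probability mixture sampling per batch; a minimal sketch below, with made-up source names and weights (the actual π-0.5 mixture is not specified in this thread).

```python
import random

# Made-up sources and weights, purely for illustration.
MIXTURE = {
    "mobile_manipulator_multi_env": 0.4,
    "static_robot_in_the_wild": 0.3,
    "static_robot_lab": 0.2,
    "web_vision_language": 0.1,
}

def sample_batch(loaders: dict, weights: dict = MIXTURE):
    """Pick a data source with fixed probability for each training batch,
    so every gradient step can draw on robot or web supervision."""
    name = random.choices(list(weights), weights=list(weights.values()))[0]
    return name, next(loaders[name])
```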
@DannyDriess
Danny Driess
4 months
I think this is a very exciting result. Generally, all evaluations in our paper (except for the green line above) are done in unseen environments!
2 replies · 0 reposts · 4 likes
@DannyDriess
Danny Driess
4 months
Robot systems are usually evaluated in scenes covered by the training data. With π-0.5, we want to change this. We show that by scaling the number of environments where we collect data, we can match in unseen homes the performance of a model allowed in-distribution data.
[Image]
1 reply · 0 reposts · 4 likes
@DannyDriess
Danny Driess
4 months
π-0.5: a step towards open-world robot generalization. More (exciting) insights below.
2 replies · 6 reposts · 75 likes