At Covariant, we have been deploying robots to the real world and thinking hard about how to build true AI for general-purpose robots that can go beyond demos. For a long time, we have been building up large robotics datasets through our deployments without the right model that can
Interesting paper from DeepMind: near-SOTA density modeling performance on MNIST with only a **single pass** through the training data and online gradient descent
We’ve raised $40m in Series B funding led by @IndexVentures w/ AI-focused @Radicalvcfund + existing investor @AmplifyPartners. Grateful for the support of our investors, customers + partners as we continue to bring AI Robotics to the real world!
What separates lab robot demos from robots in production? Extremely high reliability. This requires our model to robustly handle many long-tail scenarios like the one in the attached picture, where one stray barcode label can tank the 99.95% sortation accuracy requirement. (1/n)
resonates with my own experience:
robot demo --> INSANE PAIN --> robot creating value in production
luckily I am hopeful we have by now paid most of our dues at Covariant :)
Text foundation models (LLMs) have an incredible ability to adapt to new problems through in-context learning. We show that it’s possible for robots to learn in context as well, in our latest scaling update of RFM-1, Covariant’s robotics foundation model. (1/n)
RFM-1's latest scaling update equips robots with in-context learning of grasping improvements.
The video shows the self-reflective reasoning capability — after trying and failing a few times, the robot has an internal dialogue, hypothesizing that its current gripper is not suited for
What makes the training data for RFM-1 unique? A few properties that distinguish it from typical lab data: 1. real-world complexity: picking from extremely cluttered scenes where item occlusion presents a challenge for reliability. 2. high-speed handling: the dynamics of
This is not a one-off cycle. This is performance over repeated cycles – the new benchmark for reliable AI Robotic systems.
High pick rates, navigating cluttered environments, no double-picks, scoops, or errors – just consistent flawless execution, parcel after parcel.
Many of the groundbreaking advancements we've witnessed over the past decade, spanning computer vision, speech recognition, protein folding prediction, and beyond, hinge on the deep learning work conducted by @geoffreyhinton, who has fundamentally changed the focus and
Congrats @chelseabfinn @hausman_k @svlevine on starting a new company. We need more people working on solving the physical-world data challenge and bringing foundation models to robotics!
I’m really excited to be starting a new adventure with multiple amazing friends & colleagues.
Our company is called Physical Intelligence (Pi or π, like the policy).
A short thread 🧵
It turned out that going from a 90% success rate to 99.95% required significantly more data to cover diverse failure modes. This is why a foundation model approach to robotics, instead of building task/embodiment-specific policies, is so powerful: one single model can leverage
Generative models can drastically accelerate database systems! A new learning task was introduced: Range Density Estimation. For more details, see the thread 👇
Can self-supervised learning help computer systems? Our #ICML2020 paper equips autoregressive models to optimize databases. We introduce a new task, range density estimation: estimate the probability of variables falling within given ranges. A super simple trick gives 10-100x gains. 👇 1/
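To make the task concrete, here is a minimal sketch of range density estimation over an autoregressive model via progressive sampling, a generic technique from this line of work rather than the paper's specific trick; `conditional_probs` is a hypothetical toy stand-in for a learned model:

```python
# Sketch: estimate P(x_1 in R_1, ..., x_n in R_n) under an autoregressive
# model by progressive sampling (hypothetical toy model, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)

def conditional_probs(prefix, vocab=4):
    # Hypothetical stand-in for a learned p(x_i | x_<i): returns a
    # distribution over the next variable given the prefix so far.
    logits = np.cos(np.arange(vocab) + sum(prefix))
    p = np.exp(logits)
    return p / p.sum()

def range_density(ranges, n_samples=1000):
    # At each step, accumulate the probability mass inside the range and
    # sample the next prefix value from the range-restricted distribution.
    total = 0.0
    for _ in range(n_samples):
        prefix, weight = [], 1.0
        for lo, hi in ranges:
            p = conditional_probs(prefix)
            in_range = p[lo:hi + 1].sum()          # mass inside [lo, hi]
            weight *= in_range
            if in_range == 0.0:
                break
            q = np.zeros_like(p)
            q[lo:hi + 1] = p[lo:hi + 1] / in_range  # renormalized proposal
            prefix.append(rng.choice(len(p), p=q))
        total += weight
    return total / n_samples

# E.g. estimate P(x1 in [0,1], x2 in [2,3], x3 in [1,2]) under the toy model.
print(range_density([(0, 1), (2, 3), (1, 2)]))
```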
With RFM-1 and its multimodal setup, we have the ability to learn from a large amount of robot interaction with the world: learning robust manipulation policies by looking at robot actions + outcomes across millions of distinct items, learning an intuitive physical world model by
Robotics has been a challenging field for years; AI has changed the game, and the time is now. Today, we are announcing our investment in @CovariantAI. Excited for our journey with @pabbeel and Peter Chen:
Thanks @alexgkendall -- we also love the work that @wayve_ai is doing to bring foundation models to autonomous driving. It's going to be an exciting year for robotics!
Exciting result showing how robust, accessible and trustworthy robotics is becoming with AI foundation models. And I'm sure lots more to come... congratulations @pabbeel @peterxichen 🎉
Advances in open-source base LLMs and the increasing availability of large image-text datasets also mean that RFM-1 can fluently handle text tokens as input and output, which opens up a lot of product possibilities for how people and robots can collaborate. (4/n)
Before we started Covariant, we were very encouraged by how well imitation learning from human demonstrations can work: 30 minutes of human teleop data can train policies with an 80-90% success rate. (2/n)
We have more exciting announcements coming up soon as we deploy RFM-1 to customers and continue to scale up data. In the meantime, take a look at our blog post.
If pushing forward robotics foundation models by going through the hard challenges of
What’s even more exciting is that there is shockingly little inductive bias that needs to be manually encoded: RFM-1 is trained with next-token prediction, which gives us confidence that it will scale well with data. See attached for image, action, and video generations that come
This is an amazing effort to collect more robotics data. I especially love that they have both structured data like multi-view stereo and more modern modalities like text annotations. The key gap in robotics is data, and it's great to see the progress. Congrats @SashaKhazatsky
After two years, it is my pleasure to introduce “DROID: A Large-Scale In-the-Wild Robot Manipulation Dataset”
DROID is the most diverse robotic interaction dataset ever released, including 385 hours of data collected across 564 diverse scenes in real-world households and offices
Giving Robots the Ability to Reason
Today we talked to @CovariantAI on @technology about their foundation model and why tackling the AI behind advanced robotics in isolation is a good strategy. Thanks for coming on the show @peterxichen
We are continuing to see intelligence about how to interact with the world emerge from pre-training a large multi-modal model on datasets that are cleaned and structured in exactly the right way. Join us () to build the largest real-world robotics dataset
The multi-modal sequence setup of RFM-1 means it can attend to a list of previous episodes of (input image, robot action, sensor readings that indicate outcome) to come up with an improved image->action policy on the fly. (2/n)
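As a rough illustration only, assuming a hypothetical tokenization (RFM-1's actual sequence format is not public), previous episodes could be flattened into a single context like this:

```python
# Sketch: assemble a multi-modal in-context sequence from past episodes.
# Episode, SEP, and build_context are hypothetical, for illustration only.
from dataclasses import dataclass
from typing import List

@dataclass
class Episode:
    image_tokens: List[int]    # tokenized camera observation
    action_tokens: List[int]   # tokenized robot action (e.g. grasp pose)
    outcome_token: int         # sensor-derived success/failure token

SEP = 0  # hypothetical separator token id

def build_context(history: List[Episode], current_image: List[int]) -> List[int]:
    # Flatten past (image, action, outcome) episodes into one sequence,
    # then append the current image; an autoregressive model conditioned on
    # this context can emit action tokens informed by the prior outcomes.
    seq: List[int] = []
    for ep in history:
        seq += ep.image_tokens + ep.action_tokens + [ep.outcome_token, SEP]
    return seq + current_image
```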
One more non-Covariant research mention: this type of ability to adapt a policy in-context also has a parallel in humanoid locomotion. See the amazing work by @ir413 casting humanoid locomotion as a next-token prediction problem (similar to RFM-1): We just
Pre-training on millions of sequences of previous robot interactions implicitly teaches RFM-1 rich knowledge of how to adapt in-context: if grasping on exposed fabric failed and then grasping on a paper label succeeded, then the policy should avoid fabric; if a specific
@Joe__Black__
We expect RFM-1 to power humanoid robots and different kinds of hands (like those with fingers) as well! We would need to collect more targeted data for those hardware form factors, which will become easier as they become more mature.
@goodfellow_ian
Good point! "We made sure that the sets of writers of the training set and test set were disjoint." () so yeah this makes the evaluation less interpretable
@goodfellow_ian
I think it's training on the test set, so it uses more data than other offline methods. But it's not cheating as long as it only takes one pass through the data and doesn't evaluate NLL on any image that it has performed gradient descent on
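A minimal sketch of that one-pass protocol, sometimes called prequential evaluation: score each example before updating on it. The `model(x)`-returns-NLL interface here is a hypothetical placeholder, not the paper's code.

```python
# Sketch: single-pass online evaluation, scoring each example BEFORE the
# gradient step so the reported NLL never covers data the model has fit.
import torch

def single_pass_online_nll(model, stream, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    total_nll, n = 0.0, 0
    for x in stream:                 # each example is seen exactly once
        nll = model(x)               # hypothetical: returns scalar NLL of x
        total_nll += nll.item()
        n += 1
        opt.zero_grad()
        nll.backward()               # then adapt the model online with SGD
        opt.step()
    return total_nll / n             # honest held-out-style NLL estimate
```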