Joe Ortiz
@joeaortiz
Followers: 2K · Following: 518 · Media: 16 · Statuses: 54
@GoogleDeepMind world models | Ex @AIatMeta (FAIR), PhD @imperialcollege
London, England
Joined December 2018
We’re hiring student researchers at Google DeepMind for 2026. Come work with our great team in SF on anything from diffusion / world models / 3D. Send me an email if you’re interested!
We’re hiring research scientists & student researchers at Google DeepMind. DM or email me if you're interested! I’ll be at NeurIPS this week. Happy to chat in person!
Genie 3 is a real-time, interactive and general world model! Excited to see it used for training agents. It's also really fun to play around with.
#Genie3 is a real, interactive, playable experience. We're having so much fun with it at work: in between meetings, during breaks. Here's @RuiqiGao, @joeaortiz, @ChrisWu6080 following a pack of polar bears through a New York City street! Check out more on the webpage:
Genie 3 feels like a watershed moment for world models 🌐: we can now generate multi-minute, real-time interactive simulations of any imaginable world. This could be the key missing piece for embodied AGI… and it can also create beautiful beaches with my dog, playable in real time
Introducing our new MBRL agent for Craftax-classic. Our agent is both SOTA and the first to exceed human expert reward! 🕹️ The method combines new techniques for learning and planning with transformer world models. See details in @antoine_dedieu's 🧵 https://t.co/rTPLIsYrI7
Happy to share our new preprint “Improving Transformer World Models for Data-Efficient RL”: https://t.co/aOrRT8WJZB We propose a ladder of improvements to model-based RL and achieve for the first time a superhuman reward on the challenging Craftax-classic benchmark! 1/10
Robust visual + tactile perception is key for robot dexterity. In our new @SciRobotics paper, we use neural fields for in-hand reconstruction + pose estimation of novel objects. https://t.co/HLS7AtTcey See the awesome thread from @Suddhus below
For robot dexterity, a missing piece is general, robust perception. Our new @SciRobotics work combines multimodal sensing with neural representations to perceive novel objects in-hand. 🎲 Featured on the cover of the November issue! #ScienceRoboticsResearch 🧵1/9
Check out our new paper where we learn multi-step diffusion world models and use them for planning with model predictive control! We're excited to see if these ideas can be applied to dexterous robot manipulation 🤖
Excited to share our new paper on "Diffusion Model Predictive Control" (D-MPC). Key idea: leverage diffusion models to learn a trajectory-level (not just single-step) world model to mitigate compounding errors when doing rollouts.
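The receding-horizon planning loop the tweet describes can be sketched as follows. This is a minimal illustration, not the D-MPC implementation: `sample_trajectories` stands in for the trajectory-level diffusion world model (here toy linear dynamics), and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trajectories(state, action_seqs):
    """Stand-in for a trajectory-level world model: given the current
    state and a batch of H-step action sequences, return predicted
    H-step state trajectories. A diffusion model would generate each
    trajectory jointly (mitigating compounding single-step errors);
    here toy linear dynamics stand in."""
    n, horizon, _ = action_seqs.shape
    trajs = np.zeros((n, horizon, state.shape[0]))
    s = np.tile(state, (n, 1))
    for t in range(horizon):
        s = 0.9 * s + action_seqs[:, t, :]
        trajs[:, t, :] = s
    return trajs

def d_mpc_step(state, goal, n_samples=256, horizon=8, act_dim=2):
    """One receding-horizon MPC step: sample candidate action
    sequences, score the predicted trajectories against the goal,
    and return the first action of the best sequence."""
    candidates = rng.normal(size=(n_samples, horizon, act_dim))
    trajs = sample_trajectories(state, candidates)
    costs = np.linalg.norm(trajs - goal, axis=-1).sum(axis=-1)
    return candidates[np.argmin(costs), 0]
```

At each environment step only the first action of the best candidate sequence is executed, and planning is repeated from the new state.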
This was joint work with many fantastic collaborators including @antoine_dedieu @sirbayes @WLehrach @swaroopgj @lazarox8 @zhouguangyao We hope to see visual pretraining methods surpassing our baselines on DMC-VB in the future! Paper: https://t.co/ab3YHRJfqL 6/6
By reconstructing observations from the embeddings, we see that inverse dynamics pretraining and BC (no pretraining) learn to discard control-irrelevant features (agent color and background). As a result, only these policies are robust to unseen distractors at test time. 5/6
We benchmark existing visual pretraining methods (inverse dynamics, forward dynamics, DINO, etc) by doing BC on top of the learned embedding. We find that existing pretraining methods do not help policy learning on DMC-VB. 4/6
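The evaluation recipe in this tweet (behavioural cloning on top of a frozen pretrained embedding) can be sketched as below. Everything here is a stand-in, not the DMC-VB code: the frozen encoder is a fixed random projection, the data is synthetic, and the BC head is a simple least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (frozen) weights standing in for a pretrained visual encoder
# (inverse dynamics, forward dynamics, DINO, ...).
W_frozen = np.random.default_rng(1).normal(size=(32, 16))

def frozen_encoder(obs):
    """Embed observations with the frozen encoder; no gradient updates."""
    return obs @ W_frozen

# Toy demonstration data: flattened observations and expert actions.
obs = rng.normal(size=(512, 32))
expert_actions = rng.normal(size=(512, 4))

# Behavioural cloning probe: fit a linear policy head on the frozen
# embeddings by least-squares regression onto the expert actions.
Z = frozen_encoder(obs)
policy_head, *_ = np.linalg.lstsq(Z, expert_actions, rcond=None)

def policy(o):
    return frozen_encoder(o) @ policy_head

mse = float(np.mean((policy(obs) - expert_actions) ** 2))
```

Comparing this probe's downstream return (or here, imitation error) across different frozen encoders is what lets the benchmark rank pretraining methods.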
To systematically evaluate visual pretraining methods for control, our DMC-VB dataset is larger and more extensive than prior datasets!
- We have more diverse tasks, distractors and behavioral policies.
- We provide states as an upper bound for representation learning. 3/6
Visual representations for control should capture minimal control-relevant features while discarding spurious, control-irrelevant aspects of the scene. This is important for robotics, where policies often fail due to changes in the background, lighting or camera viewpoint. 2/6
Introducing the DeepMind Control Vision Benchmark (DMC-VB)! A dataset and benchmark to evaluate the robustness of offline RL agents to visual distractors. @NeurIPSConf 2024 Datasets and Benchmarks track! Code & data: https://t.co/TJtNqbo2eN 1/6
Can vision and language models be extended to include touch? Yes! We will present a new touch-vision-language dataset collected in the wild and Touch-Vision-Language Models (TVLMs) trained on this dataset at #ICML2024. 🙌 1/6 https://t.co/rZPIZYN3jZ
Humans heavily rely on estimating object-environment (extrinsic) contacts for many insertion tasks 🖐️ We show that estimating extrinsic contacts from gripper-object contact using tactile sensors results in more successful and efficient robot insertion policies!
🤔Are extrinsic contacts useful for manipulation policies? Neural Contact Fields estimate extrinsic contacts from touch. However, its utility in real-world tasks remains unknown. We improve NCF to enable sim-to-real transfer and use it to train policies for insertion tasks.
🔥Theseus 0.2.0 release is out! https://t.co/oEgnc3D1Z6 🚀Brings it to PyTorch 2.0 🚀Introduces torchlie and torchkin - efficient standalone libs for differentiable Lie groups and kinematics More updates, and a round-up of a few community research projects enabled by Theseus 🧵👇
See you tomorrow at the Distributed Graph Algorithms for Robotics Workshop at #ICRA2023.
Our workshop on Distributed Graph Algorithms for Robotics at #ICRA2023 in London, May 29th, is accepting demo/poster submissions until May 2nd. Speakers: @fdellaert, @mhmukadam, @angelaschoellig, @MargaritaChli, @zzznah, @risi1979, @joeaortiz, @lazarox8 and just confirmed @rapideRobot.
DABA uses decentralised majorisation minimisation (MM) with Nesterov acceleration to converge fast. Compared with Gaussian belief propagation (GBP - https://t.co/ymjBGC5Kde): GBP is better suited to highly distributed problems, while MM works well when the number of devices is much smaller than the size of the factor graph.
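The Nesterov-accelerated MM pattern mentioned here can be sketched generically. This is a sketch of the pattern on a toy quadratic, not the DABA algorithm: the surrogate step below is a gradient step on a quadratic majoriser with curvature L, which is tight at the anchor point.

```python
import numpy as np

def nesterov_mm(x0, surrogate_min, n_iters=50):
    """Generic majorisation-minimisation with Nesterov acceleration.
    `surrogate_min(y)` minimises a surrogate that upper-bounds the
    objective and touches it at the anchor point y."""
    x_prev = x0
    y = x0
    t = 1.0
    for _ in range(n_iters):
        x = surrogate_min(y)                       # MM step at extrapolated point
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x + ((t - 1) / t_next) * (x - x_prev)  # Nesterov extrapolation
        x_prev, t = x, t_next
    return x_prev

# Toy objective f(x) = 0.5 x^T A x - b^T x. A valid majoriser at y is
# f(y) + grad(y)^T (x - y) + (L/2) ||x - y||^2, whose minimiser is a
# gradient step with step size 1/L.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
L = np.linalg.eigvalsh(A).max()  # Lipschitz constant of the gradient

def surrogate_min(y):
    return y - (A @ y - b) / L

x_star = nesterov_mm(np.zeros(2), surrogate_min)
```

In DABA the surrogate is constructed so that it decouples across devices, letting each device minimise its own block in parallel; the toy above only shows the acceleration scaffolding.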
New work solving very large BA problems on many devices (up to 32). Despite being decentralised, DABA is more accurate than centralised Ceres and DeepLM, while being 950x and 170x faster! 🎉 Led by Taosha, at #RSS2023 Paper: https://t.co/7z80k8UoRJ Code: https://t.co/vgveNXm13z
Scalable and resilient computation in robotics should be distributed, whether over many-robot graphs or within single chips. We present the new Workshop on Distributed Graph Algorithms for Robotics at #ICRA2023 in London https://t.co/IOBu74zxKJ; please submit paper and demos!