Julen Urain
@robotgradient
Followers
1K
Following
2K
Media
50
Statuses
272
Robotics Tinkerer. RS@FAIR (Embodied AI) Prev: @DFKI, @TUDarmstadt, @NvidiaAI. https://t.co/RQpq7Prbln X https://t.co/umZQeDjJv4
Joined November 2017
This was very challenging and very cool to see evolve! I personally was not sure if it would work, but @irmakkguzey pushed so hard to show it does. Learning dexterous robot policies with only human video data, using the egocentric view from Aria 2 glasses, chill and easy 😁
Dexterous manipulation by directly observing humans - a dream in AI for decades - is hard due to visual and embodiment gaps. With simple yet powerful hardware - Aria 2 glasses 👓 - and our new work AINA 🪞, we are now one significant step closer to achieving this dream.
0
0
7
After a year of team work, we're thrilled to introduce Depth Anything 3 (DA3)! 🚀 Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. In pursuit of minimal modeling, DA3
80
504
4K
The expert mode is going to bring a lot of news in the future 🙃
0
0
6
Introducing DINOv3 🦕🦕🦕 A SotA-enabling vision foundation model, trained with pure self-supervised learning (SSL) at scale. High quality dense features, combining unprecedented semantic and geometric scene understanding. Three reasons why this matters…
12
140
1K
Anyone interested in tactile sensing for robotics should be following Akash's solid releases. How should we integrate the rich tactile sensing modality into policy learning?
Robots need touch for human-like hands to reach the goal of general manipulation. However, approaches today don’t use tactile sensing or use specific architectures per tactile task. Can 1 model improve many tactile tasks? 🌟Introducing Sparsh-skin: https://t.co/DgTq9OPMap 1/6
1
3
11
Touch perception holds the key to unlock robot dexterity Our new @SciRobotics work shows how to fuse tactile & vision, track pose+shape of novel objects during dexterous manipulation https://t.co/TNlaBjB6Ra It's a culmination of our work over the last 4 years, see @Suddhus 🧵⬇
For robot dexterity, a missing piece is general, robust perception. Our new @SciRobotics work combines multimodal sensing with neural representations to perceive novel objects in-hand. 🎲 Featured on the cover of the November issue! #ScienceRoboticsResearch 🧵1/9
1
12
81
On the other hand, SE(3) Flow Matching is a preferable alternative to SE(3) Diffusion. It is deterministic and built on straight paths over the geodesics, leading to much simpler generation/optimization paths (a minimal sketch of the geodesic path follows below).
1
0
0
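The intuition behind "straight paths over the geodesics" can be sketched in a few lines: the conditional path between a noise pose and a data pose follows a geodesic, and the network regresses the constant tangent velocity along it. The sketch below is only illustrative (it uses scipy, treats rotation and translation separately as an SO(3) geodesic plus a straight line in R^3, and the function name is made up); it is not the implementation from the paper.

```python
# Illustrative sketch of a geodesic flow-matching target on SE(3) (not the paper's code).
# Rotation follows the SO(3) geodesic (slerp); translation is a straight line in R^3.
import numpy as np
from scipy.spatial.transform import Rotation as R

def geodesic_point_and_velocity(R0, p0, R1, p1, t):
    """Pose x_t on the path from noise pose (R0, p0) to data pose (R1, p1),
    plus the constant tangent velocity a flow network would regress."""
    omega = (R0.inv() * R1).as_rotvec()      # relative rotation as an axis-angle vector
    R_t = R0 * R.from_rotvec(t * omega)      # geodesic interpolation on SO(3)
    p_t = (1.0 - t) * p0 + t * p1            # straight-line interpolation in R^3
    return R_t, p_t, omega, p1 - p0          # (pose at t, angular + linear velocity targets)

# Toy usage: a random "noise" pose, a "data" pose, and the path queried at t = 0.3.
R0, p0 = R.random(), np.random.randn(3)
R1, p1 = R.random(), np.array([0.2, 0.1, 0.5])
R_t, p_t, v_rot, v_trans = geodesic_point_and_velocity(R0, p0, R1, p1, t=0.3)
```

In a flow-matching loss, a network conditioned on the pose at t, the time t, and the observation would be trained to predict (v_rot, v_trans), and generation integrates that velocity field from noise to an action pose.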
In our interpretation, Invariant networks should be preferred over Equivariant networks as long as you can achieve the same results (i.e. equivariant action generation). Imposing invariance on your network is simpler and allows using most of the common functions (ReLU, Linear...). A sketch of one way to achieve this follows below.
1
0
0
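One standard way to obtain equivariant action generation from an otherwise ordinary network is to canonicalize: express the observations relative to a reference pose (e.g. the end-effector) so the inputs become frame-independent, run a plain Linear/ReLU network, and map the predicted action back to the world frame. The sketch below only illustrates that relative-frame idea with made-up names and dimensions; it is not the architecture from the paper.

```python
# Relative-frame sketch (hypothetical names/dimensions): an ordinary MLP becomes part of
# an SE(3)-equivariant policy by canonicalizing inputs and de-canonicalizing outputs.
import torch
import torch.nn as nn

class LocalFramePolicy(nn.Module):
    def __init__(self, dim_in=3, dim_out=3):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(),
                                 nn.Linear(128, dim_out))

    def forward(self, obj_pos_world, ee_rot_world, ee_pos_world):
        # 1) Canonicalize: object position expressed in the end-effector frame.
        rel = (obj_pos_world - ee_pos_world).unsqueeze(-1)               # (B, 3, 1)
        obj_local = (ee_rot_world.transpose(-1, -2) @ rel).squeeze(-1)   # (B, 3)
        # 2) Predict a translational action in the local frame (plain Linear/ReLU).
        act_local = self.mlp(obj_local)
        # 3) De-canonicalize: rotate the action back into the world frame.
        return (ee_rot_world @ act_local.unsqueeze(-1)).squeeze(-1)
```

If the whole scene is rotated and translated, the canonicalized input obj_local does not change, so the local action is identical and the world-frame output transforms accordingly: the policy is equivariant while the network itself stays an ordinary MLP.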
When we were working on SE(3)-DiffusionFields, we were not aware of how far this could be extended. Our new work shows that: - SE(3) Flow Matching is a simple yet powerful alternative to SE(3) Diffusion for robotics. - We can use Invariant networks for equivariant action generation.
I am excited to share our recent work on "ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching". The work presents a novel policy class combining Flow Matching with SE(3) Invariant Transformers for fast, equivariant, and expressive
1
9
57
Perfect start to the #CoRL2024 week! Was a pleasure organizing the NextGen Robot Learning Symposium at @TUDarmstadt with @firasalhafez @GeorgiaChal Thanks to the speakers for the great talks! @YunzhuLiYZ
@NimaFazeli7 @Dian_Wang_ @HaojieHuang13 @Vikashplus @ehsanik @Oliver_Kroemer
1
7
61
Okay, not super impressive! BUT this plot gives us hope! We found that the agent keeps improving on unseen songs with more and more demonstrations, so we are hopeful that, given the large datasets available on the Internet, our agent will keep improving in the future 🎹
0
0
6
TASK 2: take all these RL policies and distill them (BC) into a Diffusion Policy. We use a hierarchical policy: the top-layer policy outputs desired fingertip motions, while the bottom-layer policy generates the configuration-space actions (a minimal sketch of the two-level setup follows below). Performance on unseen songs 👇
2
0
8
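The two-level structure can be sketched as follows. Everything here is hypothetical (dimensions, names, and the plain MSE behavior-cloning loss, which stands in for the Diffusion Policy objective used in the actual work); it only illustrates how a high-level fingertip predictor feeds a low-level joint-space policy, both supervised by rollouts from the RL experts.

```python
# Hierarchical BC sketch (hypothetical dimensions/names, MSE in place of the diffusion loss).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighLevel(nn.Module):
    """Observation -> desired fingertip positions (10 fingertips x 3D)."""
    def __init__(self, obs_dim=64, n_fingertips=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_fingertips * 3))
    def forward(self, obs):
        return self.net(obs)

class LowLevel(nn.Module):
    """Fingertip targets + joint state -> configuration-space action."""
    def __init__(self, n_fingertips=10, q_dim=44):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_fingertips * 3 + q_dim, 256), nn.ReLU(),
                                 nn.Linear(256, q_dim))
    def forward(self, fingertips, q):
        return self.net(torch.cat([fingertips, q], dim=-1))

# One BC step on a batch of (obs, joint state, expert fingertips, expert actions)
# collected by rolling out the per-song RL policies.
high, low = HighLevel(), LowLevel()
obs, q = torch.randn(32, 64), torch.randn(32, 44)
ft_expert, a_expert = torch.randn(32, 30), torch.randn(32, 44)
ft_pred = high(obs)
loss = F.mse_loss(ft_pred, ft_expert) + F.mse_loss(low(ft_pred.detach(), q), a_expert)
loss.backward()
```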
We train individual RL single-song policies, which we leverage to generate the desired observation-action pairs! Observe that using fingertip demonstrations leads to human-like motions, while the absence of them leads to policies behaving in unexpected styles.
1
0
8
We first extract from the videos both the FINGERTIP motion trajectories and a TASK trajectory (the song), which specifies the task to be solved. Then, we apply residual RL, using the fingertip motion as the nominal behaviour and the task trajectory as the reward (a minimal sketch of this setup follows below).
1
0
5
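Residual RL here means the executed command is a nominal action that tracks the human fingertip trajectory plus a learned correction, and the reward compares what the robot plays against the song extracted from the video. The sketch below is only a schematic with a hypothetical environment interface and reward; it is not the paper's implementation.

```python
# Residual-RL sketch (hypothetical env/reward): nominal fingertip tracking + learned residual,
# rewarded by matching the song (task trajectory) from the video.
import numpy as np

def nominal_action(fingertip_traj, t):
    """Track the fingertip trajectory extracted from the human video."""
    return fingertip_traj[t]

def reward(pressed_keys, target_keys):
    """1 if the keys pressed at this timestep match the song, else 0."""
    return float(np.array_equal(pressed_keys, target_keys))

def rollout(env, residual_policy, fingertip_traj, song):
    obs, total_reward = env.reset(), 0.0
    for t in range(len(song)):
        action = nominal_action(fingertip_traj, t) + residual_policy(obs)  # nominal + residual
        obs, pressed_keys, done = env.step(action)   # hypothetical env interface
        total_reward += reward(pressed_keys, song[t])
        if done:
            break
    return total_reward
```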
So, what is the challenge? Unlike teleoperation data, video data lacks the action info needed to determine what control signals should be applied to the robot to accomplish the observed task. TASK 1: Infer the actions the robot should take to match the videos. How? RL + IL
1
0
5
@ChengQian0112, in collaboration with @kevin_zakka and @Jan_R_Peters, introduces a simple framework to learn highly dexterous manipulation skills from videos. Given the large amount of video data, we can learn a generalist policy that can play ANY song.
1
0
5
YouTube is a LARGE dataset of demonstration videos to train Generalist robot agents, but lacks action data. How can we learn DEXTEROUS skills from them? In #CoRL2024, we explore the problem of learning a Generalist Piano Playing agent from YouTube videos. https://t.co/nRRy3hdqkL
6
43
316
Generalization is an essential property to make robots perform well in novel out-of-distribution contexts. How can we improve the generalization of our models? We explore strategies such as composition, extracting meaningful features, and grounding observations and actions.
0
1
4
Should you generate configuration space actions or task space actions? Should you generate trajectories or keyposes? Is it better to generate position actions or velocity actions? We explore different action representation modalities and highlight when to use each.
1
0
4
In terms of generative models, we explore Diffusion Models, Energy-Based Models, Action Value Maps, and GPT-style models, to name a few. We compare the benefits and pitfalls of each generative model and suggest situations in which each one would perform best.
1
0
3