Julen Urain
@robotgradient
Followers
1K
Following
2K
Media
50
Statuses
272
Robotics Tinkerer. RS@FAIR (Embodied AI) Prev: @DFKI, @TUDarmstadt, @NvidiaAI. https://t.co/RQpq7Prbln X https://t.co/umZQeDjJv4
Joined November 2017
This was very challenging and very cool to see evolve! I personally was not sure if it would work, but @irmakkguzey pushed so hard to show it does. Learning dexterous robot policies with only human video data, using the egocentric view from Aria 2 glasses, chill and easy 😁
Dexterous manipulation by directly observing humans - a dream in AI for decades - is hard due to visual and embodiment gaps. With simple yet powerful hardware - Aria 2 glasses 👓 - and our new work AINA 🪞, we are now one significant step closer to achieving this dream.
0
0
7
After a year of team work, we're thrilled to introduce Depth Anything 3 (DA3)! 🚀 Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video. In pursuit of minimal modeling, DA3
80
504
4K
The expert mode is going to bring a lot of news in the future 🙃
0
0
6
Introducing DINOv3 🦕🦕🦕 A SotA-enabling vision foundation model, trained with pure self-supervised learning (SSL) at scale. High quality dense features, combining unprecedented semantic and geometric scene understanding. Three reasons why this matters…
12
140
1K
Anyone interested in tactile sensing for robotics should be following Akash's solid releases. How should we integrate the rich tactile sensing modality into policy learning?
Robots need touch for human-like hands to reach the goal of general manipulation. However, approaches today don’t use tactile sensing or use specific architectures per tactile task. Can 1 model improve many tactile tasks? 🌟Introducing Sparsh-skin: https://t.co/DgTq9OPMap 1/6
1
3
11
Touch perception holds the key to unlock robot dexterity Our new @SciRobotics work shows how to fuse tactile & vision, track pose+shape of novel objects during dexterous manipulation https://t.co/TNlaBjB6Ra It's a culmination of our work over the last 4 years, see @Suddhus 🧵⬇
For robot dexterity, a missing piece is general, robust perception. Our new @SciRobotics work combines multimodal sensing with neural representations to perceive novel objects in-hand. 🎲 Featured on the cover of the November issue! #ScienceRoboticsResearch 🧵1/9
1
12
81
On the other hand, SE(3) Flow Matching is a preferable alternative to SE(3) Diffusion. It is deterministic and built on straight paths over the geodesics, leading to much simpler generation/optimization paths (a minimal sketch of the geodesic path follows below).
1
0
0
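The intuition behind "straight paths over the geodesics" can be sketched in a few lines: the conditional path between a noise pose and a data pose follows a geodesic, and the network regresses the constant tangent velocity along it. The sketch below is only illustrative (it uses scipy, treats rotation and translation separately as an SO(3) geodesic plus a straight line in R^3, and the function name is made up); it is not the implementation from the paper.

```python
# Illustrative sketch of a geodesic flow-matching target on SE(3) (not the paper's code).
# Rotation follows the SO(3) geodesic (slerp); translation is a straight line in R^3.
import numpy as np
from scipy.spatial.transform import Rotation as R

def geodesic_point_and_velocity(R0, p0, R1, p1, t):
    """Pose x_t on the path from noise pose (R0, p0) to data pose (R1, p1),
    plus the constant tangent velocity a flow network would regress."""
    omega = (R0.inv() * R1).as_rotvec()      # relative rotation as an axis-angle vector
    R_t = R0 * R.from_rotvec(t * omega)      # geodesic interpolation on SO(3)
    p_t = (1.0 - t) * p0 + t * p1            # straight-line interpolation in R^3
    return R_t, p_t, omega, p1 - p0          # (pose at t, angular + linear velocity targets)

# Toy usage: a random "noise" pose, a "data" pose, and the path queried at t = 0.3.
R0, p0 = R.random(), np.random.randn(3)
R1, p1 = R.random(), np.array([0.2, 0.1, 0.5])
R_t, p_t, v_rot, v_trans = geodesic_point_and_velocity(R0, p0, R1, p1, t=0.3)
```

In a flow-matching loss, a network conditioned on the pose at t, the time t, and the observation would be trained to predict (v_rot, v_trans), and generation integrates that velocity field from noise to an action pose.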
In our interpretation, Invariant networks should be preferred over Equivariant networks as long as you can achieve the same results (i.e. equivariant action generation). Imposing invariance on your network is simpler and allows using most of the common functions (ReLU, Linear...). A sketch of one way to achieve this follows below.
1
0
0
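One standard way to obtain equivariant action generation from an otherwise ordinary network is to canonicalize: express the observations relative to a reference pose (e.g. the end-effector) so the inputs become frame-independent, run a plain Linear/ReLU network, and map the predicted action back to the world frame. The sketch below only illustrates that relative-frame idea with made-up names and dimensions; it is not the architecture from the paper.

```python
# Relative-frame sketch (hypothetical names/dimensions): an ordinary MLP becomes part of
# an SE(3)-equivariant policy by canonicalizing inputs and de-canonicalizing outputs.
import torch
import torch.nn as nn

class LocalFramePolicy(nn.Module):
    def __init__(self, dim_in=3, dim_out=3):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim_in, 128), nn.ReLU(),
                                 nn.Linear(128, dim_out))

    def forward(self, obj_pos_world, ee_rot_world, ee_pos_world):
        # 1) Canonicalize: object position expressed in the end-effector frame.
        rel = (obj_pos_world - ee_pos_world).unsqueeze(-1)               # (B, 3, 1)
        obj_local = (ee_rot_world.transpose(-1, -2) @ rel).squeeze(-1)   # (B, 3)
        # 2) Predict a translational action in the local frame (plain Linear/ReLU).
        act_local = self.mlp(obj_local)
        # 3) De-canonicalize: rotate the action back into the world frame.
        return (ee_rot_world @ act_local.unsqueeze(-1)).squeeze(-1)
```

If the whole scene is rotated and translated, the canonicalized input obj_local does not change, so the local action is identical and the world-frame output transforms accordingly: the policy is equivariant while the network itself stays an ordinary MLP.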
When we were working on SE(3)-DiffusionFields, we were not aware of how far this could be extended. Our new work shows that: - SE(3) Flow Matching is a simple yet powerful alternative to SE(3) Diffusion for robotics. - We can use Invariant networks for equivariant action generation.
I am excited to share our recent work on "ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching". The work presents a novel policy class combining Flow Matching with SE(3) Invariant Transformers for fast, equivariant, and expressive
1
9
57
Perfect start to the #CoRL2024 week! Was a pleasure organizing the NextGen Robot Learning Symposium at @TUDarmstadt with @firasalhafez @GeorgiaChal Thanks to the speakers for the great talks! @YunzhuLiYZ
@NimaFazeli7 @Dian_Wang_ @HaojieHuang13 @Vikashplus @ehsanik @Oliver_Kroemer
1
7
61
Okay, not super impressive! BUT this plot gives us hope! We found that the agent keeps improving on unseen songs with more and more demonstrations, so we are hopeful that, given the large datasets available on the Internet, our agent will keep improving in the future 🎹
0
0
6
TASK 2: take all these RL policies and distill them (BC) into a Diffusion Policy. We use a hierarchical policy: the top-layer policy outputs desired fingertip motions, while the bottom-layer policy generates the configuration-space actions (a minimal sketch of the two-level setup follows below). Performance on unseen songs 👇
2
0
8
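The two-level structure can be sketched as follows. Everything here is hypothetical (dimensions, names, and the plain MSE behavior-cloning loss, which stands in for the Diffusion Policy objective used in the actual work); it only illustrates how a high-level fingertip predictor feeds a low-level joint-space policy, both supervised by rollouts from the RL experts.

```python
# Hierarchical BC sketch (hypothetical dimensions/names, MSE in place of the diffusion loss).
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighLevel(nn.Module):
    """Observation -> desired fingertip positions (10 fingertips x 3D)."""
    def __init__(self, obs_dim=64, n_fingertips=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_fingertips * 3))
    def forward(self, obs):
        return self.net(obs)

class LowLevel(nn.Module):
    """Fingertip targets + joint state -> configuration-space action."""
    def __init__(self, n_fingertips=10, q_dim=44):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_fingertips * 3 + q_dim, 256), nn.ReLU(),
                                 nn.Linear(256, q_dim))
    def forward(self, fingertips, q):
        return self.net(torch.cat([fingertips, q], dim=-1))

# One BC step on a batch of (obs, joint state, expert fingertips, expert actions)
# collected by rolling out the per-song RL policies.
high, low = HighLevel(), LowLevel()
obs, q = torch.randn(32, 64), torch.randn(32, 44)
ft_expert, a_expert = torch.randn(32, 30), torch.randn(32, 44)
ft_pred = high(obs)
loss = F.mse_loss(ft_pred, ft_expert) + F.mse_loss(low(ft_pred.detach(), q), a_expert)
loss.backward()
```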
We train individual RL single-song policies, which we leverage to generate the desired observation-action pairs! Observe that using fingertip demonstrations leads to human-like motions, while the absence of them leads to policies behaving in unexpected styles.
1
0
8
We first extract from the videos both the FINGERTIP motion trajectories and a TASK trajectory (the song), which specifies the task to be solved. Then, we apply residual RL, using the fingertip motion as the nominal behaviour and the task trajectory as the reward (a minimal sketch of this setup follows below).
1
0
5
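Residual RL here means the executed command is a nominal action that tracks the human fingertip trajectory plus a learned correction, and the reward compares what the robot plays against the song extracted from the video. The sketch below is only a schematic with a hypothetical environment interface and reward; it is not the paper's implementation.

```python
# Residual-RL sketch (hypothetical env/reward): nominal fingertip tracking + learned residual,
# rewarded by matching the song (task trajectory) from the video.
import numpy as np

def nominal_action(fingertip_traj, t):
    """Track the fingertip trajectory extracted from the human video."""
    return fingertip_traj[t]

def reward(pressed_keys, target_keys):
    """1 if the keys pressed at this timestep match the song, else 0."""
    return float(np.array_equal(pressed_keys, target_keys))

def rollout(env, residual_policy, fingertip_traj, song):
    obs, total_reward = env.reset(), 0.0
    for t in range(len(song)):
        action = nominal_action(fingertip_traj, t) + residual_policy(obs)  # nominal + residual
        obs, pressed_keys, done = env.step(action)   # hypothetical env interface
        total_reward += reward(pressed_keys, song[t])
        if done:
            break
    return total_reward
```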
So, what is the challenge? Unlike teleoperation data, video data lacks the action info needed to determine what control signals should be applied to the robot to accomplish the observed task. TASK 1: Infer the actions the robot should take to match the videos. How? RL + IL
1
0
5
@ChengQian0112, in collaboration with @kevin_zakka and @Jan_R_Peters, introduces a simple framework to learn highly dexterous manipulation skills from videos. Given the large amount of video data, we can learn a generalist policy that can play ANY song.
1
0
5
YouTube is a LARGE dataset of demonstration videos to train Generalist robot agents, but lacks action data. How can we learn DEXTEROUS skills from them? In #CoRL2024, we explore the problem of learning a Generalist Piano Playing agent from YouTube videos. https://t.co/nRRy3hdqkL
6
43
316
Generalization is an essential property to make robots perform well in novel out-of-distribution contexts. How can we improve the generalization of our models? We explore strategies such as composition, extracting meaningful features, and grounding observations and actions.
0
1
4
Should you generate configuration space actions or task space actions? Should you generate trajectories or keyposes? Is it better to generate position actions or velocity actions? We explore different action representation modalities and highlight when to use each.
1
0
4
In terms of generative models, we explore Diffusion Models, Energy-Based Models, Action Value Maps, and GPT-style models, to name a few. We compare the benefits and pitfalls of each generative model and suggest situations in which each one would perform best.
1
0
3