Lars Ankile
@larsankile
Followers: 525
Following: 952
Media: 22
Statuses: 121
ML for robotics.
Palo Alto, CA
Joined December 2012
How can we enable finetuning of humanoid manipulation policies directly in the real world? In our new paper, Residual Off-Policy RL for Finetuning BC Policies, we demonstrate real-world RL on a bimanual humanoid with 5-fingered hands (29 DoF) and improve pre-trained policies.
8
50
230
Lots of limitations remain. The model is still not accurate enough for general evaluation and improvement. But very cool to see the progress since 2016 :) (https://t.co/u3xhjrgtvJ)
1
2
18
Offline RL is not RL. RL is about interaction. No interaction, no RL.
64
22
467
I think people underestimate this: most of the advances we're seeing wouldn't exist without open-source + open research. Take away PyTorch or HuggingFace and the whole thing collapses. AI is the most awe-inspiring collective effort of our time.
periodic ❤️ open-source! for example, we’ve been collaborating with the @PyTorch team to build the highest MFU gpt-oss training implementation (includes thinky sinky flexattn) here’s a few SFT runs of gpt-oss-20b & 120b, where i get ~24% MFU for 20b and ~8% for 120b
2
6
36
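As a rough, back-of-the-envelope illustration of what MFU (model FLOPs utilization) figures like the ~24% and ~8% quoted above measure, here is a minimal Python sketch. It assumes the common ~6 × params × tokens approximation for the FLOPs of one forward+backward pass of a dense transformer (for MoE models, the active parameter count is typically used instead), and every concrete number in the example is a hypothetical assumption, not a value from the quoted runs.

```python
# Back-of-the-envelope MFU estimate; every constant below is an
# illustrative assumption, not a value from the quoted runs.

def estimate_mfu(params, tokens_per_step, step_time_s, n_gpus, peak_flops_per_gpu):
    """MFU = achieved model FLOP/s divided by aggregate peak hardware FLOP/s."""
    model_flops = 6.0 * params * tokens_per_step   # ~6 * N * T for a dense fwd+bwd pass
    achieved = model_flops / step_time_s           # FLOP/s the run actually delivers
    peak = n_gpus * peak_flops_per_gpu             # FLOP/s the hardware could deliver
    return achieved / peak

# Hypothetical setup: 20B dense-equivalent params, 0.5M tokens per optimizer
# step, 8 GPUs at ~1e15 BF16 FLOP/s peak each, 30 s per step.
print(f"MFU ≈ {estimate_mfu(20e9, 0.5e6, 30.0, 8, 1e15):.1%}")
```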
@SteveTod1998 @rocky_duan @GuanyaShi @pabbeel 11/ Many thanks to everyone else who also helped along the way! @Cinnabar233 for technical discussions & help with baselines, @younggyoseo for discussions on real-world applications, @carlo_sferrazza & @RchalYang for manuscript feedback. Benjamin Colby, Hassan Farooq & @SOTA_kke
0
0
7
10/ This was an exciting project that I worked on during my internship at Amazon FAR, and would not have been possible without the fantastic team and collaborators: @SteveTod1998 @rocky_duan @GuanyaShi @pabbeel and Anusha Nagabandi! The autonomy and resources you get as an
arxiv.org
Recent advances in behavior cloning (BC) have enabled impressive visuomotor control policies. However, these approaches are limited by the quality of human demonstrations, the manual effort...
1
0
2
9/ Related and interesting work: There is a lot of exciting work in the realm of RL for finetuning of BC policies, RL for manipulation, long-horizon and sparse-reward tasks, and real-world RL. Here are some notable ones that were top of mind during this project, in no particular order:
arxiv.org
Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data,...
2
0
8
8/ Limitations and future work: The regularization provided by the frozen BC base model offers an exploration prior and stability, but simultaneously restricts improvements to the "neighborhood" of the initial data distribution. Interestingly, while this often manifests as
arxiv.org
We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL present a...
1
0
4
7/ Real-world task 2: PackageHandover results: Base: 23% → After 343 episodes (~76 min): 64%. This task involves bimanual coordination, a deformable object, and a long horizon. The base model often fails by misgrasping the object initially or by being imprecise in the handover.
1
0
6
6/ Real-world task 1: WoollyBallPnP results: Base: 14% → After 134 episodes (~15 min): 64%. This task is a relatively simple pick-and-place task, but the base policy fails surprisingly often because of the precision needed to keep the ball from slipping out of the grasp.
1
1
5
5/ Real-world validation on our Vega humanoid with 29 Degrees of Freedom (DoF): WoollyBallPnP: 14% → 64% (134 rollouts, ~15 min) PackageHandover: 23% → 64% (343 rollouts, ~76 min) No sim2real needed or used – pure real-world learning!
1
0
4
4/ That insight is similar to that of prior work, like ResiP (https://t.co/iNj4hcSOS6). In this work, the primary challenge is to enable RL to operate directly in the real world. We combine a set of design decisions and show that, for a range of manipulation tasks from Robomimic
1
1
8
3/ If we treat the pre-trained BC model as a black box and learn to correct the predicted actions at every step (within the chunk), we can combine the strengths of both action-chunked BC and standard off-policy RL approaches. That is, we can both leverage the effectiveness of
1
0
4
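To make the residual idea in 3/ above concrete, here is a minimal sketch under stated assumptions: the frozen BC policy is treated as a black box that proposes actions, and a small residual network adds a bounded correction to each action. Class names, network sizes, and the correction scale are all illustrative choices, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ResidualActor(nn.Module):
    """Small policy that corrects actions proposed by a frozen BC base model."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256, scale: float = 0.1):
        super().__init__()
        # Conditions on both the observation and the base action it is correcting.
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )
        self.scale = scale  # bounds the correction, keeping actions near the BC distribution

    def forward(self, obs: torch.Tensor, base_action: torch.Tensor) -> torch.Tensor:
        delta = self.net(torch.cat([obs, base_action], dim=-1))
        return base_action + self.scale * delta


@torch.no_grad()
def act(bc_policy, residual: ResidualActor, obs: torch.Tensor) -> torch.Tensor:
    """One environment step: black-box BC proposal plus learned correction.

    In a chunked setting, bc_policy would be queried once per chunk and the
    residual applied to each action in the chunk with the current observation.
    """
    base_action = bc_policy(obs)        # frozen, treated purely as a black box
    return residual(obs, base_action)   # off-policy RL trains only the residual
```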
2/ Project page: https://t.co/T6IkqygyvV Behavior Cloning (BC) can achieve a success rate of around 80% for many tasks relatively straightforwardly, but then often plateaus. Even with hours and hours of human demonstrations, BC policies fail in subtle ways, such as slightly
1
1
6
Is this a skill issue? I'm hearing from e.g. @tszzl that no one at OpenAI codes anymore, they just kick off Codex agents, meanwhile I'm out here watching Codex spend almost 10 minutes (it's still going) and ~160k tokens not figuring out how to set up a repo with uv...
2
1
15
It's almost time for #CoRL 2025! A reminder that we're hosting the Data in Robotics workshop this Saturday Sept 27th. We have a packed schedule and are also attempting to livestream the event for those who can't attend in person.
3
9
67
Q-learning is not yet scalable https://t.co/hoYUdAAeGZ I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
37
194
1K
How do we unlock the full dexterity of robot hands with data, even beyond what teleoperation can achieve? DEXOP captures natural human manipulation with full-hand tactile & proprio sensing, plus direct force feedback to users, without needing a robot👉 https://t.co/rjfQ9nzofm
31
280
1K
Like, no, this 5 km is not a new record, and why do I have to wait 2 weeks for a VO2 max estimate, I literally just gave you 6 years of running data…
0
0
0
anyone working at @GarminFitness who can help me properly integrate the 6 years of Polar data I just imported into Garmin?
2
0
0