Lars Ankile Profile
Lars Ankile

@larsankile

Followers
525
Following
952
Media
22
Statuses
121

ML for robotics.

Palo Alto, CA
Joined December 2012
@larsankile
Lars Ankile
1 month
How can we enable finetuning of humanoid manipulation policies, directly in the real world? In our new paper, Residual Off-Policy RL for Finetuning BC Policies, we demonstrate real-world RL on a bimanual humanoid with 5-fingered hands (29 DoF) and improve pre-trained policies
8
50
230
@chelseabfinn
Chelsea Finn
15 days
Lots of limitations remain. The model is still not accurate enough for general evaluation and improvement. But, very cool to see the progress since 2016 :) (https://t.co/u3xhjrgtvJ)
1
2
18
@jsuarez5341
Joseph Suarez 🐡
1 month
Offline RL is not RL. RL is about interaction. No interaction, no RL.
64
22
467
@VincentMoens
vmoens
1 month
I think people underestimate this: most of the advances we're seeing wouldn't exist without open-source + open research. Take away PyTorch or HuggingFace and the whole thing collapses. AI is the most awe-inspiring collective effort of our time.
@khoomeik
Rohan Pandey
1 month
periodic ❤️ open-source! For example, we've been collaborating with the @PyTorch team to build the highest-MFU gpt-oss training implementation (includes thinky sinky flexattn). Here are a few SFT runs of gpt-oss-20b & 120b, where I get ~24% MFU for 20b and ~8% for 120b
2
6
36
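For context, MFU (model FLOPs utilization) is the fraction of the hardware's peak FLOPs that training actually uses. A rough back-of-the-envelope sketch using the common 6·N FLOPs-per-token approximation; the throughput and peak-FLOPs numbers below are illustrative assumptions, not measurements from the tweet:

```python
# Rough MFU estimate: achieved training FLOPs/s divided by hardware peak FLOPs/s.
# The 6 * n_params (forward + backward) approximation and all numbers below are
# illustrative assumptions, not figures taken from the tweet.

def model_flops_utilization(n_params: float, tokens_per_sec: float, peak_flops: float) -> float:
    achieved = 6.0 * n_params * tokens_per_sec  # FLOPs/s spent on the model
    return achieved / peak_flops

# Example with hypothetical throughput on hardware with ~1e15 peak FLOPs/s:
print(f"{model_flops_utilization(20e9, 2_000, 1e15):.1%}")
```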
@larsankile
Lars Ankile
1 month
@SteveTod1998 @rocky_duan @GuanyaShi @pabbeel 11/ Many thanks to everyone else who also helped along the way! @Cinnabar233 for technical discussions & help with baselines, @younggyoseo for discussions on real-world applications, @carlo_sferrazza & @RchalYang for manuscript feedback. Benjamin Colby, Hassan Farooq & @SOTA_kke
0
0
7
@larsankile
Lars Ankile
1 month
10/ This was an exciting project that I worked on during my internship at Amazon FAR, and would not have been possible without the fantastic team and collaborators: @SteveTod1998 @rocky_duan @GuanyaShi @pabbeel and Anusha Nagabandi! The autonomy and resources you get as an
arxiv.org
Recent advances in behavior cloning (BC) have enabled impressive visuomotor control policies. However, these approaches are limited by the quality of human demonstrations, the manual effort...
1
0
2
@larsankile
Lars Ankile
1 month
9/ Related and interesting work: There is a lot of exciting work in the realm of RL for finetuning of BC policies, RL for manipulation, long-horizon and sparse-reward tasks, and real-world RL. Here are some notable ones that were top of mind during this project, in no particular
arxiv.org
Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data,...
2
0
8
@larsankile
Lars Ankile
1 month
8/ Limitations and future work: The regularization provided by the frozen BC base model offers an exploration prior and stability, but simultaneously restricts improvements to the "neighborhood" of the initial data distribution. Interestingly, while this often manifests as
arxiv.org
We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL present a...
1
0
4
@larsankile
Lars Ankile
1 month
7/ Real-world task 2: PackageHandover results: Base: 23% → After 343 episodes (~76 min): 64%. This task involves bimanual coordination, a deformable object, and a long horizon. The base model often fails by misgrasping the object initially or by being imprecise in the handover.
1
0
6
@larsankile
Lars Ankile
1 month
6/ Real-world task 1: WoollyBallPnP: Base: 14% → After 134 episodes (~15 min): 64%. This task is a relatively simple pick-and-place task, but the base policy fails surprisingly often because of the precision needed to keep the ball from slipping out of the grasp.
1
1
5
@larsankile
Lars Ankile
1 month
5/ Real-world validation on our Vega humanoid with 29 Degrees of Freedom (DoF): WoollyBallPnP: 14% → 64% (134 rollouts, ~15 min) PackageHandover: 23% → 64% (343 rollouts, ~76 min) No sim2real needed or used – pure real-world learning!
1
0
4
@larsankile
Lars Ankile
1 month
4/ That insight is similar to that of prior work, like ResiP (https://t.co/iNj4hcSOS6). In this work, the primary challenge is to enable RL to operate directly in the real world. We combine a set of design decisions and show that, for a range of manipulation tasks from Robomimic
1
1
8
@larsankile
Lars Ankile
1 month
3/ If we treat the pre-trained BC model as a black box and learn to correct the predicted actions at every step (within the chunk), we can combine the strength of both action-chunked BC and standard off-policy RL approaches. That is, we can both leverage the effectiveness of
1
0
4
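A minimal sketch of the residual-correction idea described in this thread, assuming a generic frozen action-chunked BC policy and a small learned corrector; all names, shapes, and the gym-style env API are illustrative assumptions, not the paper's actual code:

```python
import torch
import torch.nn as nn

class ResidualCorrector(nn.Module):
    """Small MLP that outputs a bounded correction to one base action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256, scale: float = 0.1):
        super().__init__()
        self.scale = scale  # keeps corrections near the BC action distribution
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )

    def forward(self, obs: torch.Tensor, base_action: torch.Tensor) -> torch.Tensor:
        delta = self.net(torch.cat([obs, base_action], dim=-1))
        return base_action + self.scale * delta  # corrected action

@torch.no_grad()
def rollout_chunk(bc_policy, corrector, obs, env, chunk_len: int = 8):
    """Query the frozen BC policy once per chunk, then correct every step in it."""
    action_chunk = bc_policy(obs)  # (chunk_len, act_dim); BC model treated as a black box
    for t in range(chunk_len):
        action = corrector(obs, action_chunk[t])
        obs, reward, done, info = env.step(action)  # assumes a gym-style step API
        if done:
            break
    return obs
```

The residual corrector here would be the part trained with off-policy RL, while the BC policy stays frozen and continues to supply the action chunks.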
@larsankile
Lars Ankile
1 month
2/ Project page: https://t.co/T6IkqygyvV Behavior Cloning (BC) can achieve a success rate of around 80% for many tasks relatively straightforwardly, but then often plateaus. Even with hours and hours of human demonstrations, BC policies fail in subtle ways, such as slightly
1
1
6
@larsankile
Lars Ankile
1 month
Is this a skill issue? I'm hearing from e.g. @tszzl that no one at OpenAI codes anymore, they just kick off Codex agents, meanwhile I'm out here watching Codex spend almost 10 minutes (it's still going) and ~160k tokens not figuring out how to set up a repo with uv...
2
1
15
@JoeyHejna
Joey Hejna
1 month
It's almost time for #CoRL 2025! A reminder that we're hosting the Data in Robotics workshop this Saturday Sept 27th. We have a packed schedule and are also attempting to livestream the event for those who can't attend in person.
3
9
67
@seohong_park
Seohong Park
5 months
Q-learning is not yet scalable https://t.co/hoYUdAAeGZ I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
37
194
1K
@haoshu_fang
Hao-Shu Fang
2 months
How do we unlock the full dexterity of robot hands with data, even beyond what teleoperation can achieve? DEXOP captures natural human manipulation with full-hand tactile & proprio sensing, plus direct force feedback to users, without needing a robot👉 https://t.co/rjfQ9nzofm
31
280
1K
@larsankile
Lars Ankile
2 months
Like, no, this 5 km is not a new record, and why do I have to wait 2 weeks for a VO2 max estimate, I literally just gave you 6 years of running data…
0
0
0
@larsankile
Lars Ankile
2 months
anyone working at @GarminFitness who can help me properly integrate the 6 years of Polar data I just imported into Garmin?
2
0
0