Lars Ankile
@larsankile
Followers: 525
Following: 952
Media: 22
Statuses: 121
ML for robotics.
Palo Alto, CA
Joined December 2012
How can we enable finetuning of humanoid manipulation policies directly in the real world? In our new paper, Residual Off-Policy RL for Finetuning BC Policies, we demonstrate real-world RL on a bimanual humanoid with 5-fingered hands (29 DoF) and improve pre-trained policies.
8
50
230
Lots of limitations remain. The model is still not accurate enough for general evaluation and improvement. But very cool to see the progress since 2016 :) (https://t.co/u3xhjrgtvJ)
1
2
18
Offline RL is not RL. RL is about interaction. No interaction, no RL.
64
22
467
I think people underestimate this: most of the advances we're seeing wouldn't exist without open-source + open research. Take away PyTorch or HuggingFace and the whole thing collapses. AI is the most awe-inspiring collective effort of our time.
periodic ❤️ open-source! for example, we’ve been collaborating with the @PyTorch team to build the highest MFU gpt-oss training implementation (includes thinky sinky flexattn) here’s a few SFT runs of gpt-oss-20b & 120b, where i get ~24% MFU for 20b and ~8% for 120b
2
6
36
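As a rough, back-of-the-envelope illustration of what MFU (model FLOPs utilization) figures like the ~24% and ~8% quoted above measure, here is a minimal Python sketch. It assumes the common ~6 × params × tokens approximation for the FLOPs of one forward+backward pass of a dense transformer (for MoE models, the active parameter count is typically used instead), and every concrete number in the example is a hypothetical assumption, not a value from the quoted runs.

```python
# Back-of-the-envelope MFU estimate; every constant below is an
# illustrative assumption, not a value from the quoted runs.

def estimate_mfu(params, tokens_per_step, step_time_s, n_gpus, peak_flops_per_gpu):
    """MFU = achieved model FLOP/s divided by aggregate peak hardware FLOP/s."""
    model_flops = 6.0 * params * tokens_per_step   # ~6 * N * T for a dense fwd+bwd pass
    achieved = model_flops / step_time_s           # FLOP/s the run actually delivers
    peak = n_gpus * peak_flops_per_gpu             # FLOP/s the hardware could deliver
    return achieved / peak

# Hypothetical setup: 20B dense-equivalent params, 0.5M tokens per optimizer
# step, 8 GPUs at ~1e15 BF16 FLOP/s peak each, 30 s per step.
print(f"MFU ≈ {estimate_mfu(20e9, 0.5e6, 30.0, 8, 1e15):.1%}")
```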
@SteveTod1998 @rocky_duan @GuanyaShi @pabbeel 11/ Many thanks to everyone else who also helped along the way! @Cinnabar233 for technical discussions & help with baselines, @younggyoseo for discussions on real-world applications, @carlo_sferrazza & @RchalYang for manuscript feedback. Benjamin Colby, Hassan Farooq & @SOTA_kke
0
0
7
10/ This was an exciting project that I worked on during my internship at Amazon FAR, and would not have been possible without the fantastic team and collaborators: @SteveTod1998 @rocky_duan @GuanyaShi @pabbeel and Anusha Nagabandi! The autonomy and resources you get as an
arxiv.org
Recent advances in behavior cloning (BC) have enabled impressive visuomotor control policies. However, these approaches are limited by the quality of human demonstrations, the manual effort...
1
0
2
9/ Related and interesting work: There is a lot of exciting work in the realm of RL for finetuning of BC policies, RL for manipulation, long-horizon and sparse-reward tasks, and real-world RL. Here are some notable ones that were top of mind during this project, in no particular order:
arxiv.org
Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data,...
2
0
8
8/ Limitations and future work: The regularization provided by the frozen BC base model offers an exploration prior and stability, but simultaneously restricts improvements to the "neighborhood" of the initial data distribution. Interestingly, while this often manifests as
arxiv.org
We study the problem of training and fine-tuning expressive policies with online reinforcement learning (RL) given an offline dataset. Training expressive policy classes with online RL present a...
1
0
4
7/ Real-world task 2: PackageHandover results: Base: 23% → After 343 episodes (~76 min): 64%. This task involves bimanual coordination, a deformable object, and a long horizon. The base model often fails by misgrasping the object initially or by being imprecise in the handover.
1
0
6
6/ Real-world task 1: WoollyBallPnP results: Base: 14% → After 134 episodes (~15 min): 64%. This task is a relatively simple pick-and-place task, but the base policy fails surprisingly often because of the precision needed to keep the ball from slipping out of the grasp.
1
1
5
5/ Real-world validation on our Vega humanoid with 29 Degrees of Freedom (DoF): WoollyBallPnP: 14% → 64% (134 rollouts, ~15 min) PackageHandover: 23% → 64% (343 rollouts, ~76 min) No sim2real needed or used – pure real-world learning!
1
0
4
4/ That insight is similar to that of prior work, like ResiP (https://t.co/iNj4hcSOS6). In this work, the primary challenge is to enable RL to operate directly in the real world. We combine a set of design decisions and show that, for a range of manipulation tasks from Robomimic
1
1
8
3/ If we treat the pre-trained BC model as a black box and learn to correct the predicted actions at every step (within the chunk), we can combine the strengths of both action-chunked BC and standard off-policy RL approaches. That is, we can both leverage the effectiveness of
1
0
4
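To make the residual idea in 3/ above concrete, here is a minimal sketch under stated assumptions: the frozen BC policy is treated as a black box that proposes actions, and a small residual network adds a bounded correction to each action. Class names, network sizes, and the correction scale are all illustrative choices, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ResidualActor(nn.Module):
    """Small policy that corrects actions proposed by a frozen BC base model."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256, scale: float = 0.1):
        super().__init__()
        # Conditions on both the observation and the base action it is correcting.
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),
        )
        self.scale = scale  # bounds the correction, keeping actions near the BC distribution

    def forward(self, obs: torch.Tensor, base_action: torch.Tensor) -> torch.Tensor:
        delta = self.net(torch.cat([obs, base_action], dim=-1))
        return base_action + self.scale * delta


@torch.no_grad()
def act(bc_policy, residual: ResidualActor, obs: torch.Tensor) -> torch.Tensor:
    """One environment step: black-box BC proposal plus learned correction.

    In a chunked setting, bc_policy would be queried once per chunk and the
    residual applied to each action in the chunk with the current observation.
    """
    base_action = bc_policy(obs)        # frozen, treated purely as a black box
    return residual(obs, base_action)   # off-policy RL trains only the residual
```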
2/ Project page: https://t.co/T6IkqygyvV Behavior Cloning (BC) can achieve a success rate of around 80% for many tasks relatively straightforwardly, but then often plateaus. Even with hours and hours of human demonstrations, BC policies fail in subtle ways, such as slightly
1
1
6
Is this a skill issue? I'm hearing from e.g. @tszzl that no one at OpenAI codes anymore, they just kick off Codex agents, meanwhile I'm out here watching Codex spend almost 10 minutes (it's still going) and ~160k tokens not figuring out how to set up a repo with uv...
2
1
15
It's almost time for #CoRL 2025! A reminder that we're hosting the Data in Robotics workshop this Saturday Sept 27th. We have a packed schedule and are also attempting to livestream the event for those who can't attend in person.
3
9
67
Q-learning is not yet scalable https://t.co/hoYUdAAeGZ I wrote a blog post about my thoughts on scalable RL algorithms. To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
37
194
1K
How do we unlock the full dexterity of robot hands with data, even beyond what teleoperation can achieve? DEXOP captures natural human manipulation with full-hand tactile & proprio sensing, plus direct force feedback to users, without needing a robot👉 https://t.co/rjfQ9nzofm
31
280
1K
Like, no, this 5 km is not a new record, and why do I have to wait 2 weeks for a VO2 max estimate, I literally just gave you 6 years of running data…
0
0
0
anyone working at @GarminFitness who can help me properly integrate the 6 years of Polar data I just imported into Garmin?
2
0
0