Andrew Wagenmaker Profile
Andrew Wagenmaker

@ajwagenmaker

Followers: 248
Following: 0
Media: 9
Statuses: 19

Postdoc @ UC Berkeley with Sergey Levine | PhD @ UW.

Joined May 2025
@ajwagenmaker
Andrew Wagenmaker
10 days
Fun collaboration with @mitsuhiko_nm, @yunchuzh, @seohong_park, Waleed Yagoub, Anusha Nagabandi, @abhishekunique7, @svlevine! (11/n)
Please see the paper and website for additional results.
Website:
Paper:
3
0
12
@ajwagenmaker
Andrew Wagenmaker
10 days
DSRL is also a competitive offline RL procedure: we find that first training a diffusion/flow policy on an offline dataset, then applying DSRL to steer it to high-reward behavior using the offline data performs on par with state-of-the-art offline RL methods on OGBench. (10/n)
1
0
8
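To make the two-stage offline recipe in the tweet above concrete, here is a minimal Python outline. The helper names (train_diffusion_bc, dsrl_steer_offline) are hypothetical stubs for illustration, not the authors' code.

```python
def train_diffusion_bc(dataset):
    """Stage 1 (stub): fit a diffusion/flow policy to the offline dataset via behavior cloning."""
    ...

def dsrl_steer_offline(diffusion_policy, dataset):
    """Stage 2 (stub): learn a noise-space policy from the same offline data that
    steers the frozen diffusion policy toward high-reward behavior."""
    ...

def offline_dsrl(dataset):
    # First behavior-clone the generative policy, then steer it with DSRL,
    # using only the offline data in both stages.
    diffusion_policy = train_diffusion_bc(dataset)
    noise_policy = dsrl_steer_offline(diffusion_policy, dataset)
    return diffusion_policy, noise_policy
```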
@ajwagenmaker
Andrew Wagenmaker
10 days
In simulation, we find that DSRL substantially outperforms all existing approaches to improving diffusion policies online on benchmarks such as Robomimic. (9/n)
1
0
9
@ajwagenmaker
Andrew Wagenmaker
10 days
Noise-Aliased DSRL trains two Q-functions: one on the original action space via TD learning, and one on the noise action space by distilling the first Q-function. This allows learning from offline data and improves online sample efficiency by as much as 2x. (8/n)
1
0
7
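A hedged sketch of the two-critic idea from the tweet above, in PyTorch-style code: Q_A is trained with standard TD regression on real (state, action) transitions, while Q_W on the noise action space is trained by regressing onto Q_A evaluated at the denoised actions. The network sizes, dimensions, and the frozen denoise() stand-in are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, NOISE_DIM = 17, 7, 7   # placeholder sizes
GAMMA = 0.99

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

q_a = mlp(STATE_DIM + ACTION_DIM, 1)   # critic on the original action space
q_w = mlp(STATE_DIM + NOISE_DIM, 1)    # critic on the noise action space

def denoise(state, noise):
    # Stand-in for the frozen, pretrained diffusion policy's denoising process.
    return torch.tanh(noise)

def td_loss_q_a(s, a, r, s_next, a_next):
    # TD regression on real transitions (e.g. from offline data); r has shape (batch, 1).
    with torch.no_grad():
        target = r + GAMMA * q_a(torch.cat([s_next, a_next], dim=-1))
    return F.mse_loss(q_a(torch.cat([s, a], dim=-1)), target)

def distill_loss_q_w(s):
    # Sample noise "actions", denoise them into real actions, and regress
    # Q_W(s, w) onto Q_A(s, denoise(s, w)).
    w = torch.randn(s.shape[0], NOISE_DIM)
    with torch.no_grad():
        target = q_a(torch.cat([s, denoise(s, w)], dim=-1))
    return F.mse_loss(q_w(torch.cat([s, w], dim=-1)), target)
```

In this picture, the noise-space actor would be trained against Q_W, consistent with the off-policy, SAC-based setup described in the next tweet.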
@ajwagenmaker
Andrew Wagenmaker
10 days
DSRL can, in principle, be instantiated with any RL algorithm. We propose, however, a SAC-based variant, Noise-Aliased DSRL, that takes advantage of a diffusion policy's tendency to map different noise to similar actions, and allows for fully off-policy training. (7/n)
1
0
9
@ajwagenmaker
Andrew Wagenmaker
10 days
Real-world DSRL training is highly stable. As any "action" played by the noise-space RL policy is just initial noise for the denoising process, even early in training the denoised actions look like actions from a BC-trained policy rather than from an unconverged RL policy. (6/n)
1
1
14
@ajwagenmaker
Andrew Wagenmaker
10 days
Uncut training timelapse of DSRL on a WidowX pick-and-place task. (5/n)
1
0
11
@ajwagenmaker
Andrew Wagenmaker
10 days
We also apply DSRL to state-of-the-art flow-based generalist policies, in particular pi0 from Physical Intelligence. DSRL is able to improve pi0 in real-world deployment, on some tasks taking the success rate from 25% to 90% in <90 minutes of online training. (4/n)
1
0
13
@ajwagenmaker
Andrew Wagenmaker
10 days
DSRL enables efficient improvement of real-world diffusion policies for robotic control. We apply it to several different robot embodiments, and find that it is able to improve performance from <30% success to >90% in anywhere from 30 to 60 minutes of online training. (3/n)
1
1
12
@ajwagenmaker
Andrew Wagenmaker
10 days
DSRL trains a lightweight policy via RL to select input noise to the diffusion policy's denoising process, steering it to desired behaviors. We find this leads to very sample-efficient improvement, and avoids challenges typically encountered with diffusion policies + RL. (2/n)
1
3
21
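As a rough illustration of the steering idea in the tweet above, here is a minimal PyTorch-style sketch: the RL policy outputs the initial noise w, and the frozen diffusion policy's denoising process turns (state, w) into the action that is actually executed. The class, dimensions, and the denoising stand-in are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

STATE_DIM, NOISE_DIM = 17, 7   # placeholder sizes

class NoisePolicy(nn.Module):
    """Lightweight RL policy: its 'action' is the initial noise w, not a robot action."""
    def __init__(self, state_dim, noise_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, noise_dim),
        )

    def forward(self, state):
        return self.net(state)

def frozen_denoise(state, noise):
    # Stand-in for the pretrained diffusion policy's denoising process,
    # which maps (state, initial noise) -> robot action and stays frozen under DSRL.
    return torch.tanh(noise)

noise_policy = NoisePolicy(STATE_DIM, NOISE_DIM)
state = torch.randn(1, STATE_DIM)
w = noise_policy(state)               # RL "action": a point in the noise space
action = frozen_denoise(state, w)     # diffusion policy converts w into the executed action
# The RL algorithm then trains noise_policy on (state, w, reward, next_state)
# transitions, never touching the diffusion policy's weights.
```

Because every w is just a valid input noise, the denoised action stays on the behavior the diffusion policy already learned, which is the stability point made in tweet 6/n above.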
@ajwagenmaker
Andrew Wagenmaker
10 days
Diffusion policies have demonstrated impressive performance in robot control, yet are difficult to improve online when zero-shot performance isn’t enough. To address this challenge, we introduce DSRL: Diffusion Steering via Reinforcement Learning. (1/n)
8
59
289