Andrew Wagenmaker Profile
Andrew Wagenmaker

@ajwagenmaker

Followers: 248
Following: 0
Media: 9
Statuses: 19

Postdoc @ UC Berkeley with Sergey Levine | PhD @ UW.

Joined May 2025
@ajwagenmaker
Andrew Wagenmaker
10 days
Fun collaboration with @mitsuhiko_nm, @yunchuzh, @seohong_park, Waleed Yagoub, Anusha Nagabandi, @abhishekunique7, @svlevine! (11/n)
Please see the paper and website for additional results.
Website:
Paper:
3
0
12
@ajwagenmaker
Andrew Wagenmaker
10 days
DSRL is also a competitive offline RL procedure: we find that first training a diffusion/flow policy on an offline dataset, then applying DSRL to steer it to high-reward behavior using the offline data performs on par with state-of-the-art offline RL methods on OGBench. (10/n)
1
0
8
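To make the two-stage offline recipe in the tweet above concrete, here is a minimal Python outline. The helper names (train_diffusion_bc, dsrl_steer_offline) are hypothetical stubs for illustration, not the authors' code.

```python
def train_diffusion_bc(dataset):
    """Stage 1 (stub): fit a diffusion/flow policy to the offline dataset via behavior cloning."""
    ...

def dsrl_steer_offline(diffusion_policy, dataset):
    """Stage 2 (stub): learn a noise-space policy from the same offline data that
    steers the frozen diffusion policy toward high-reward behavior."""
    ...

def offline_dsrl(dataset):
    # First behavior-clone the generative policy, then steer it with DSRL,
    # using only the offline data in both stages.
    diffusion_policy = train_diffusion_bc(dataset)
    noise_policy = dsrl_steer_offline(diffusion_policy, dataset)
    return diffusion_policy, noise_policy
```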
@ajwagenmaker
Andrew Wagenmaker
10 days
In simulation, we find that DSRL substantially outperforms all existing approaches to improving diffusion policies online on benchmarks such as Robomimic. (9/n)
1
0
9
@ajwagenmaker
Andrew Wagenmaker
10 days
Noise-Aliased DSRL trains two Q-functions: one on the original action space via TD learning, and one on the noise action space by distilling the first Q-function. This allows learning from offline data and improves online sample efficiency by as much as 2x. (8/n)
1
0
7
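A hedged sketch of the two-critic idea from the tweet above, in PyTorch-style code: Q_A is trained with standard TD regression on real (state, action) transitions, while Q_W on the noise action space is trained by regressing onto Q_A evaluated at the denoised actions. The network sizes, dimensions, and the frozen denoise() stand-in are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, NOISE_DIM = 17, 7, 7   # placeholder sizes
GAMMA = 0.99

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, out_dim))

q_a = mlp(STATE_DIM + ACTION_DIM, 1)   # critic on the original action space
q_w = mlp(STATE_DIM + NOISE_DIM, 1)    # critic on the noise action space

def denoise(state, noise):
    # Stand-in for the frozen, pretrained diffusion policy's denoising process.
    return torch.tanh(noise)

def td_loss_q_a(s, a, r, s_next, a_next):
    # TD regression on real transitions (e.g. from offline data); r has shape (batch, 1).
    with torch.no_grad():
        target = r + GAMMA * q_a(torch.cat([s_next, a_next], dim=-1))
    return F.mse_loss(q_a(torch.cat([s, a], dim=-1)), target)

def distill_loss_q_w(s):
    # Sample noise "actions", denoise them into real actions, and regress
    # Q_W(s, w) onto Q_A(s, denoise(s, w)).
    w = torch.randn(s.shape[0], NOISE_DIM)
    with torch.no_grad():
        target = q_a(torch.cat([s, denoise(s, w)], dim=-1))
    return F.mse_loss(q_w(torch.cat([s, w], dim=-1)), target)
```

In this picture, the noise-space actor would be trained against Q_W, consistent with the off-policy, SAC-based setup described in the next tweet.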
@ajwagenmaker
Andrew Wagenmaker
10 days
DSRL can, in principle, be instantiated with any RL algorithm. We propose, however, a SAC-based variant, Noise-Aliased DSRL, that takes advantage of a diffusion policy's tendency to map different noise to similar actions, and allows for fully off-policy training. (7/n)
1
0
9
@ajwagenmaker
Andrew Wagenmaker
10 days
Real-world DSRL training is highly stable. As any "action" played by the noise-space RL policy is just initial noise for the denoising process, even early in training the denoised actions look like actions from a BC-trained policy rather than from an unconverged RL policy. (6/n)
1
1
14
@ajwagenmaker
Andrew Wagenmaker
10 days
Uncut training timelapse of DSRL on a WidowX pick-and-place task. (5/n)
1
0
11
@ajwagenmaker
Andrew Wagenmaker
10 days
We also apply DSRL to state-of-the-art flow-based generalist policies, in particular pi0 from Physical Intelligence. DSRL is able to improve pi0 in real-world deployment, on some tasks taking the success rate from 25% to 90% in <90 minutes of online training. (4/n)
1
0
13
@ajwagenmaker
Andrew Wagenmaker
10 days
DSRL enables efficient improvement of real-world diffusion policies for robotic control. We apply it to several different robot embodiments, and find that it is able to improve performance from <30% success to >90% in anywhere from 30 to 60 minutes of online training. (3/n)
1
1
12
@ajwagenmaker
Andrew Wagenmaker
10 days
DSRL trains a lightweight policy via RL to select input noise to the diffusion policy's denoising process, steering it to desired behaviors. We find this leads to very sample-efficient improvement, and avoids challenges typically encountered with diffusion policies + RL. (2/n)
1
3
21
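As a rough illustration of the steering idea in the tweet above, here is a minimal PyTorch-style sketch: the RL policy outputs the initial noise w, and the frozen diffusion policy's denoising process turns (state, w) into the action that is actually executed. The class, dimensions, and the denoising stand-in are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

STATE_DIM, NOISE_DIM = 17, 7   # placeholder sizes

class NoisePolicy(nn.Module):
    """Lightweight RL policy: its 'action' is the initial noise w, not a robot action."""
    def __init__(self, state_dim, noise_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, noise_dim),
        )

    def forward(self, state):
        return self.net(state)

def frozen_denoise(state, noise):
    # Stand-in for the pretrained diffusion policy's denoising process,
    # which maps (state, initial noise) -> robot action and stays frozen under DSRL.
    return torch.tanh(noise)

noise_policy = NoisePolicy(STATE_DIM, NOISE_DIM)
state = torch.randn(1, STATE_DIM)
w = noise_policy(state)               # RL "action": a point in the noise space
action = frozen_denoise(state, w)     # diffusion policy converts w into the executed action
# The RL algorithm then trains noise_policy on (state, w, reward, next_state)
# transitions, never touching the diffusion policy's weights.
```

Because every w is just a valid input noise, the denoised action stays on the behavior the diffusion policy already learned, which is the stability point made in tweet 6/n above.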
@ajwagenmaker
Andrew Wagenmaker
10 days
Diffusion policies have demonstrated impressive performance in robot control, yet are difficult to improve online when zero-shot performance isn’t enough. To address this challenge, we introduce DSRL: Diffusion Steering via Reinforcement Learning. (1/n)
8
59
289