Abhishek Gupta

@abhishekunique7

Followers
8K
Following
840
Media
179
Statuses
505

Assistant Professor at University of Washington. I like robots and reinforcement learning. Previously: postdoc at MIT, PhD at Berkeley

Seattle, WA
Joined February 2012
@abhishekunique7
Abhishek Gupta
10 days
So you’ve trained your favorite diffusion/flow-based policy, but it’s just not good enough 0-shot. Worry not: in our new work, DSRL, we show how to *steer* pre-trained diffusion policies with off-policy RL, improving behavior efficiently enough for direct training in the real world.
5
24
189
@abhishekunique7
Abhishek Gupta
3 days
RT @natashajaques: In our latest paper, we discovered a surprising result: training LLMs with self-play reinforcement learning on zero-sum….
0
58
0
@abhishekunique7
Abhishek Gupta
10 days
Ok, what are the takeaways? 1. Wrap a base diffusion policy into the environment and just control the initial noise to steer it. This yields a frustratingly simple, scalable, and efficient algorithm that retains good exploration while enabling sample-efficient improvement. 2. Train two Q-functions…
0
0
3
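A minimal sketch of takeaway 1, under assumed interfaces rather than the authors' released code: a Gym-style wrapper whose action space is the initial noise z, with a hypothetical `denoise_fn(obs, z)` handle standing in for the frozen diffusion/flow policy.

```python
# Sketch of "wrap the base diffusion policy into the environment": the RL agent's
# action is the initial noise z; the wrapper denoises it through a frozen base
# policy to obtain the actual robot action. `denoise_fn` is an assumed handle,
# not a real API from the paper or any library.

import numpy as np
import gymnasium as gym


class NoiseActionWrapper(gym.Wrapper):
    def __init__(self, env, denoise_fn, noise_dim, noise_bound=3.0):
        super().__init__(env)
        self.denoise_fn = denoise_fn  # frozen diffusion/flow policy, never updated by RL
        # The RL agent now acts in the initial-noise space (bounded here for convenience).
        self.action_space = gym.spaces.Box(
            -noise_bound, noise_bound, shape=(noise_dim,), dtype=np.float32
        )
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, z):
        # Map the noise "action" z to a real action via the frozen denoiser,
        # then step the underlying environment with that action.
        a = self.denoise_fn(self._last_obs, np.asarray(z, dtype=np.float32))
        obs, reward, terminated, truncated, info = self.env.step(a)
        self._last_obs = obs
        return obs, reward, terminated, truncated, info
```

From the RL algorithm's point of view, the wrapped environment just has a continuous action space of dimension `noise_dim`, so any standard off-policy learner can be dropped in unchanged.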
@abhishekunique7
Abhishek Gupta
10 days
Thanks to noise aliasing, DSRL is also a competitive offline RL procedure: first train a diffusion/flow policy on an entire offline dataset, then apply DSRL to steer it toward high-reward behavior using the same offline data. This performs on par with state-of-the-art offline RL methods.
Tweet media one
1
0
0
@abhishekunique7
Abhishek Gupta
10 days
What are the numbers? In simulation, we find that DSRL substantially outperforms all existing approaches to finetuning diffusion policies online on benchmarks such as Robomimic and MuJoCo tasks. (8/10)
Tweet media one
1
0
1
@abhishekunique7
Abhishek Gupta
10 days
What’s the catch? We’ve changed the action space of the RL policy from the original actions a to initial noise z. The challenge is that offline data or interventions are collected in the (s, a, s’, r) space, not in the (s, z, s’, r) space. To allow this data to still be useful,
3
0
0
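One way to reuse (s, a, s’, r) data despite the changed action space, consistent with the noise-aliasing idea mentioned earlier in the thread but written here as an illustrative sketch (the function and class names are assumptions, not the paper's API): train an ordinary critic Q(s, a) on action-space transitions, then score a noise action z by first aliasing it to an action through the frozen denoiser.

```python
# Sketch: learn Q_a(s, a) from standard (s, a, r, s') transitions, and evaluate a
# noise action z as Q_z(s, z) = Q_a(s, denoise(s, z)). No (s, z)-labeled data is
# required; `denoise_fn` is an assumed handle to the frozen base policy.

import torch
import torch.nn as nn


class ActionCritic(nn.Module):
    """Standard Q(s, a) network trained on action-space transitions."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


def noise_q_value(critic, denoise_fn, obs, z):
    # Alias the noise action z to a real action through the frozen policy,
    # then score that action with the ordinary action-space critic.
    a = denoise_fn(obs, z)
    return critic(obs, a)
```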
@abhishekunique7
Abhishek Gupta
10 days
Why does this actually help? Since any "action" played by the noise-space RL policy is just initial noise for the denoising process, even early in training the denoised actions look like actions from a BC-trained policy rather than from an unconverged RL policy. This makes exploration far more coherent.
Tweet media one
Tweet media two
1
0
1
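A toy illustration of why the executed actions stay BC-like, using a flow-matching-style Euler rollout as one common way such policies are sampled (the actual sampler and network interface in the paper may differ; `velocity_net` is an assumed BC-trained model taking a concatenated input):

```python
# Whatever initial noise z the RL agent picks, the executed action is produced by
# integrating the frozen, BC-trained velocity field from that z, so it remains an
# action the base policy could plausibly have sampled.

import torch
import torch.nn as nn


def denoise_from_z(velocity_net: nn.Module, obs: torch.Tensor, z: torch.Tensor, steps: int = 10):
    """Integrate the frozen flow from the chosen initial noise z to an action.

    Assumes batched 2-D tensors: obs (B, obs_dim), z (B, act_dim).
    """
    a = z.clone()
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full(obs.shape[:-1] + (1,), i * dt)
        # The velocity field was trained with behavior cloning and is never updated
        # by RL; the RL agent only chooses the starting point z.
        with torch.no_grad():
            a = a + dt * velocity_net(torch.cat([obs, a, t], dim=-1))
    return a
```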
@abhishekunique7
Abhishek Gupta
10 days
Can we understand what’s going on a little more visually? Watching a timelapse of DSRL on a WidowX robot tasked with picking and placing an object shows the coherence of exploration and the efficiency of learning! (5/10)
1
0
1
@abhishekunique7
Abhishek Gupta
10 days
Ok, but can we steer policies that we didn’t actually pre-train ourselves? To test this, we applied DSRL to pi0, a state-of-the-art flow-based generalist policy from @physical_int. DSRL is able to improve pi0 in real-world deployment, on some tasks taking success from 25% to 90%.
1
0
2
@abhishekunique7
Abhishek Gupta
10 days
Why does this matter? 1) It retains the base policy’s exploration, and 2) it enables super efficient real-world RL, efficient enough to put on a robot! Let’s look at some examples of efficient improvement of real-world diffusion policies for robotic control. We applied DSRL on several real-robot setups.
1
0
2
@abhishekunique7
Abhishek Gupta
10 days
What’s the key idea? Don’t touch the base policy; instead, train a lightweight policy via RL to select the initial noise in a diffusion (or flow) policy's denoising process. This initial noise modulates the resulting behavior, guiding it toward desired outcomes. Think of it as steering the base policy rather than retraining it.
1
0
3
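A minimal sketch of the "lightweight noise-selection policy" idea: a small MLP maps the observation to an initial noise z and is trained off-policy to maximize a critic defined over (s, z). This is written as a generic DDPG-style actor update for illustration only; the exact objective and algorithm in DSRL may differ.

```python
# Only the small noise-space actor is updated; the base diffusion/flow policy is frozen.
# `q_z_critic(obs, z)` is any critic over (state, noise), e.g. one obtained via the
# noise-aliasing sketch above (an assumption, not the paper's exact construction).

import torch
import torch.nn as nn


class NoiseActor(nn.Module):
    def __init__(self, obs_dim, noise_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, noise_dim),
        )

    def forward(self, obs):
        return self.net(obs)


def noise_actor_update(actor, actor_opt, q_z_critic, obs_batch):
    # Choose noise with the actor and ascend the noise-space critic; the frozen
    # base policy is not touched by this gradient step at all.
    z = actor(obs_batch)
    loss = -q_z_critic(obs_batch, z).mean()
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item()
```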
@abhishekunique7
Abhishek Gupta
10 days
RT @ajwagenmaker: Diffusion policies have demonstrated impressive performance in robot control, yet are difficult to improve online when 0-….
0
59
0
@abhishekunique7
Abhishek Gupta
14 days
I'm sadly unable to be at #RSS2025 this year, but my students @prodarhan, @chuning_zhu and @marceltornev will be! Find them presenting some exciting work today, 6/21: 1) @chuning_zhu will present Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large
Tweet media one
Tweet media two
0
5
53
@abhishekunique7
Abhishek Gupta
14 days
RT @pranav_atreya: In robotics benchmarks are rarely shared. New eval setups are created for each new project, a stark difference from eval….
0
21
0
@abhishekunique7
Abhishek Gupta
14 days
Check out some of our new work on distributed robot evaluation led by @KarlPertsch, @pranav_atreya and @tonyh_lee! Hopefully folks can contribute and help us take a step towards systematic and standardized empiricism in robot learning! :) Also check out some of the fun sim evals.
@KarlPertsch
Karl Pertsch
15 days
We’re releasing the RoboArena today! 🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵
0
6
30
@abhishekunique7
Abhishek Gupta
15 days
RT @yunchuzh: How should a robot perceive the world? What kind of visual representation leads to robust visuomotor policy learning for robo….
0
26
0
@abhishekunique7
Abhishek Gupta
15 days
Check out @yunchuzh's new work on automatically selecting keypoints as a representation for super robust policy learning!
@yunchuzh
Yunchu
16 days
How should a robot perceive the world? What kind of visual representation leads to robust visuomotor policy learning for robotics? Policies trained on raw images are often fragile—easily broken by lighting, clutter, or object variations—making it challenging to deploy policies
1
0
32
@abhishekunique7
Abhishek Gupta
15 days
Go read our paper to get into the guts of things :) Paper: Website: Here are some higher-level points I learned from this paper. 1. Keypoints are surprisingly robust. Future policy representations may perhaps look a little.
0
0
2
@abhishekunique7
Abhishek Gupta
15 days
Here’s a cool bonus: it turns out keypoint trackers are pretty good at bridging large visual gaps. So visuomotor policies trained in simulation can transfer over to the real world, even when the simulation looks pretty bad! This is impactful because it
1
1
3
@abhishekunique7
Abhishek Gupta
15 days
But we care about policies! It turns out that using the same imitation learning setup as standard behavior cloning, but just changing the input visual representation to the keypoints selected by ATK, yields policies that are robust to significant visual variations—lighting, background, and more.
1
0
1
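A hedged sketch of the "same imitation setup, different input" point: ordinary behavior cloning, except the policy consumes K tracked 2-D keypoints (a 2K-dimensional vector from an off-the-shelf point tracker) instead of raw images. The tracker itself and ATK's keypoint-selection procedure are outside this snippet, and the architecture below is illustrative rather than the paper's.

```python
# Standard BC regression where only the input representation has changed from
# raw pixels to tracked keypoint coordinates.

import torch
import torch.nn as nn


class KeypointPolicy(nn.Module):
    def __init__(self, num_keypoints, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * num_keypoints, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, keypoints_xy):
        # keypoints_xy: (batch, num_keypoints, 2) pixel coordinates from the tracker.
        return self.net(keypoints_xy.flatten(start_dim=1))


def bc_loss(policy, keypoints_xy, expert_actions):
    # Ordinary behavior-cloning loss; robustness comes from the input representation,
    # since lighting and background changes never reach the policy.
    return nn.functional.mse_loss(policy(keypoints_xy), expert_actions)
```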