Abhishek Gupta

@abhishekunique7

Followers
8K
Following
840
Media
179
Statuses
505

Assistant Professor at University of Washington. I like robots and reinforcement learning. Previously: postdoc at MIT, PhD at Berkeley

Seattle, WA
Joined February 2012
@abhishekunique7
Abhishek Gupta
10 days
So you’ve trained your favorite diffusion/flow-based policy, but it’s just not good enough 0-shot. Worry not: in our new work, DSRL, we show how to *steer* pre-trained diffusion policies with off-policy RL, improving behavior efficiently enough for direct training in the real world.
5
24
189
@abhishekunique7
Abhishek Gupta
3 days
RT @natashajaques: In our latest paper, we discovered a surprising result: training LLMs with self-play reinforcement learning on zero-sum….
0
58
0
@abhishekunique7
Abhishek Gupta
10 days
Ok, what are the takeaways? 1. Wrap a base diffusion policy into the environment and just control the initial noise to steer it. This yields a frustratingly simple, scalable, and efficient algorithm that retains good exploration while enabling sample-efficient improvement. 2. Train two Q-functions…
0
0
3
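A minimal sketch of takeaway 1, under assumed interfaces rather than the authors' released code: a Gym-style wrapper whose action space is the initial noise z, with a hypothetical `denoise_fn(obs, z)` handle standing in for the frozen diffusion/flow policy.

```python
# Sketch of "wrap the base diffusion policy into the environment": the RL agent's
# action is the initial noise z; the wrapper denoises it through a frozen base
# policy to obtain the actual robot action. `denoise_fn` is an assumed handle,
# not a real API from the paper or any library.

import numpy as np
import gymnasium as gym


class NoiseActionWrapper(gym.Wrapper):
    def __init__(self, env, denoise_fn, noise_dim, noise_bound=3.0):
        super().__init__(env)
        self.denoise_fn = denoise_fn  # frozen diffusion/flow policy, never updated by RL
        # The RL agent now acts in the initial-noise space (bounded here for convenience).
        self.action_space = gym.spaces.Box(
            -noise_bound, noise_bound, shape=(noise_dim,), dtype=np.float32
        )
        self._last_obs = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._last_obs = obs
        return obs, info

    def step(self, z):
        # Map the noise "action" z to a real action via the frozen denoiser,
        # then step the underlying environment with that action.
        a = self.denoise_fn(self._last_obs, np.asarray(z, dtype=np.float32))
        obs, reward, terminated, truncated, info = self.env.step(a)
        self._last_obs = obs
        return obs, reward, terminated, truncated, info
```

From the RL algorithm's point of view, the wrapped environment just has a continuous action space of dimension `noise_dim`, so any standard off-policy learner can be dropped in unchanged.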
@abhishekunique7
Abhishek Gupta
10 days
Thanks to noise aliasing, DSRL is also a competitive offline RL procedure: first train a diffusion/flow policy on an entire offline dataset, then apply DSRL to steer it toward high-reward behavior using the same offline data. This performs on par with state-of-the-art offline RL methods.
Tweet media one
1
0
0
@abhishekunique7
Abhishek Gupta
10 days
What are the numbers? In simulation, we find that DSRL substantially outperforms all existing approaches to finetuning diffusion policies online on benchmarks such as Robomimic and MuJoCo tasks. (8/10)
Tweet media one
1
0
1
@abhishekunique7
Abhishek Gupta
10 days
What’s the catch? We’ve changed the action space of the RL policy from the original actions a to initial noise z. The challenge is that offline data or interventions are collected in the (s, a, s’, r) space, not in the (s, z, s’, r) space. To allow this data to still be useful,
3
0
0
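One way to reuse (s, a, s’, r) data despite the changed action space, consistent with the noise-aliasing idea mentioned earlier in the thread but written here as an illustrative sketch (the function and class names are assumptions, not the paper's API): train an ordinary critic Q(s, a) on action-space transitions, then score a noise action z by first aliasing it to an action through the frozen denoiser.

```python
# Sketch: learn Q_a(s, a) from standard (s, a, r, s') transitions, and evaluate a
# noise action z as Q_z(s, z) = Q_a(s, denoise(s, z)). No (s, z)-labeled data is
# required; `denoise_fn` is an assumed handle to the frozen base policy.

import torch
import torch.nn as nn


class ActionCritic(nn.Module):
    """Standard Q(s, a) network trained on action-space transitions."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


def noise_q_value(critic, denoise_fn, obs, z):
    # Alias the noise action z to a real action through the frozen policy,
    # then score that action with the ordinary action-space critic.
    a = denoise_fn(obs, z)
    return critic(obs, a)
```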
@abhishekunique7
Abhishek Gupta
10 days
Why does this actually help? Since any "action" played by the noise-space RL policy is just initial noise for the denoising process, even early in training the denoised actions look like actions from a BC-trained policy rather than from an unconverged RL policy. This makes exploration far more coherent.
Tweet media one
Tweet media two
1
0
1
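A toy illustration of why the executed actions stay BC-like, using a flow-matching-style Euler rollout as one common way such policies are sampled (the actual sampler and network interface in the paper may differ; `velocity_net` is an assumed BC-trained model taking a concatenated input):

```python
# Whatever initial noise z the RL agent picks, the executed action is produced by
# integrating the frozen, BC-trained velocity field from that z, so it remains an
# action the base policy could plausibly have sampled.

import torch
import torch.nn as nn


def denoise_from_z(velocity_net: nn.Module, obs: torch.Tensor, z: torch.Tensor, steps: int = 10):
    """Integrate the frozen flow from the chosen initial noise z to an action.

    Assumes batched 2-D tensors: obs (B, obs_dim), z (B, act_dim).
    """
    a = z.clone()
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full(obs.shape[:-1] + (1,), i * dt)
        # The velocity field was trained with behavior cloning and is never updated
        # by RL; the RL agent only chooses the starting point z.
        with torch.no_grad():
            a = a + dt * velocity_net(torch.cat([obs, a, t], dim=-1))
    return a
```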
@abhishekunique7
Abhishek Gupta
10 days
Can we understand what’s going on a little more visually? Watching a timelapse of DSRL on a WidowX robot tasked with picking and placing an object shows the coherence of exploration and the efficiency of learning! (5/10)
1
0
1
@abhishekunique7
Abhishek Gupta
10 days
Ok, but can we steer policies that we didn’t actually pre-train ourselves? To test this, we applied DSRL to pi0, a state-of-the-art flow-based generalist policy from @physical_int. DSRL is able to improve pi0 in real-world deployment, on some tasks taking success from 25% to 90%.
1
0
2
@abhishekunique7
Abhishek Gupta
10 days
Why does this matter? 1) It retains the base policy’s exploration, and 2) it enables super efficient real-world RL, efficient enough to put on a robot! Let’s look at some examples of efficient improvement of real-world diffusion policies for robotic control. We applied DSRL on several real-robot setups.
1
0
2
@abhishekunique7
Abhishek Gupta
10 days
What’s the key idea? Don’t touch the base policy; instead, train a lightweight policy via RL to select the initial noise in a diffusion (or flow) policy's denoising process. This initial noise modulates the resulting behavior, guiding it toward desired outcomes. Think of it as steering the base policy rather than retraining it.
1
0
3
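A minimal sketch of the "lightweight noise-selection policy" idea: a small MLP maps the observation to an initial noise z and is trained off-policy to maximize a critic defined over (s, z). This is written as a generic DDPG-style actor update for illustration only; the exact objective and algorithm in DSRL may differ.

```python
# Only the small noise-space actor is updated; the base diffusion/flow policy is frozen.
# `q_z_critic(obs, z)` is any critic over (state, noise), e.g. one obtained via the
# noise-aliasing sketch above (an assumption, not the paper's exact construction).

import torch
import torch.nn as nn


class NoiseActor(nn.Module):
    def __init__(self, obs_dim, noise_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, noise_dim),
        )

    def forward(self, obs):
        return self.net(obs)


def noise_actor_update(actor, actor_opt, q_z_critic, obs_batch):
    # Choose noise with the actor and ascend the noise-space critic; the frozen
    # base policy is not touched by this gradient step at all.
    z = actor(obs_batch)
    loss = -q_z_critic(obs_batch, z).mean()
    actor_opt.zero_grad()
    loss.backward()
    actor_opt.step()
    return loss.item()
```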
@abhishekunique7
Abhishek Gupta
10 days
RT @ajwagenmaker: Diffusion policies have demonstrated impressive performance in robot control, yet are difficult to improve online when 0-….
0
59
0
@abhishekunique7
Abhishek Gupta
14 days
I'm sadly unable to be at #RSS2025 this year, but my students @prodarhan, @chuning_zhu and @marceltornev will be! Find them presenting some exciting work today, 6/21: 1) @chuning_zhu will present Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large
Tweet media one
Tweet media two
0
5
53
@abhishekunique7
Abhishek Gupta
14 days
RT @pranav_atreya: In robotics benchmarks are rarely shared. New eval setups are created for each new project, a stark difference from eval….
0
21
0
@abhishekunique7
Abhishek Gupta
14 days
Check out some of our new work on distributed robot evaluation led by @KarlPertsch, @pranav_atreya and @tonyh_lee! Hopefully folks can contribute and help us take a step towards systematic and standardized empiricism in robot learning! :) Also check out some of the fun sim evals.
@KarlPertsch
Karl Pertsch
15 days
We’re releasing the RoboArena today! 🤖🦾 Fair & scalable evaluation is a major bottleneck for research on generalist policies. We’re hoping that RoboArena can help! We provide data, model code & sim evals for debugging! Submit your policies today and join the leaderboard! :) 🧵
0
6
30
@abhishekunique7
Abhishek Gupta
15 days
RT @yunchuzh: How should a robot perceive the world? What kind of visual representation leads to robust visuomotor policy learning for robo….
0
26
0
@abhishekunique7
Abhishek Gupta
15 days
Check out @yunchuzh's new work on automatically selecting keypoints as a representation for super robust policy learning!
@yunchuzh
Yunchu
16 days
How should a robot perceive the world? What kind of visual representation leads to robust visuomotor policy learning for robotics? Policies trained on raw images are often fragile—easily broken by lighting, clutter, or object variations—making it challenging to deploy policies
1
0
32
@abhishekunique7
Abhishek Gupta
15 days
Go read our paper to get into the guts of things :) Paper: Website: Here are some higher-level points I learned from this paper. 1. Keypoints are surprisingly robust. Future policy representations may perhaps look a little.
0
0
2
@abhishekunique7
Abhishek Gupta
15 days
Here’s a cool bonus: it turns out keypoint trackers are pretty good at bridging large visual gaps. So visuomotor policies trained in simulation can transfer over to the real world, even when the simulation looks pretty bad! This is impactful because it
1
1
3
@abhishekunique7
Abhishek Gupta
15 days
But we care about policies! It turns out that using the same imitation learning setup as standard behavior cloning, but just changing the input visual representation to the keypoints selected by ATK, yields policies that are robust to significant visual variations—lighting, background, and more.
1
0
1
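A hedged sketch of the "same imitation setup, different input" point: ordinary behavior cloning, except the policy consumes K tracked 2-D keypoints (a 2K-dimensional vector from an off-the-shelf point tracker) instead of raw images. The tracker itself and ATK's keypoint-selection procedure are outside this snippet, and the architecture below is illustrative rather than the paper's.

```python
# Standard BC regression where only the input representation has changed from
# raw pixels to tracked keypoint coordinates.

import torch
import torch.nn as nn


class KeypointPolicy(nn.Module):
    def __init__(self, num_keypoints, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * num_keypoints, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, keypoints_xy):
        # keypoints_xy: (batch, num_keypoints, 2) pixel coordinates from the tracker.
        return self.net(keypoints_xy.flatten(start_dim=1))


def bc_loss(policy, keypoints_xy, expert_actions):
    # Ordinary behavior-cloning loss; robustness comes from the input representation,
    # since lighting and background changes never reach the policy.
    return nn.functional.mse_loss(policy(keypoints_xy), expert_actions)
```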