Joe Watson

@JoeMWatson

Followers: 702
Following: 2K
Media: 31
Statuses: 96

PhD researcher in robotics & machine learning for control @DFKI @ias_tudarmstadt @TUDarmstadt; previously @DeepMind intern, @CMRSurgical, @Cambridge_Eng

Darmstadt, Germany
Joined January 2012
@JoeMWatson
Joe Watson
22 days
RT @kay_pompetzki: Could geometric cues help improve goal inference in robotics? We explore this question at #RLDM today | Spot 86. Stop…
0
9
0
@JoeMWatson
Joe Watson
1 month
Scaling sample-efficient RL often relies on artisanal architectures (extra LayerNorm, etc). Daniel and Florian found a major issue with vanilla MLPs: growing network weights slow down optimization, so simply adding weight normalization unlocks sample efficiency on much harder tasks! 🚀
@DPalenicek
Daniel Palenicek
1 month
🚀 New preprint "Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization" 🤖. We propose CrossQ+WN, a simple yet powerful off-policy RL algorithm for improved sample efficiency and scalability to higher update-to-data ratios. 🧵 #RL @ias_tudarmstadt
0
1
11
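For readers unfamiliar with weight normalization, here is a minimal numpy sketch of the general reparameterization (in the spirit of Salimans & Kingma): decouple each weight vector into a direction and a learned scale. This is only an illustration under assumed layer sizes, not the authors' CrossQ+WN implementation.

import numpy as np

class WeightNormLinear:
    """Linear layer with weight normalization: w = g * v / ||v||.
    Separating the direction v from the scale g keeps the effective
    weight norm explicit instead of letting it grow unchecked."""

    def __init__(self, in_dim, out_dim, rng):
        self.v = rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)
        self.g = np.linalg.norm(self.v, axis=1)  # learnable per-unit gain
        self.b = np.zeros(out_dim)

    def __call__(self, x):
        # Normalize each row of v, then rescale by the gain g.
        w = self.g[:, None] * self.v / np.linalg.norm(self.v, axis=1, keepdims=True)
        return x @ w.T + self.b

rng = np.random.default_rng(0)
layer = WeightNormLinear(in_dim=4, out_dim=8, rng=rng)
out = layer(rng.standard_normal((32, 4)))  # shape (32, 8)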
@JoeMWatson
Joe Watson
11 months
This was a big collaboration, with the help of @theo_grune82772, @an_thai_le, @kay_hansel, @AHendawy19, @OlegArenz, @CarloDeramo, @MilesCranmer and @Jan_R_Peters, and Chen, Oliver, Will, Fabian, Tanmay and Martin, who aren't on Twitter.
0
3
11
@JoeMWatson
Joe Watson
11 months
We recently collaborated with @ABBgroupnews to survey the recent literature on physics-based machine learning ⚡️, with a focus on knowledge- and data-driven inductive biases 🔭. Check out the survey here 👇 and please reach out if we missed anything.
1
7
40
@JoeMWatson
Joe Watson
11 months
Active learning! 🔥 Entropy-regularized control! ⚡️ Sequential Monte Carlo! 🎲 Make sure to check it out.
@HanyAbdulsamad
Hany Abdulsamad
1 year
#ICML2024. @sahel_iqbal is presenting our paper "Nesting Particle Filters for Experimental Design in Dynamical Systems" next week. A novel way to amortize sequential Bayesian experimental design. Jointly with @AdrienCorenflos and @simosarkka @FCAI_fi. 1/*
4
1
10
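The nesting construction is specific to the paper, but for context here is a generic bootstrap particle filter in numpy: a minimal sketch under an assumed 1D linear-Gaussian model, with made-up parameters and nothing taken from the paper itself.

import numpy as np

def bootstrap_particle_filter(observations, n_particles=500, rng=None):
    """Generic bootstrap particle filter for an assumed model
    x_t = 0.9 * x_{t-1} + noise, y_t = x_t + noise.
    Returns the filtered state means."""
    rng = np.random.default_rng() if rng is None else rng
    particles = rng.standard_normal(n_particles)
    means = []
    for y in observations:
        # Propagate particles through the (assumed) transition model.
        particles = 0.9 * particles + 0.5 * rng.standard_normal(n_particles)
        # Weight by the (assumed) Gaussian observation likelihood.
        log_w = -0.5 * (y - particles) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        means.append(np.sum(w * particles))
        # Multinomial resampling to avoid weight degeneracy.
        particles = rng.choice(particles, size=n_particles, p=w)
    return np.array(means)

print(bootstrap_particle_filter(np.zeros(10))[-1])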
@JoeMWatson
Joe Watson
2 years
I’m presenting CSIL at the poster session this morning, come find me at #1906! #NeurIPS2023. Sandy and I also open-sourced the implementation a few weeks back, you can find it at.
@JoeMWatson
Joe Watson
2 years
Excited to share my internship project from my time at @DeepMind, looking at sample-efficient imitation learning using entropy-regularized reinforcement learning. TL;DR: do behavioral cloning (BC), get inverse reinforcement learning (IRL) for free! [1/6]
0
9
42
@JoeMWatson
Joe Watson
2 years
CSIL has been accepted at @NeurIPSConf as a spotlight! ✨ Big thanks to my internship hosts Sandy and Nicolas at @GoogleDeepMind Robotics. We hope to share the code in the near future 🤖
@JoeMWatson
Joe Watson
2 years
Excited to share my internship project from my time at @DeepMind, looking at sample-efficient imitation learning using entropy-regularized reinforcement learning. TL;DR: do behavioral cloning (BC), get inverse reinforcement learning (IRL) for free! [1/6]
2
3
45
@JoeMWatson
Joe Watson
2 years
*the project page
0
0
2
@JoeMWatson
Joe Watson
2 years
This project was a collaboration with my internship hosts, Sandy Huang and Nicolas Heess! For more information and results, check out the project page and the paper.
1
1
18
@JoeMWatson
Joe Watson
2 years
Combining BC with the shaped reward results in a sample-efficient imitation algorithm, which we call coherent soft imitation learning (CSIL). As we are just doing BC + soft RL, it scales gracefully to deep online, offline and image-based tasks [5/6]
1
1
6
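To make the "BC + soft RL" recipe concrete, here is a toy numpy illustration of the reward construction on synthetic 1D demonstrations, assuming a linear-Gaussian BC policy and a unit-Gaussian prior policy. The setup is entirely made up, the RL fine-tuning stage is omitted, and none of this is the released CSIL implementation.

import numpy as np

rng = np.random.default_rng(0)

# Toy 1D demonstrations: actions roughly follow a = 2 * s plus noise.
states = rng.uniform(-1.0, 1.0, size=200)
actions = 2.0 * states + 0.1 * rng.standard_normal(200)

# "BC policy": least-squares fit of a linear-Gaussian policy pi_bc(a|s).
k = np.sum(states * actions) / np.sum(states ** 2)
sigma_bc = np.std(actions - k * states)

def log_prob_bc(s, a):
    return -0.5 * ((a - k * s) / sigma_bc) ** 2 - np.log(sigma_bc * np.sqrt(2 * np.pi))

def log_prob_prior(s, a):
    # Unit-Gaussian reference/prior policy pi_0(a|s) = N(0, 1).
    return -0.5 * a ** 2 - 0.5 * np.log(2 * np.pi)

def shaped_reward(s, a, alpha=1.0):
    # CSIL-style reward: scaled log policy ratio between BC policy and prior.
    return alpha * (log_prob_bc(s, a) - log_prob_prior(s, a))

# Demonstration-like actions score higher than arbitrary ones under this reward.
print(shaped_reward(0.5, 1.0), shaped_reward(0.5, -1.0))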
@JoeMWatson
Joe Watson
2 years
Using reward shaping theory, we can show that this ‘log policy ratio’ is a valid shaped reward function, which means that we can use it to further improve the BC policy without unlearning! We call this property between the policy and reward ‘coherency’ [4/6]
1
0
9
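In equation form, my reading of the coherency property described above (with α the entropy-regularization temperature, π_BC the cloned policy and π_0 the prior/reference policy, all my notation rather than the paper's):

\tilde{r}(s,a) \;=\; \alpha \log \frac{\pi_{\mathrm{BC}}(a \mid s)}{\pi_0(a \mid s)},
\qquad
\pi_0(a \mid s)\, \exp\!\big(\tilde{r}(s,a)/\alpha\big) \;\propto\; \pi_{\mathrm{BC}}(a \mid s).

The second identity is why further soft RL on this reward refines rather than unlearns the cloned behaviour: one soft policy improvement step from π_0 under r̃ simply reproduces π_BC.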
@JoeMWatson
Joe Watson
2 years
Instead of the game-theoretic approach, can we just ‘invert’ the policy update? This would give us a reward function for the BC policy. In entropy-regularized (‘soft’) RL, this is easy, as the policy update has a closed-form expression [3/6]
1
0
8
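A sketch of the closed-form update being ‘inverted’ here, in standard soft policy iteration notation (α the temperature, Z(s) the normalizer; this is the textbook identity, not a quote from the paper):

\pi_{\mathrm{new}}(a \mid s) \;=\; \frac{\pi_{\mathrm{old}}(a \mid s)\, \exp\!\big(Q(s,a)/\alpha\big)}{Z(s)}
\;\;\Longrightarrow\;\;
Q(s,a) \;=\; \alpha \log \frac{\pi_{\mathrm{new}}(a \mid s)}{\pi_{\mathrm{old}}(a \mid s)} \;+\; \alpha \log Z(s).

Treating the BC policy as π_new and its initialization as π_old therefore yields a critic, and hence a reward, up to a state-dependent offset.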
@JoeMWatson
Joe Watson
2 years
Our method was motivated by two observations:
1. Game-theoretic approaches (e.g. GAIL, IQLearn) are difficult to use as tasks get more complex.
2. IRL methods don't really benefit from using the BC policy, as the random initial reward will lead to ‘unlearning’ the initial policy
1
1
6
@JoeMWatson
Joe Watson
2 years
Excited to share my internship project from my time at @DeepMind, looking at sample-efficient imitation learning using entropy-regularized reinforcement learning. TL;DR: do behavioral cloning (BC), get inverse reinforcement learning (IRL) for free! [1/6]
2
21
252
@JoeMWatson
Joe Watson
2 years
RT @Jan_R_Peters: We are searching for excellent ML+Robotics PhD students for five different projects at @ias_tudarmstadt, @CS_TUDarmstadt,…
0
61
0
@JoeMWatson
Joe Watson
3 years
The oral presentation and poster are on Saturday! Inferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes (w/ @Jan_R_Peters). 📄 Paper: 💻 Code: 🤖 Project website:
0
2
9
@JoeMWatson
Joe Watson
3 years
We use a concentration inequality to derive the objective, which introduces a divergence-based regularization term that we can estimate using the effective sample size and control through the probability δ [5/6]
1
0
3
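For context, here is the standard effective-sample-size estimate for importance weights mentioned above, as a small numpy sketch; the synthetic returns and temperatures are made up for illustration.

import numpy as np

def effective_sample_size(log_weights):
    """Standard ESS estimate for importance weights:
    ESS = (sum w_i)^2 / sum(w_i^2), computed from normalized weights."""
    w = np.exp(log_weights - np.max(log_weights))  # stabilize the exponentials
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

rng = np.random.default_rng(0)
returns = rng.standard_normal(1000)
for temperature in (10.0, 1.0, 0.1):
    # Lower temperatures concentrate weight on few samples -> smaller ESS.
    print(temperature, effective_sample_size(returns / temperature))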
@JoeMWatson
Joe Watson
3 years
For Monte Carlo optimization, we propose ‘lower-bound policy search’, which iteratively optimizes the temperature to balance optimization against approximate inference quality [4/6]
1
0
1
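As a loose illustration of the trade-off described above, here is a simplified stand-in that picks the greediest temperature whose importance weights keep the effective sample size above a target fraction. This heuristic is not the paper's lower-bound objective; the threshold and candidate grid are assumptions.

import numpy as np

def select_temperature(returns, ess_fraction=0.5, candidates=np.logspace(-2, 2, 50)):
    """Pick the smallest (greediest) temperature beta for which the weights
    w_i ∝ exp(R_i / beta) retain ESS >= ess_fraction * n."""
    n = len(returns)
    best = candidates[-1]
    for beta in np.sort(candidates):
        log_w = returns / beta
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        ess = 1.0 / np.sum(w ** 2)
        if ess >= ess_fraction * n:
            best = beta
            break
    return best

rng = np.random.default_rng(0)
print(select_temperature(rng.standard_normal(256)))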
@JoeMWatson
Joe Watson
3 years
For the prior, we motivate the use of Gaussian processes as a smooth prior over open-loop action sequences. We show how these continuous-time priors enforce both smooth solutions and exploratory samples [3/6]
1
0
1
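A small numpy sketch of the kind of smooth open-loop prior mentioned above: sampling action sequences from a zero-mean Gaussian process with a squared-exponential kernel over time steps. The kernel choice and hyperparameters here are illustrative, not the paper's.

import numpy as np

def sample_gp_action_sequences(horizon=50, n_samples=5, lengthscale=5.0, rng=None):
    """Sample smooth open-loop action sequences a_{1:T} ~ GP(0, k)
    with a squared-exponential kernel over the time index."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(horizon, dtype=float)[:, None]
    k = np.exp(-0.5 * (t - t.T) ** 2 / lengthscale ** 2)
    chol = np.linalg.cholesky(k + 1e-6 * np.eye(horizon))  # jitter for stability
    return (chol @ rng.standard_normal((horizon, n_samples))).T

# Larger lengthscales give smoother, more temporally correlated exploration.
samples = sample_gp_action_sequences()
print(samples.shape)  # (5, 50)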
@JoeMWatson
Joe Watson
3 years
Working in the episodic open-loop setting, we connect the solutions of prior work such as REPS and MPPI through the lens of pseudo-posteriors, which we can characterize through their prior and the likelihood temperature [2/6]
1
0
1
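In symbols, my hedged paraphrase of the pseudo-posterior view above, with θ the open-loop action sequence, p(θ) the prior, R(θ) the episodic return and α the likelihood temperature:

q(\theta) \;\propto\; p(\theta)\, \exp\!\big( R(\theta) / \alpha \big).

Roughly, different members of this family are recovered by the choice of prior and how α is set, e.g. a fixed temperature in MPPI-style updates versus a temperature obtained from a divergence constraint in REPS-style updates.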