Joe Watson

@JoeMWatson

Followers: 702
Following: 2K
Media: 31
Statuses: 96

PhD researcher in robotics & machine learning for control @DFKI @ias_tudarmstadt @TUDarmstadt; previously @DeepMind intern, @CMRSurgical, @Cambridge_Eng

Darmstadt, Germany
Joined January 2012
@JoeMWatson
Joe Watson
22 days
RT @kay_pompetzki: Could geometric cues help improve goal inference in robotics? We explore this question at #RLDM today | Spot 86. Stop…
0
9
0
@JoeMWatson
Joe Watson
1 month
Scaling sample-efficient RL often relies on artisanal architectures (extra LayerNorm, etc). Daniel and Florian found a major issue with vanilla MLPs: growing network weights slow down optimization, so simply adding weight normalization unlocks sample efficiency on much harder tasks! 🚀
@DPalenicek
Daniel Palenicek
1 month
🚀 New preprint "Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization" 🤖. We propose CrossQ+WN, a simple yet powerful off-policy RL algorithm for improved sample efficiency and scalability to higher update-to-data ratios. 🧵 #RL @ias_tudarmstadt
0
1
11
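For readers unfamiliar with weight normalization, here is a minimal numpy sketch of the general reparameterization (in the spirit of Salimans & Kingma): decouple each weight vector into a direction and a learned scale. This is only an illustration under assumed layer sizes, not the authors' CrossQ+WN implementation.

import numpy as np

class WeightNormLinear:
    """Linear layer with weight normalization: w = g * v / ||v||.
    Separating the direction v from the scale g keeps the effective
    weight norm explicit instead of letting it grow unchecked."""

    def __init__(self, in_dim, out_dim, rng):
        self.v = rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)
        self.g = np.linalg.norm(self.v, axis=1)  # learnable per-unit gain
        self.b = np.zeros(out_dim)

    def __call__(self, x):
        # Normalize each row of v, then rescale by the gain g.
        w = self.g[:, None] * self.v / np.linalg.norm(self.v, axis=1, keepdims=True)
        return x @ w.T + self.b

rng = np.random.default_rng(0)
layer = WeightNormLinear(in_dim=4, out_dim=8, rng=rng)
out = layer(rng.standard_normal((32, 4)))  # shape (32, 8)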
@JoeMWatson
Joe Watson
11 months
This was a big collaboration, with the help of @theo_grune82772, @an_thai_le, @kay_hansel, @AHendawy19, @OlegArenz, @CarloDeramo, @MilesCranmer and @Jan_R_Peters, and Chen, Oliver, Will, Fabian, Tanmay and Martin, who aren't on Twitter.
0
3
11
@JoeMWatson
Joe Watson
11 months
We recently collaborated with @ABBgroupnews to survey the recent literature on physics-based machine learning ⚡️, with a focus on knowledge- and data-driven inductive biases 🔭. Check out the survey here 👇 and please reach out if we missed anything.
1
7
40
@JoeMWatson
Joe Watson
11 months
Active learning! 🔥 Entropy-regularized control! ⚡️ Sequential Monte Carlo! 🎲 Make sure to check it out.
@HanyAbdulsamad
Hany Abdulsamad
1 year
#ICML2024. @sahel_iqbal is presenting our paper "Nesting Particle Filters for Experimental Design in Dynamical Systems" next week. A novel way to amortize sequential Bayesian experimental design. Jointly with @AdrienCorenflos and @simosarkka @FCAI_fi. 1/*
4
1
10
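The nesting construction is specific to the paper, but for context here is a generic bootstrap particle filter in numpy: a minimal sketch under an assumed 1D linear-Gaussian model, with made-up parameters and nothing taken from the paper itself.

import numpy as np

def bootstrap_particle_filter(observations, n_particles=500, rng=None):
    """Generic bootstrap particle filter for an assumed model
    x_t = 0.9 * x_{t-1} + noise, y_t = x_t + noise.
    Returns the filtered state means."""
    rng = np.random.default_rng() if rng is None else rng
    particles = rng.standard_normal(n_particles)
    means = []
    for y in observations:
        # Propagate particles through the (assumed) transition model.
        particles = 0.9 * particles + 0.5 * rng.standard_normal(n_particles)
        # Weight by the (assumed) Gaussian observation likelihood.
        log_w = -0.5 * (y - particles) ** 2
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        means.append(np.sum(w * particles))
        # Multinomial resampling to avoid weight degeneracy.
        particles = rng.choice(particles, size=n_particles, p=w)
    return np.array(means)

print(bootstrap_particle_filter(np.zeros(10))[-1])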
@JoeMWatson
Joe Watson
2 years
I’m presenting CSIL at the poster session this morning, come find me at #1906! #NeurIPS2023. Sandy and I also open-sourced the implementation a few weeks back, you can find it at.
@JoeMWatson
Joe Watson
2 years
Excited to share my internship project from my time at @DeepMind, looking at sample-efficient imitation learning using entropy-regularized reinforcement learning. TL;DR: do behavioral cloning (BC), get inverse reinforcement learning (IRL) for free! [1/6]
0
9
42
@JoeMWatson
Joe Watson
2 years
CSIL has been accepted at @NeurIPSConf as a spotlight! ✨ Big thanks to my internship hosts Sandy and Nicolas at @GoogleDeepMind Robotics. We hope to share the code in the near future 🤖
@JoeMWatson
Joe Watson
2 years
Excited to share my internship project from my time at @DeepMind, looking at sample-efficient imitation learning using entropy-regularized reinforcement learning. TL;DR: do behavioral cloning (BC), get inverse reinforcement learning (IRL) for free! [1/6]
2
3
45
@JoeMWatson
Joe Watson
2 years
*the project page
0
0
2
@JoeMWatson
Joe Watson
2 years
This project was a collaboration with my internship hosts, Sandy Huang and Nicolas Heess! For more information and results, check out the project page and the paper.
1
1
18
@JoeMWatson
Joe Watson
2 years
Combining BC with the shaped reward results in a sample-efficient imitation algorithm, which we call coherent soft imitation learning (CSIL). As we are just doing BC + soft RL, it scales gracefully to deep online, offline and image-based tasks [5/6]
1
1
6
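To make the "BC + soft RL" recipe concrete, here is a toy numpy illustration of the reward construction on synthetic 1D demonstrations, assuming a linear-Gaussian BC policy and a unit-Gaussian prior policy. The setup is entirely made up, the RL fine-tuning stage is omitted, and none of this is the released CSIL implementation.

import numpy as np

rng = np.random.default_rng(0)

# Toy 1D demonstrations: actions roughly follow a = 2 * s plus noise.
states = rng.uniform(-1.0, 1.0, size=200)
actions = 2.0 * states + 0.1 * rng.standard_normal(200)

# "BC policy": least-squares fit of a linear-Gaussian policy pi_bc(a|s).
k = np.sum(states * actions) / np.sum(states ** 2)
sigma_bc = np.std(actions - k * states)

def log_prob_bc(s, a):
    return -0.5 * ((a - k * s) / sigma_bc) ** 2 - np.log(sigma_bc * np.sqrt(2 * np.pi))

def log_prob_prior(s, a):
    # Unit-Gaussian reference/prior policy pi_0(a|s) = N(0, 1).
    return -0.5 * a ** 2 - 0.5 * np.log(2 * np.pi)

def shaped_reward(s, a, alpha=1.0):
    # CSIL-style reward: scaled log policy ratio between BC policy and prior.
    return alpha * (log_prob_bc(s, a) - log_prob_prior(s, a))

# Demonstration-like actions score higher than arbitrary ones under this reward.
print(shaped_reward(0.5, 1.0), shaped_reward(0.5, -1.0))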
@JoeMWatson
Joe Watson
2 years
Using reward shaping theory, we can show that this ‘log policy ratio’ is a valid shaped reward function, which means that we can use it to further improve the BC policy without unlearning! We call this property between the policy and reward ‘coherency’ [4/6]
1
0
9
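In equation form, my reading of the coherency property described above (with α the entropy-regularization temperature, π_BC the cloned policy and π_0 the prior/reference policy, all my notation rather than the paper's):

\tilde{r}(s,a) \;=\; \alpha \log \frac{\pi_{\mathrm{BC}}(a \mid s)}{\pi_0(a \mid s)},
\qquad
\pi_0(a \mid s)\, \exp\!\big(\tilde{r}(s,a)/\alpha\big) \;\propto\; \pi_{\mathrm{BC}}(a \mid s).

The second identity is why further soft RL on this reward refines rather than unlearns the cloned behaviour: one soft policy improvement step from π_0 under r̃ simply reproduces π_BC.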
@JoeMWatson
Joe Watson
2 years
Instead of the game-theoretic approach, can we just ‘invert’ the policy update? This would give us a reward function for the BC policy. In entropy-regularized (‘soft’) RL, this is easy, as the policy update has a closed-form expression [3/6]
1
0
8
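A sketch of the closed-form update being ‘inverted’ here, in standard soft policy iteration notation (α the temperature, Z(s) the normalizer; this is the textbook identity, not a quote from the paper):

\pi_{\mathrm{new}}(a \mid s) \;=\; \frac{\pi_{\mathrm{old}}(a \mid s)\, \exp\!\big(Q(s,a)/\alpha\big)}{Z(s)}
\;\;\Longrightarrow\;\;
Q(s,a) \;=\; \alpha \log \frac{\pi_{\mathrm{new}}(a \mid s)}{\pi_{\mathrm{old}}(a \mid s)} \;+\; \alpha \log Z(s).

Treating the BC policy as π_new and its initialization as π_old therefore yields a critic, and hence a reward, up to a state-dependent offset.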
@JoeMWatson
Joe Watson
2 years
Our method was motivated by two observations:
1. Game-theoretic approaches (e.g. GAIL, IQLearn) are difficult to use as tasks get more complex.
2. IRL methods don't really benefit from using the BC policy, as the random initial reward will lead to ‘unlearning’ the initial policy
1
1
6
@JoeMWatson
Joe Watson
2 years
Excited to share my internship project from my time at @DeepMind, looking at sample-efficient imitation learning using entropy-regularized reinforcement learning. TL;DR: do behavioral cloning (BC), get inverse reinforcement learning (IRL) for free! [1/6]
2
21
252
@JoeMWatson
Joe Watson
2 years
RT @Jan_R_Peters: We are searching for excellent ML+Robotics PhD students for five different projects at @ias_tudarmstadt, @CS_TUDarmstadt,…
0
61
0
@JoeMWatson
Joe Watson
3 years
The oral presentation and poster are on Saturday! Inferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes (w/ @Jan_R_Peters). 📄 Paper: 💻 Code: 🤖 Project website:
0
2
9
@JoeMWatson
Joe Watson
3 years
We use a concentration inequality to derive the objective, which introduces a divergence-based regularization term that we can estimate using the effective sample size and control through the probability δ [5/6]
1
0
3
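For context, here is the standard effective-sample-size estimate for importance weights mentioned above, as a small numpy sketch; the synthetic returns and temperatures are made up for illustration.

import numpy as np

def effective_sample_size(log_weights):
    """Standard ESS estimate for importance weights:
    ESS = (sum w_i)^2 / sum(w_i^2), computed from normalized weights."""
    w = np.exp(log_weights - np.max(log_weights))  # stabilize the exponentials
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

rng = np.random.default_rng(0)
returns = rng.standard_normal(1000)
for temperature in (10.0, 1.0, 0.1):
    # Lower temperatures concentrate weight on few samples -> smaller ESS.
    print(temperature, effective_sample_size(returns / temperature))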
@JoeMWatson
Joe Watson
3 years
For Monte Carlo optimization, we propose ‘lower-bound policy search’, which iteratively optimizes the temperature to balance optimization against approximate inference quality [4/6]
1
0
1
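As a loose illustration of the trade-off described above, here is a simplified stand-in that picks the greediest temperature whose importance weights keep the effective sample size above a target fraction. This heuristic is not the paper's lower-bound objective; the threshold and candidate grid are assumptions.

import numpy as np

def select_temperature(returns, ess_fraction=0.5, candidates=np.logspace(-2, 2, 50)):
    """Pick the smallest (greediest) temperature beta for which the weights
    w_i ∝ exp(R_i / beta) retain ESS >= ess_fraction * n."""
    n = len(returns)
    best = candidates[-1]
    for beta in np.sort(candidates):
        log_w = returns / beta
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        ess = 1.0 / np.sum(w ** 2)
        if ess >= ess_fraction * n:
            best = beta
            break
    return best

rng = np.random.default_rng(0)
print(select_temperature(rng.standard_normal(256)))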
@JoeMWatson
Joe Watson
3 years
For the prior, we motivate the use of Gaussian processes as a smooth prior over open-loop action sequences. We show how these continuous-time priors enforce both smooth solutions and exploratory samples [3/6]
1
0
1
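A small numpy sketch of the kind of smooth open-loop prior mentioned above: sampling action sequences from a zero-mean Gaussian process with a squared-exponential kernel over time steps. The kernel choice and hyperparameters here are illustrative, not the paper's.

import numpy as np

def sample_gp_action_sequences(horizon=50, n_samples=5, lengthscale=5.0, rng=None):
    """Sample smooth open-loop action sequences a_{1:T} ~ GP(0, k)
    with a squared-exponential kernel over the time index."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(horizon, dtype=float)[:, None]
    k = np.exp(-0.5 * (t - t.T) ** 2 / lengthscale ** 2)
    chol = np.linalg.cholesky(k + 1e-6 * np.eye(horizon))  # jitter for stability
    return (chol @ rng.standard_normal((horizon, n_samples))).T

# Larger lengthscales give smoother, more temporally correlated exploration.
samples = sample_gp_action_sequences()
print(samples.shape)  # (5, 50)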
@JoeMWatson
Joe Watson
3 years
Working in the episodic open-loop setting, we connect the solutions of prior work such as REPS and MPPI through the lens of pseudo-posteriors, which we can characterize through their prior and the likelihood temperature [2/6]
1
0
1
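In symbols, my hedged paraphrase of the pseudo-posterior view above, with θ the open-loop action sequence, p(θ) the prior, R(θ) the episodic return and α the likelihood temperature:

q(\theta) \;\propto\; p(\theta)\, \exp\!\big( R(\theta) / \alpha \big).

Roughly, different members of this family are recovered by the choice of prior and how α is set, e.g. a fixed temperature in MPPI-style updates versus a temperature obtained from a divergence constraint in REPS-style updates.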