
Théo Vincent @RLC
@Theo_Vincent_
Followers: 309 · Following: 604 · Media: 52 · Statuses: 148
PhD student at @DFKI & @ias_tudarmstadt, working on RL 🤖 Previously a master's student at MVA @ENS_ParisSaclay & ENPC 🎓
Darmstadt, Germany
Joined February 2024
A big limitation of pruning methods is that you must choose the number of parameters to remove before training starts. But how can you know how much to prune? 🤷 At @RL_Conference, I will present Eau De Q-Network, the first RL method designed to DISCOVER the final sparsity level 🔎
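For context, here is a minimal sketch of the conventional setup the tweet points at: global magnitude pruning with a sparsity level fixed before training ever starts. This is illustrative PyTorch with hypothetical names, not Eau De Q-Network itself.

```python
# Illustrative only: the conventional fixed-sparsity setup described above,
# NOT Eau De Q-Network itself. All names are hypothetical.
import torch
import torch.nn as nn

def global_magnitude_masks(model: nn.Module, sparsity: float) -> dict:
    """Mask the `sparsity` fraction of smallest-magnitude weights, chosen globally."""
    all_weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(all_weights, sparsity)  # cut-off fixed up front
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters() if p.dim() > 1}

q_net = nn.Sequential(nn.Linear(4, 256), nn.ReLU(), nn.Linear(256, 2))
masks = global_magnitude_masks(q_net, sparsity=0.9)  # 90% sparsity chosen a priori
# Training then multiplies weights (or gradients) by these masks; the 0.9 never adapts.
```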
More details here:
To increase reward propagation in value-based RL algorithms, it is tempting to reduce the target update period 🤔 But this makes training unstable 💔 At @RL_Conference, I will present i-QN, a new method that allows faster reward propagation while keeping stability ⚡️ 👉🧵
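As a rough illustration of the underlying idea (my reading of the announcement, with hypothetical tensor names, not the paper's actual objective): several consecutive Bellman backups are learned jointly, so one gradient step propagates reward information across K backups instead of one.

```python
# Hedged sketch of learning K consecutive Bellman backups jointly
# (hypothetical names; see the paper for the actual i-QN objective).
import torch
import torch.nn.functional as F

def iterated_td_loss(q_online, q_frozen, reward, not_done, action, gamma=0.99):
    """q_online / q_frozen: lists of K tensors of shape (batch, num_actions).
    Q_{k+1} regresses onto the Bellman backup of a frozen estimate of Q_k,
    so a single update moves information through K backups instead of one."""
    loss = 0.0
    for k in range(len(q_online) - 1):
        with torch.no_grad():
            backup = reward + gamma * not_done * q_frozen[k].max(dim=1).values
        pred = q_online[k + 1].gather(1, action.unsqueeze(1)).squeeze(1)
        loss = loss + F.mse_loss(pred, backup)
    return loss
```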
Today, I will be presenting iterated Q-Network at @RL_Conference, feel free to come by poster #10! My talk will be between 11:45 AM and 12:30 PM in CCIS 1-430, Track 1: RL algorithms, Deep RL.
RT @pcastr: Great #runconference @RL_Conference today (even with a little rain), join for the last one tomorrow morning, 6:30am, meet at G….
RT @Ayushj240: Honored that our @RL_Conference paper won the Outstanding Paper Award on Empirical Reinforcement Learning Research! 📜 Mitiga…
RT @pcastr: Very honoured that our paper was granted an outstanding paper award for scientific understanding in RL during the @RL_Conferenc….
RT @GlenBerseth: @RL_Conference will be in Montréal next year at @UMontreal. We are looking forward to welcoming you all! Bienvenue! https:….
Details are over here:
Should we use a target network in deep value-based RL? 🤔 The answer has always been YES or NO, as there are pros and cons. At @RLFrameWorkshop, I will present iS-QN, a method that lies between these two extremes, collecting the pros while reducing the cons 🚀
I will be presenting iS-QN at the poster session of @RLFrameWorkshop today, feel free to come and chat at poster 39!
RT @MarlosCMachado: Here's what our group will be presenting at RLC'25. *Invited Talks at Workshops:* Tue 10:00: The Causal RL Workshop…
Looking forward to @RL_Conference! I will be presenting 4 posters, feel free to come and chat with me during the conference, at @RLFrameWorkshop, or at @ibrlworkshop 🙂
RT @johanobandoc: 🧩 Curious about the foundations of RL? Join us at the Finding the Frame Workshop @RL_Conference! A full day of talks, pa…
@RLFrameWorkshop 9/9 Many thanks to my co-authors: @YogeshTrip7354, Tim Faust, Yaniv Oren, @Jan_R_Peters, and @CarloDeramo, and to the funding agencies: @ias_tudarmstadt, @TUDarmstadt, @DFKI, @Hessian_AI, @infsys_uniwue.
0
0
5
@RLFrameWorkshop 8/9 Does it work in other settings? YES, we also report results:
- with the IMPALA architecture 🦓
- on offline experiments ✈️
- on continuous control experiments with the Simba architecture (only on the poster) 🤖
📄👉
@RLFrameWorkshop 7/9 By forcing the network to learn multiple Bellman backups in parallel, iS-DQN K>1 builds richer features 💪
@RLFrameWorkshop 6/9 By adding extra heads that learn the subsequent Bellman backups (iS-DQN K>1), iS-QN improves performance without significantly increasing the memory footprint 🚀 Note: we added layer normalization to further increase stability.
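A minimal sketch of what such an architecture could look like (hypothetical PyTorch module, not the actual iS-QN code): the shared torso holds almost all of the parameters, so each extra linear head is cheap, and the normalization sits on the shared features.

```python
# Hedged sketch of a shared torso with LayerNorm and several small Q-heads
# (hypothetical module, not the actual iS-QN implementation).
import torch.nn as nn

class MultiHeadQNet(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 512, k: int = 3):
        super().__init__()
        self.torso = nn.Sequential(              # shared features: the bulk of the parameters
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.LayerNorm(hidden),                # normalization added for stability
        )
        # One linear head per Bellman iterate; each head costs only about
        # hidden * num_actions parameters, which is small next to the torso.
        self.heads = nn.ModuleList([nn.Linear(hidden, num_actions) for _ in range(k + 1)])

    def forward(self, obs):
        features = self.torso(obs)
        return [head(features) for head in self.heads]  # K+1 tensors of (batch, num_actions)
```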
@RLFrameWorkshop 5/9 Interestingly, the idea of sharing the last features (iS-DQN K=1) already reduces the performance gap between target-free DQN (TF-DQN) and target-based DQN (TB-DQN) on 15 Atari games by a large margin.
@RLFrameWorkshop 4/9 Then, we can draw on the target-based literature to enhance training stability. We enrich the classical TD loss with iterated Q-learning, learning consecutive Bellman backups to increase the feedback on the shared layers. This leads to the iterated Shared Q-Network (iS-QN).
@RLFrameWorkshop 3/9 Our main idea is to use a copy of the online network's last linear layer as the target network, while sharing all the remaining features with the online network. This drastically reduces the memory footprint, because only the last linear layer is stored as a copy.
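A minimal sketch of that idea, assuming a standard MLP Q-network (hypothetical names, not the paper's code): target values reuse the online torso's features, and the only duplicated parameters are those of the final linear layer, roughly hidden_dim × num_actions weights instead of a whole network.

```python
# Minimal sketch of the shared-features idea (hypothetical names): only the
# final linear layer is duplicated, instead of a full target network.
import copy
import torch.nn as nn

class SharedTargetQNet(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 512):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.last = nn.Linear(hidden, num_actions)    # online last layer
        self.last_target = copy.deepcopy(self.last)   # the ONLY duplicated parameters
        for p in self.last_target.parameters():
            p.requires_grad_(False)

    def online_q(self, obs):
        return self.last(self.torso(obs))

    def target_q(self, obs):
        # Target values reuse the shared (detached) features plus the frozen last
        # layer, so the extra memory is one small weight matrix, not a full network.
        return self.last_target(self.torso(obs).detach())
```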
@RLFrameWorkshop 2/9 Many recent works have shown that removing the target network leads to a performance decrease 📉 Even methods that were initially introduced without a target network benefit from reintroducing one 📈
@RLFrameWorkshop 1/9 With function approximation, bootstrapping without a target network often leads to training instabilities. However, using a target network slows down reward propagation and doubles the memory footprint dedicated to Q-networks.