
Théo Vincent @RLC
@Theo_Vincent_
Followers: 309 · Following: 604 · Media: 52 · Statuses: 148
PhD student at @DFKI & @ias_tudarmstadt, working on RL 🤖 Previously a master's student at MVA @ENS_ParisSaclay & ENPC 🎓
Darmstadt, Germany
Joined February 2024
A big limitation of pruning methods is that you must choose the number of parameters to remove before training starts. But how can you know how much to prune? 🤷 At @RL_Conference, I will present Eau De Q-Network, the first RL method designed to DISCOVER the final sparsity level 🔎
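For context, here is a minimal sketch of the conventional setup the tweet points at: global magnitude pruning with a sparsity level fixed before training ever starts. This is illustrative PyTorch with hypothetical names, not Eau De Q-Network itself.

```python
# Illustrative only: the conventional fixed-sparsity setup described above,
# NOT Eau De Q-Network itself. All names are hypothetical.
import torch
import torch.nn as nn

def global_magnitude_masks(model: nn.Module, sparsity: float) -> dict:
    """Mask the `sparsity` fraction of smallest-magnitude weights, chosen globally."""
    all_weights = torch.cat([p.detach().abs().flatten()
                             for p in model.parameters() if p.dim() > 1])
    threshold = torch.quantile(all_weights, sparsity)  # cut-off fixed up front
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters() if p.dim() > 1}

q_net = nn.Sequential(nn.Linear(4, 256), nn.ReLU(), nn.Linear(256, 2))
masks = global_magnitude_masks(q_net, sparsity=0.9)  # 90% sparsity chosen a priori
# Training then multiplies weights (or gradients) by these masks; the 0.9 never adapts.
```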
More details here:
To increase reward propagation in value-based RL algorithms, it is tempting to reduce the target update period 🤔 But this makes training unstable 💔 At @RL_Conference, I will present i-QN, a new method that allows faster reward propagation while keeping stability ⚡️ 👉🧵
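As a rough illustration of the underlying idea (my reading of the announcement, with hypothetical tensor names, not the paper's actual objective): several consecutive Bellman backups are learned jointly, so one gradient step propagates reward information across K backups instead of one.

```python
# Hedged sketch of learning K consecutive Bellman backups jointly
# (hypothetical names; see the paper for the actual i-QN objective).
import torch
import torch.nn.functional as F

def iterated_td_loss(q_online, q_frozen, reward, not_done, action, gamma=0.99):
    """q_online / q_frozen: lists of K tensors of shape (batch, num_actions).
    Q_{k+1} regresses onto the Bellman backup of a frozen estimate of Q_k,
    so a single update moves information through K backups instead of one."""
    loss = 0.0
    for k in range(len(q_online) - 1):
        with torch.no_grad():
            backup = reward + gamma * not_done * q_frozen[k].max(dim=1).values
        pred = q_online[k + 1].gather(1, action.unsqueeze(1)).squeeze(1)
        loss = loss + F.mse_loss(pred, backup)
    return loss
```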
Today, I will be presenting iterated Q-Network at @RL_Conference, feel free to come by poster #10! My talk will be between 11:45 AM and 12:30 PM in CCIS 1-430, Track 1: RL algorithms, Deep RL.
RT @pcastr: Great #runconference @RL_Conference today (even with a little rain), join for the last one tomorrow morning, 6:30am, meet at G….
RT @Ayushj240: Honored that our @RL_Conference paper won the Outstanding Paper Award on Empirical Reinforcement Learning Research! 📜 Mitiga…
RT @pcastr: Very honoured that our paper was granted an outstanding paper award for scientific understanding in RL during the @RL_Conferenc….
RT @GlenBerseth: @RL_Conference will be in Montréal next year at @UMontreal. We are looking forward to welcoming you all! Bienvenue! https:….
Details are over here:
Should we use a target network in deep value-based RL? 🤔 The answer has always been YES or NO, as there are pros and cons. At @RLFrameWorkshop, I will present iS-QN, a method that lies between these two extremes, collecting the pros while reducing the cons 🚀
I will be presenting iS-QN at the poster session of @RLFrameWorkshop today, feel free to come and chat at poster 39!
RT @MarlosCMachado: Here's what our group will be presenting at RLC'25. *Invited Talks at Workshops:* Tue 10:00: The Causal RL Workshop…
Looking forward to @RL_Conference! I will be presenting 4 posters, feel free to come and chat with me during the conference, at @RLFrameWorkshop, or at @ibrlworkshop 🙂
RT @johanobandoc: 🧩 Curious about the foundations of RL? Join us at the Finding the Frame Workshop @RL_Conference! A full day of talks, pa…
@RLFrameWorkshop 9/9 Many thanks to my co-authors: @YogeshTrip7354, Tim Faust, Yaniv Oren, @Jan_R_Peters, and @CarloDeramo, and to the funding agencies: @ias_tudarmstadt, @TUDarmstadt, @DFKI, @Hessian_AI, @infsys_uniwue.
0
0
5
@RLFrameWorkshop 8/9 Does it work in other settings? YES, we also report results:
- with the IMPALA architecture 🦓
- on offline experiments ✈️
- on continuous control experiments with the Simba architecture (only on the poster) 🤖
📄👉
@RLFrameWorkshop 7/9 By forcing the network to learn multiple Bellman backups in parallel, iS-DQN K>1 builds richer features 💪
@RLFrameWorkshop 6/9 By adding extra heads that learn the subsequent Bellman backups (iS-DQN K>1), iS-QN improves performance without significantly increasing the memory footprint 🚀 Note: we added layer normalization to further increase stability.
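A minimal sketch of what such an architecture could look like (hypothetical PyTorch module, not the actual iS-QN code): the shared torso holds almost all of the parameters, so each extra linear head is cheap, and the normalization sits on the shared features.

```python
# Hedged sketch of a shared torso with LayerNorm and several small Q-heads
# (hypothetical module, not the actual iS-QN implementation).
import torch.nn as nn

class MultiHeadQNet(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 512, k: int = 3):
        super().__init__()
        self.torso = nn.Sequential(              # shared features: the bulk of the parameters
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.LayerNorm(hidden),                # normalization added for stability
        )
        # One linear head per Bellman iterate; each head costs only about
        # hidden * num_actions parameters, which is small next to the torso.
        self.heads = nn.ModuleList([nn.Linear(hidden, num_actions) for _ in range(k + 1)])

    def forward(self, obs):
        features = self.torso(obs)
        return [head(features) for head in self.heads]  # K+1 tensors of (batch, num_actions)
```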
@RLFrameWorkshop 5/9 Interestingly, the idea of sharing the last features (iS-DQN K=1) already reduces the performance gap between target-free DQN (TF-DQN) and target-based DQN (TB-DQN) on 15 Atari games by a large margin.
@RLFrameWorkshop 4/9 Then, we can draw on the target-based literature to enhance training stability. We enrich the classical TD loss with iterated Q-learning, learning consecutive Bellman backups to increase the feedback on the shared layers. This leads to the iterated Shared Q-Network (iS-QN).
@RLFrameWorkshop 3/9 Our main idea is to use a copy of the online network's last linear layer as the target network, while sharing all the remaining features with the online network. This drastically reduces the memory footprint, because only the last linear layer is stored as a copy.
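A minimal sketch of that idea, assuming a standard MLP Q-network (hypothetical names, not the paper's code): target values reuse the online torso's features, and the only duplicated parameters are those of the final linear layer, roughly hidden_dim × num_actions weights instead of a whole network.

```python
# Minimal sketch of the shared-features idea (hypothetical names): only the
# final linear layer is duplicated, instead of a full target network.
import copy
import torch.nn as nn

class SharedTargetQNet(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 512):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.last = nn.Linear(hidden, num_actions)    # online last layer
        self.last_target = copy.deepcopy(self.last)   # the ONLY duplicated parameters
        for p in self.last_target.parameters():
            p.requires_grad_(False)

    def online_q(self, obs):
        return self.last(self.torso(obs))

    def target_q(self, obs):
        # Target values reuse the shared (detached) features plus the frozen last
        # layer, so the extra memory is one small weight matrix, not a full network.
        return self.last_target(self.torso(obs).detach())
```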
@RLFrameWorkshop 2/9 Many recent works have shown that removing the target network leads to a performance decrease 📉 Even methods that were initially introduced without a target network benefit from reintroducing one 📈
@RLFrameWorkshop 1/9 With function approximation, bootstrapping without a target network often leads to training instabilities. However, using a target network slows down reward propagation and doubles the memory footprint dedicated to Q-networks.