Roger Creus Castanyer

@creus_roger

603 Followers · 913 Following · 43 Media · 234 Statuses

Maximizing the unexpected return. PhD student @Mila_Quebec | Prev: @UbisoftLaForge @la_UPC @HP

Montréal, Québec
Joined April 2021

@creus_roger · 11 days
🚨 Excited to share our new work: "Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning"! 📈 We propose gradient interventions that enable stable, scalable learning, achieving significant performance gains across agents and environments! Details below 👇

@creus_roger · 44 minutes
RT @tanghyyy: (1/8)🔥Excited to share that our paper “Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn” has…

@creus_roger · 7 days
RT @MartinKlissarov: As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing…

@creus_roger · 7 days
RT @akhil_bagaria: New paper: skill discovery is a hallmark of intelligence--identify interesting questions about the world, and learn how…

@creus_roger · 10 days
RT @pcastr: proud to share a survey of state representation learning in RL that my student ayoub echchahed and i prepared, that was just pu…

@creus_roger · 11 days
RT @GlenBerseth: Being unable to scale #DeepRL to solve diverse, complex tasks with large distribution changes has been holding back the #R…

@creus_roger · 11 days
RT @pcastr: really excited about this new work we just put out, led by my students @creus_roger & @johanobandoc , where we examine the chal…

@creus_roger · 11 days
This work was a fantastic collaboration with @johanobandoc, @luli_airl, @pierrelux, @GlenBerseth, @AaronCourville, @pcastr 🙌
📄 Paper: 💻 Code:

@creus_roger · 11 days
We also combine our proposed interventions with recent scaling strategies like Simba, and still see additional improvements in performance and stability 📊

@creus_roger · 11 days
🔁 Back to our artificial non-stationary supervised setting: with both interventions, networks now show near-perfect plasticity. They can continually fit changing data across network scales. No collapse 💪

@creus_roger · 11 days
Combining both ideas, we see massive gains on the ALE (Atari)! 🎮🔥 We benchmark against PQN and PPO, and the improvements are remarkable, both in terms of performance and scalability.

@creus_roger · 11 days
💡 Our second intervention: better optimizers. Second-order estimators capture loss curvature, providing more stable updates than first-order methods. Our ablations show the Kron optimizer shines in deep RL, helping agents adapt as learning evolves ✨
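The Kron optimizer named above is a Kronecker-factored preconditioned method from the PSGD family. As a rough sketch of the Kronecker-factored idea only (this follows a Shampoo-style update, not PSGD's exact Kron update; the function name, `eps`, and learning rate are illustrative), the curvature statistic for a weight matrix is tracked with two small factors instead of one full second-order matrix:

```python
import numpy as np

def kron_preconditioned_step(W, G, L, R, lr=1e-2, eps=1e-4):
    """One update with a Kronecker-factored preconditioner (Shampoo-style sketch).

    For an (m, n) weight matrix, curvature is summarized by two small factors,
    L ~ sum of G @ G.T  (m x m)  and  R ~ sum of G.T @ G  (n x n),
    and the gradient is preconditioned on both sides:
        W <- W - lr * L^{-1/4} @ G @ R^{-1/4}
    """
    L += G @ G.T
    R += G.T @ G

    def inv_quarter_root(M):
        # M^{-1/4} for a symmetric PSD matrix via eigendecomposition.
        w, V = np.linalg.eigh(M)
        return V @ np.diag((w + eps) ** -0.25) @ V.T

    W -= lr * inv_quarter_root(L) @ G @ inv_quarter_root(R)
    return W, L, R

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))           # toy weight matrix
G = rng.normal(size=(8, 4))           # its gradient
L, R = np.eye(8) * 1e-4, np.eye(4) * 1e-4
W, L, R = kron_preconditioned_step(W, G, L, R)
```

Because `L` is m × m and `R` is n × n, the cost scales with the layer's side lengths rather than with the full (m·n) × (m·n) curvature matrix.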

@creus_roger · 11 days
💡 Our first intervention: better architectures for stable gradients. Residual connections create shortcuts that preserve gradients and avoid vanishing ⚡️. We extend this with MultiSkip: broadcasting features to all fully connected layers, ensuring direct gradient flow at scale
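A minimal sketch of the broadcasting idea (our illustrative reading, not the paper's exact architecture): the encoder features are concatenated into the input of every fully connected layer, so each layer keeps a one-hop gradient path back to the features.

```python
import numpy as np

rng = np.random.default_rng(0)

def multiskip_forward(x, Ws):
    """Forward pass where the features `x` are concatenated into the input of
    every fully connected layer, giving each layer a direct path (and hence a
    direct gradient path) back to the features."""
    h = x
    for W in Ws:
        inp = np.concatenate([h, x], axis=-1)  # broadcast features to this layer
        h = np.maximum(inp @ W, 0.0)           # ReLU
    return h

feat_dim, width, depth = 32, 64, 4
x = rng.normal(size=(8, feat_dim))
# First layer sees [x, x]; later layers see [h, x].
Ws = [rng.normal(0, 0.1, (feat_dim + feat_dim, width))]
Ws += [rng.normal(0, 0.1, (width + feat_dim, width)) for _ in range(depth - 1)]
out = multiskip_forward(x, Ws)
```

Without the concatenation, early layers only receive gradients filtered through every later layer; with it, each layer's weights get a term that depends on the features directly, regardless of network depth.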

@creus_roger · 11 days
So what's going on? Gradients are at the heart of this instability. We relate vanishing gradients to classic RL diagnostics:
⚫ Dead neurons
⚫ Representation collapse
⚫ Loss landscape instability (from bootstrapping)
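The first diagnostic is easy to sketch: a ReLU unit counts as dead when it outputs zero on every input in a batch. A minimal version of that check (function name and tolerance are illustrative):

```python
import numpy as np

def dead_neuron_fraction(activations, tol=0.0):
    """Fraction of ReLU units that are inactive (<= tol) on every input.

    activations: (batch, units) array of post-ReLU values."""
    dead = np.all(activations <= tol, axis=0)
    return dead.mean()

# Example: units whose pre-activations are pushed strongly negative never fire.
rng = np.random.default_rng(0)
pre = rng.normal(size=(256, 128)) - 3.0   # shifted negative, so most units die
post = np.maximum(pre, 0.0)
frac = dead_neuron_fraction(post)
```

Tracking this fraction over training is a cheap proxy for the loss of effective capacity the thread describes.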

@creus_roger · 11 days
We test the same architectures in supervised learning: gradients propagate well and performance scales ✅. But in RL, performance collapses due to non-stationarity! We simulate this by shuffling labels periodically. The result? Gradients degrade, and bigger networks collapse! 🚫
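The shuffling protocol can be sketched as a data stream whose input distribution is fixed but whose input-to-label mapping is re-randomized every few steps (dimensions and period below are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed inputs and an initial labeling; only the mapping changes over time.
X = rng.normal(size=(512, 16))
y = rng.integers(0, 10, size=512)

def nonstationary_stream(X, y, period=1000, steps=3000, batch=32):
    """Yield (inputs, labels) minibatches; the labels are re-shuffled every
    `period` steps, so a learner must keep re-fitting a moving target."""
    labels = y.copy()
    for t in range(steps):
        if t > 0 and t % period == 0:
            labels = rng.permutation(labels)   # new random input->label mapping
        idx = rng.integers(0, len(X), size=batch)
        yield X[idx], labels[idx]
```

A network trained on this stream must repeatedly re-fit fresh mappings; how quickly its loss recovers after each shuffle is one way to quantify the plasticity the thread is measuring.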

@creus_roger · 11 days
Scaling up networks in deep RL is tricky: do it naively, and performance collapses 😵‍💫. Why? Increasing depth and width leads to severe vanishing gradients, causing unstable learning ⚠️. We diagnose this issue across several algorithms: DQN, Rainbow, PQN, PPO, SAC, and DDPG.
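The diagnosis can be illustrated with a toy version of the measurement: a deep ReLU MLP whose per-layer gradient norms are computed on a random batch. With a plain 1/fan_in initialization, the overall gradient scale shrinks as total depth grows (this is a sketch under those assumptions, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_grad_norms(depth, width=256, x_dim=32, batch=64):
    """Per-layer gradient norms for a deep ReLU MLP on one random batch."""
    dims = [x_dim] + [width] * depth
    # Plain 1/fan_in init (not ReLU-corrected), which lets signal shrink with depth.
    Ws = [rng.normal(0.0, np.sqrt(1.0 / dims[i]), (dims[i], dims[i + 1]))
          for i in range(depth)]
    x = rng.normal(size=(batch, x_dim))

    # Forward pass, caching each layer's input and pre-activation.
    acts, pre = [x], []
    for W in Ws:
        z = acts[-1] @ W
        pre.append(z)
        acts.append(np.maximum(z, 0.0))

    # Backward pass for a mean-square loss on the final activation.
    g = acts[-1] / acts[-1].size
    norms = []
    for W, z, a in zip(reversed(Ws), reversed(pre), reversed(acts[:-1])):
        g = g * (z > 0)                        # ReLU gate
        norms.append(np.linalg.norm(a.T @ g))  # ||dL/dW|| for this layer
        g = g @ W.T                            # propagate to the previous layer
    return norms[::-1]                         # input-side layer first

shallow = layer_grad_norms(4)
deep = layer_grad_norms(16)
```

Comparing `shallow[0]` and `deep[0]` shows the first layer's gradient shrinking by orders of magnitude as depth increases, which is the kind of signal such a diagnostic would surface.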

@creus_roger · 13 days
RT @harshit_sikchi: Behavioral Foundation Models (BFMs) trained with RL are secretly more powerful than we think. BFM’s directly output a p…

@creus_roger · 14 days
RT @pcastr: thrilled that we'll be presenting this paper as a spotlight at #ICML2025. come by our poster in vancouver to chat with us abou…

@creus_roger · 16 days
RT @wang_jianren: (1/n) Since its publication in 2017, PPO has essentially become synonymous with RL. Today, we are excited to provide you…

@creus_roger · 17 days
RT @gan_chuang: 🤖Can world models quickly adapt to new environments with just a few interactions? Introducing AdaWorld 🌍 — a new approac…