Roger Creus Castanyer

@creus_roger

603 Followers · 913 Following · 43 Media · 234 Statuses

Maximizing the unexpected return. PhD student @Mila_Quebec | Prev: @UbisoftLaForge @la_UPC @HP

Montréal, Québec
Joined April 2021

@creus_roger · 11 days
🚨 Excited to share our new work: "Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning"! 📈 We propose gradient interventions that enable stable, scalable learning, achieving significant performance gains across agents and environments! Details below 👇

@creus_roger · 44 minutes
RT @tanghyyy: (1/8)🔥Excited to share that our paper “Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn” has…

@creus_roger · 7 days
RT @MartinKlissarov: As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing…

@creus_roger · 7 days
RT @akhil_bagaria: New paper: skill discovery is a hallmark of intelligence--identify interesting questions about the world, and learn how…

@creus_roger · 10 days
RT @pcastr: proud to share a survey of state representation learning in RL that my student ayoub echchahed and i prepared, that was just pu…

@creus_roger · 11 days
RT @GlenBerseth: Being unable to scale #DeepRL to solve diverse, complex tasks with large distribution changes has been holding back the #R…

@creus_roger · 11 days
RT @pcastr: really excited about this new work we just put out, led by my students @creus_roger & @johanobandoc , where we examine the chal…

@creus_roger · 11 days
This work was a fantastic collaboration with @johanobandoc, @luli_airl, @pierrelux, @GlenBerseth, @AaronCourville, @pcastr 🙌
📄 Paper: 💻 Code:

@creus_roger · 11 days
We also combine our proposed interventions with recent scaling strategies like Simba, and still see additional improvements in performance and stability 📊

@creus_roger · 11 days
🔁 Back to our artificial non-stationary supervised setting: with both interventions, networks now show near-perfect plasticity. They can continually fit changing data across network scales. No collapse 💪

@creus_roger · 11 days
Combining both ideas, we see massive gains on the ALE (Atari)! 🎮🔥 We benchmark against PQN and PPO, and the improvements are remarkable, both in terms of performance and scalability.

@creus_roger · 11 days
💡 Our second intervention: better optimizers. Second-order estimators capture loss curvature, providing more stable updates than first-order methods. Our ablations show the Kron optimizer shines in deep RL, helping agents adapt as learning evolves ✨
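The Kron optimizer named above is a Kronecker-factored preconditioned method from the PSGD family. As a rough sketch of the Kronecker-factored idea only (this follows a Shampoo-style update, not PSGD's exact Kron update; the function name, `eps`, and learning rate are illustrative), the curvature statistic for a weight matrix is tracked with two small factors instead of one full second-order matrix:

```python
import numpy as np

def kron_preconditioned_step(W, G, L, R, lr=1e-2, eps=1e-4):
    """One update with a Kronecker-factored preconditioner (Shampoo-style sketch).

    For an (m, n) weight matrix, curvature is summarized by two small factors,
    L ~ sum of G @ G.T  (m x m)  and  R ~ sum of G.T @ G  (n x n),
    and the gradient is preconditioned on both sides:
        W <- W - lr * L^{-1/4} @ G @ R^{-1/4}
    """
    L += G @ G.T
    R += G.T @ G

    def inv_quarter_root(M):
        # M^{-1/4} for a symmetric PSD matrix via eigendecomposition.
        w, V = np.linalg.eigh(M)
        return V @ np.diag((w + eps) ** -0.25) @ V.T

    W -= lr * inv_quarter_root(L) @ G @ inv_quarter_root(R)
    return W, L, R

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))           # toy weight matrix
G = rng.normal(size=(8, 4))           # its gradient
L, R = np.eye(8) * 1e-4, np.eye(4) * 1e-4
W, L, R = kron_preconditioned_step(W, G, L, R)
```

Because `L` is m × m and `R` is n × n, the cost scales with the layer's side lengths rather than with the full (m·n) × (m·n) curvature matrix.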

@creus_roger · 11 days
💡 Our first intervention: better architectures for stable gradients. Residual connections create shortcuts that preserve gradients and avoid vanishing ⚡️. We extend this with MultiSkip: broadcasting features to all fully connected layers, ensuring direct gradient flow at scale
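A minimal sketch of the broadcasting idea (our illustrative reading, not the paper's exact architecture): the encoder features are concatenated into the input of every fully connected layer, so each layer keeps a one-hop gradient path back to the features.

```python
import numpy as np

rng = np.random.default_rng(0)

def multiskip_forward(x, Ws):
    """Forward pass where the features `x` are concatenated into the input of
    every fully connected layer, giving each layer a direct path (and hence a
    direct gradient path) back to the features."""
    h = x
    for W in Ws:
        inp = np.concatenate([h, x], axis=-1)  # broadcast features to this layer
        h = np.maximum(inp @ W, 0.0)           # ReLU
    return h

feat_dim, width, depth = 32, 64, 4
x = rng.normal(size=(8, feat_dim))
# First layer sees [x, x]; later layers see [h, x].
Ws = [rng.normal(0, 0.1, (feat_dim + feat_dim, width))]
Ws += [rng.normal(0, 0.1, (width + feat_dim, width)) for _ in range(depth - 1)]
out = multiskip_forward(x, Ws)
```

Without the concatenation, early layers only receive gradients filtered through every later layer; with it, each layer's weights get a term that depends on the features directly, regardless of network depth.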

@creus_roger · 11 days
So what's going on? Gradients are at the heart of this instability. We relate vanishing gradients to classic RL diagnostics:
⚫ Dead neurons
⚫ Representation collapse
⚫ Loss landscape instability (from bootstrapping)
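The first diagnostic is easy to sketch: a ReLU unit counts as dead when it outputs zero on every input in a batch. A minimal version of that check (function name and tolerance are illustrative):

```python
import numpy as np

def dead_neuron_fraction(activations, tol=0.0):
    """Fraction of ReLU units that are inactive (<= tol) on every input.

    activations: (batch, units) array of post-ReLU values."""
    dead = np.all(activations <= tol, axis=0)
    return dead.mean()

# Example: units whose pre-activations are pushed strongly negative never fire.
rng = np.random.default_rng(0)
pre = rng.normal(size=(256, 128)) - 3.0   # shifted negative, so most units die
post = np.maximum(pre, 0.0)
frac = dead_neuron_fraction(post)
```

Tracking this fraction over training is a cheap proxy for the loss of effective capacity the thread describes.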

@creus_roger · 11 days
We test the same architectures in supervised learning: gradients propagate well and performance scales ✅. But in RL, performance collapses due to non-stationarity! We simulate this by shuffling labels periodically. The result? Gradients degrade, and bigger networks collapse! 🚫
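The shuffling protocol can be sketched as a data stream whose input distribution is fixed but whose input-to-label mapping is re-randomized every few steps (dimensions and period below are illustrative, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed inputs and an initial labeling; only the mapping changes over time.
X = rng.normal(size=(512, 16))
y = rng.integers(0, 10, size=512)

def nonstationary_stream(X, y, period=1000, steps=3000, batch=32):
    """Yield (inputs, labels) minibatches; the labels are re-shuffled every
    `period` steps, so a learner must keep re-fitting a moving target."""
    labels = y.copy()
    for t in range(steps):
        if t > 0 and t % period == 0:
            labels = rng.permutation(labels)   # new random input->label mapping
        idx = rng.integers(0, len(X), size=batch)
        yield X[idx], labels[idx]
```

A network trained on this stream must repeatedly re-fit fresh mappings; how quickly its loss recovers after each shuffle is one way to quantify the plasticity the thread is measuring.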

@creus_roger · 11 days
Scaling up networks in deep RL is tricky: do it naively, and performance collapses 😵‍💫. Why? Increasing depth and width leads to severe vanishing gradients, causing unstable learning ⚠️. We diagnose this issue across several algorithms: DQN, Rainbow, PQN, PPO, SAC, and DDPG.
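The diagnosis can be illustrated with a toy version of the measurement: a deep ReLU MLP whose per-layer gradient norms are computed on a random batch. With a plain 1/fan_in initialization, the overall gradient scale shrinks as total depth grows (this is a sketch under those assumptions, not the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_grad_norms(depth, width=256, x_dim=32, batch=64):
    """Per-layer gradient norms for a deep ReLU MLP on one random batch."""
    dims = [x_dim] + [width] * depth
    # Plain 1/fan_in init (not ReLU-corrected), which lets signal shrink with depth.
    Ws = [rng.normal(0.0, np.sqrt(1.0 / dims[i]), (dims[i], dims[i + 1]))
          for i in range(depth)]
    x = rng.normal(size=(batch, x_dim))

    # Forward pass, caching each layer's input and pre-activation.
    acts, pre = [x], []
    for W in Ws:
        z = acts[-1] @ W
        pre.append(z)
        acts.append(np.maximum(z, 0.0))

    # Backward pass for a mean-square loss on the final activation.
    g = acts[-1] / acts[-1].size
    norms = []
    for W, z, a in zip(reversed(Ws), reversed(pre), reversed(acts[:-1])):
        g = g * (z > 0)                        # ReLU gate
        norms.append(np.linalg.norm(a.T @ g))  # ||dL/dW|| for this layer
        g = g @ W.T                            # propagate to the previous layer
    return norms[::-1]                         # input-side layer first

shallow = layer_grad_norms(4)
deep = layer_grad_norms(16)
```

Comparing `shallow[0]` and `deep[0]` shows the first layer's gradient shrinking by orders of magnitude as depth increases, which is the kind of signal such a diagnostic would surface.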

@creus_roger · 13 days
RT @harshit_sikchi: Behavioral Foundation Models (BFMs) trained with RL are secretly more powerful than we think. BFM’s directly output a p…

@creus_roger · 14 days
RT @pcastr: thrilled that we'll be presenting this paper as a spotlight at #ICML2025. come by our poster in vancouver to chat with us abou…

@creus_roger · 16 days
RT @wang_jianren: (1/n) Since its publication in 2017, PPO has essentially become synonymous with RL. Today, we are excited to provide you…

@creus_roger · 17 days
RT @gan_chuang: 🤖Can world models quickly adapt to new environments with just a few interactions? Introducing AdaWorld 🌍 — a new approac…