
Pablo Samuel Castro
@pcastr
Followers: 12K · Following: 16K · Media: 2K · Statuses: 9K
Señor swesearcher @ Google DeepMind. Adjunct prof @ U de Montreal & Mila. Musician. From 🇪🇨 living in 🇨🇦.
Ottawa/Montreal, QC
Joined December 2009
RT @MichaelD1729: Glad to see more work getting RL to maintain plasticity in non-stationary PCG levels! It's been a folk theory for a while…
Plasticity loss makes RL hard in continual settings, & it turns out that churn is a salient cause. C-Chain explicitly aims to reduce churn via regularization, which aids in maintaining plasticity. See @tanghyyy's thread on our #icml2025 paper, and come chat with us in Vancouver!
(1/8)🔥Excited to share that our paper “Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn” has been accepted to #ICML2025! 🎉 RL agents struggle to adapt in continual learning. Why? We trace the problem to something subtle: churn. 👇🧵 @Mila_Quebec
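To make the churn idea concrete, here is a minimal sketch in JAX of one way a churn regularizer could look: penalize how much Q-values on a reference batch (states outside the current training batch) move during an update. This is an illustrative assumption, not the paper's exact C-Chain objective, and all names here (`q_apply`, `td_loss_fn`, `q_ref_before`) are hypothetical.

```python
import jax
import jax.numpy as jnp

def churn_regularized_loss(params, q_apply, td_loss_fn, batch,
                           ref_states, q_ref_before, coef=1.0):
    # usual TD loss on the training batch
    td_loss = td_loss_fn(params, batch)
    # churn term: how far Q-values on reference states have drifted from
    # their pre-update values (stop_gradient keeps the anchor fixed)
    q_ref_now = q_apply(params, ref_states)
    churn = jnp.mean((q_ref_now - jax.lax.stop_gradient(q_ref_before)) ** 2)
    return td_loss + coef * churn
```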
Before joining Brain in 2017 I interviewed with a startup doing something similar (with lots more manual effort). In the interview I noticed a sleep pod & said "we have nap pods at Google also". They said "yeah, we regularly spend the full night at the office". I got the offer, but said no 😊
We all know vending machines are automated, but what if we allowed an AI to run the entire business: setting prices, ordering inventory, responding to customer requests, and so on? In collaboration with @andonlabs, we did just that. Read the post:
proud to share a survey of state representation learning in RL that my student ayoub echchahed and i prepared, just published in @TmlrPub! this was the bulk of ayoub's master's thesis and he put a lot of work and care into it! a few details in the thread below. 1/
New #SurveyCertification: A Survey of State Representation Learning for Deep Reinforcement Learning. Ayoub Echchahed, Pablo Samuel Castro. #reinforcement #representations #representation
RT @johanobandoc: 🚨 Excited to share our "Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning" work. 🥳 We tackle gradi…
really excited about this new work we just put out, led by my students @creus_roger & @johanobandoc, where we examine the challenges of gradient propagation when scaling deep RL networks. roger & johan put a lot of work and care into this, check out more details in 🧵👇🏾!
🚨 Excited to share our new work: "Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning"! 📈 We propose gradient interventions that enable stable, scalable learning, achieving significant performance gains across agents and environments! Details below 👇
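For readers unfamiliar with the term, a "gradient intervention" sits between the raw gradient and the optimizer step. The sketch below shows one generic example (global-norm clipping chained before Adam via optax); it is not necessarily one of the interventions studied in the paper, and the threshold is an arbitrary choice for illustration.

```python
import optax

# clip the global gradient norm before the Adam update; chaining transforms
# like this is the standard way to insert a gradient intervention in JAX
optimizer = optax.chain(
    optax.clip_by_global_norm(1.0),   # arbitrary threshold, for illustration
    optax.adam(learning_rate=3e-4),
)
```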
thrilled that we'll be presenting this paper as a spotlight at #ICML2025. come by our poster in vancouver to chat with us about the use of LLMs for advancing neuroscience! here's the camera-ready version:
Can LLMs be used to discover interpretable models of human and animal behavior?🤔 Turns out: yes! Thrilled to share our latest preprint where we used FunSearch to automatically discover symbolic cognitive models of behavior. 1/12
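For context, FunSearch pairs an LLM with an automatic scorer. Below is a schematic sketch of that loop with hypothetical helper names (`llm_propose`, `score_fit`); the actual setup in the paper differs in its details.

```python
def funsearch_loop(llm_propose, score_fit, seed_programs, behavior_data,
                   n_rounds=100):
    # score each candidate symbolic program by how well it fits the data
    pool = [(score_fit(p, behavior_data), p) for p in seed_programs]
    for _ in range(n_rounds):
        pool.sort(key=lambda sp: sp[0], reverse=True)
        exemplars = [p for _, p in pool[:4]]   # best programs seed the prompt
        candidate = llm_propose(exemplars)     # LLM writes a new program
        pool.append((score_fit(candidate, behavior_data), candidate))
    return max(pool, key=lambda sp: sp[0])[1]  # best-fitting model found
```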
I will continue to proudly say I never use LLMs. Not for writing, not for coding, and certainly not for "art". Just haven't felt the need at all.
It’s a hefty 206-page research paper, and the findings are concerning. "LLM users consistently underperformed at neural, linguistic, and behavioral levels". This study finds LLM dependence weakens the writer’s own neural and linguistic fingerprints. 🤔🤔 Relying only on EEG,
Super proud of both @WalterMayor_T & @johanobandoc. This is Walter's 1st major publication as 1st author, & Johan acted as a mentor to him. I did something similar with Johan 4 years ago: Hoping to continue spreading this type of mentorship! 12/
Happy to share our #ICML2021 "Revisiting Rainbow" paper, w/ @JS_Obando! We argue for small- to mid-scale envs in deep RL for increasing scientific insight & inclusivity. 📜Paper: ✍️🏾Blog: 🐍Code: Thread 1/🧵
PQN, a recently introduced value-based method, has a similar data-collection scheme to PPO. We see a similar trend as with PPO, but much less pronounced. It is possible our findings apply more strongly to policy-based methods. 9/
🚀 We're very excited to introduce Parallelised Q-Network (PQN), the result of an effort to bring Q-Learning into the world of pure-GPU training based on JAX! What’s the issue? Pure-GPU training can accelerate RL by orders of magnitude. However, Q-Learning heavily relies on…
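A minimal sketch of what "similar data collection to PPO" means here, with hypothetical names (`env_step`, `q_update`), not the authors' code: short synchronous rollouts from many vectorized environments living on the GPU, with the Q-update applied to the fresh rollout instead of a replay buffer.

```python
import jax

def collect_and_update(q_state, env_state, env_step, q_update, num_steps=32):
    # roll out all vectorized environments for a few steps, PPO-style
    def step(carry, _):
        q_state, env_state = carry
        env_state, transition = env_step(env_state, q_state)
        return (q_state, env_state), transition
    (q_state, env_state), rollout = jax.lax.scan(
        step, (q_state, env_state), None, length=num_steps)
    # fit Q-values on the fresh on-policy rollout; no replay buffer involved
    q_state = q_update(q_state, rollout)
    return q_state, env_state
```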
We also observe substantial gains in performance when using decoupled architectures, which otherwise can often lead to sub-par performance. This insight opens the door for more exploration in asymmetric architectures; for example, see . 7/
📢 optimistic critics can empower small actors 📢 new @RL_Conference paper led by my students olya and dhruv! we study actor-critic agents with asymmetric setups and find that, while smaller actors have advantages, they can degrade performance and result in overfit critics. 1/
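A small sketch of what an asymmetric, decoupled setup looks like, assuming Flax; the layer sizes are placeholders, not the paper's configurations. "Decoupled" here just means the actor and critic get separate networks, which in turn lets them be sized independently, e.g. a small actor paired with a larger critic.

```python
import flax.linen as nn

class SmallActor(nn.Module):
    num_actions: int

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(64)(x))
        return nn.Dense(self.num_actions)(x)  # policy logits

class LargeCritic(nn.Module):
    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(512)(x))
        x = nn.relu(nn.Dense(512)(x))
        return nn.Dense(1)(x)                  # state-value estimate
```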