Pablo Samuel Castro

@pcastr

Followers: 12K · Following: 16K · Media: 2K · Statuses: 9K

Señor swesearcher @ Google DeepMind. Adjunct prof @ U de Montreal & Mila. Musician. From 🇪🇨 living in 🇨🇦.

Ottawa/Montreal, QC
Joined December 2009
@pcastr
Pablo Samuel Castro
4 months
thrilled to finally put out the second release from my musical project "the 45s" with my buddy JS Diallo! (links in thread)
@pcastr
Pablo Samuel Castro
6 hours
RT @MichaelD1729: Glad to see more work getting RL to maintain plasticity in non-stationary PCG levels! It's been a folk theory for a while…
@pcastr
Pablo Samuel Castro
9 hours
Plasticity loss makes RL hard in continual settings, & it turns out that churn is a salient cause. C-Chain explicitly aims to reduce churn via regularization, which aids in maintaining plasticity. See @tanghyyy's thread on our #icml2025 paper, and come chat with us in Vancouver!
@tanghyyy
Hongyao Tang
12 hours
(1/8)🔥Excited to share that our paper “Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn” has been accepted to #ICML2025!🎉 RL agents struggle to adapt in continual learning. Why? We trace the problem to something subtle: churn. 👇🧵@Mila_Quebec
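The churn the thread describes can be made concrete with a toy measurement. Below is a minimal sketch, assuming a linear Q-function and random stand-in TD targets (illustrative choices, not the paper's actual C-Chain setup): an update computed on a small batch is shown to change the greedy action on held-out states the update never saw.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative only): a linear Q-function Q(s) = s @ W over 4 actions.
n_features, n_actions = 8, 4
W = rng.normal(size=(n_features, n_actions)) * 0.1

batch = rng.normal(size=(32, n_features))       # transitions we update on
reference = rng.normal(size=(512, n_features))  # held-out states where churn is measured
targets = rng.normal(size=(32, n_actions))      # stand-in TD targets

def greedy(W, states):
    return (states @ W).argmax(axis=1)

def policy_churn(W_before, W_after, states):
    """Fraction of held-out states whose greedy action changed after an update."""
    return float((greedy(W_before, states) != greedy(W_after, states)).mean())

def td_step(W, lr):
    """One squared-error gradient step toward the targets on the batch."""
    grad = batch.T @ (batch @ W - targets) / len(batch)
    return W - lr * grad

# An update fit on 32 states silently flips the greedy action on many
# unseen states; C-Chain-style regularization penalizes exactly this
# kind of off-batch movement.
print("churn, small step:", policy_churn(W, td_step(W, 0.01), reference))
print("churn, large step:", policy_churn(W, td_step(W, 0.5), reference))
```

With no update there is no churn by definition, and larger steps on the same batch tend to disturb more of the off-batch greedy policy.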
@pcastr
Pablo Samuel Castro
6 days
Before joining Brain in 2017 I interviewed with a startup doing something similar (with lots more manual effort). In the interview I noticed a sleep pod & said "we have nap pods at Google also". They said "yeah, we regularly spend the full night at the office". I got the offer, but said no 😊
@AnthropicAI
Anthropic
6 days
We all know vending machines are automated, but what if we allowed an AI to run the entire business: setting prices, ordering inventory, responding to customer requests, and so on? In collaboration with @andonlabs, we did just that. Read the post:
@pcastr
Pablo Samuel Castro
10 days
whether you're already doing research in this space or are new to it, i think this survey will prove useful, so give it a read! 4/
@pcastr
Pablo Samuel Castro
10 days
we received a lot of good feedback on the clarity of the presentation, in particular the images, which ayoub did a great job with; i particularly appreciate the consistency he maintained throughout them. 3/
@pcastr
Pablo Samuel Castro
10 days
while state representation learning in RL seems like a fairly narrow topic, we still had to limit ourselves to methods meant for model-free, online agents. within this framing, ayoub came up with a tidy taxonomy of approaches: 2/
@pcastr
Pablo Samuel Castro
10 days
proud to share a survey of state representation learning in RL that my student ayoub echchahed and i prepared, which was just published in @TmlrPub! this was the bulk of ayoub's master's thesis and he put a lot of work and care into it! a few details in the thread below. 1/
@TmlrCert
Certified papers at TMLR
10 days
New #SurveyCertification: A Survey of State Representation Learning for Deep Reinforcement Learning. Ayoub Echchahed, Pablo Samuel Castro. #reinforcement #representations #representation
@pcastr
Pablo Samuel Castro
10 days
RT @johanobandoc: 🚨 Excited to share "Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning" work. 🥳 We tackle gradi…
@pcastr
Pablo Samuel Castro
11 days
really excited about this new work we just put out, led by my students @creus_roger & @johanobandoc, where we examine the challenges of gradient propagation when scaling deep RL networks. roger & johan put a lot of work and care into it; check out more details in 🧵👇🏾!
@creus_roger
Roger Creus Castanyer
11 days
🚨 Excited to share our new work: "Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning"! 📈 We propose gradient interventions that enable stable, scalable learning, achieving significant performance gains across agents and environments! Details below 👇
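One classic example of a gradient intervention is global gradient-norm clipping, sketched below. This is offered purely as an illustration of the general idea; the interventions the paper actually proposes may well be different.

```python
import numpy as np

def clip_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most
    max_norm. A standard stabilization trick for deep networks, shown here
    only as a stand-in for the paper's interventions."""
    total = float(np.sqrt(sum((g ** 2).sum() for g in grads)))
    scale = min(1.0, max_norm / (total + 1e-8))
    return [g * scale for g in grads], total

# Gradients for two hypothetical layers with a large joint norm:
grads = [np.full((3, 3), 4.0), np.full((2,), 3.0)]
clipped, pre_norm = clip_global_norm(grads, max_norm=1.0)
post_norm = float(np.sqrt(sum((g ** 2).sum() for g in clipped)))
print(f"norm before: {pre_norm:.3f}, after: {post_norm:.3f}")
```

The key property is that all layers are rescaled by the same factor, so the gradient's direction is preserved while its magnitude is bounded.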
@pcastr
Pablo Samuel Castro
14 days
thrilled that we'll be presenting this paper as a spotlight at #ICML2025. come by our poster in vancouver to chat with us about the use of LLMs for advancing neuroscience! here's the camera-ready version:
@pcastr
Pablo Samuel Castro
5 months
Can LLMs be used to discover interpretable models of human and animal behavior?🤔 Turns out: yes! Thrilled to share our latest preprint where we used FunSearch to automatically discover symbolic cognitive models of behavior. 1/12
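To make "symbolic cognitive model" concrete: FunSearch evolves short programs and keeps the ones that score well on a fitness measure. A candidate model of two-armed bandit behavior might look like the classic prediction-error-plus-softmax program below, scored by the log-likelihood of recorded choices. This is a generic textbook-style illustration, not a model taken from the preprint.

```python
import math

def candidate_model(choices, rewards, alpha=0.3, beta=3.0):
    """Score one candidate symbolic model of two-armed bandit behavior:
    prediction-error value learning with a softmax choice rule.
    Returns the log-likelihood of the observed choices under the model."""
    q = [0.0, 0.0]
    log_lik = 0.0
    for choice, reward in zip(choices, rewards):
        # probability the model assigns to choosing arm 1, then to the observed choice
        p_arm1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        p_observed = p_arm1 if choice == 1 else 1.0 - p_arm1
        log_lik += math.log(max(p_observed, 1e-12))
        q[choice] += alpha * (reward - q[choice])  # update the chosen arm only
    return log_lik

# A short synthetic behavioral record: the subject shifts toward arm 1.
choices = [0, 0, 1, 1, 1]
rewards = [1.0, 0.0, 1.0, 1.0, 0.0]
print(candidate_model(choices, rewards))
```

In a FunSearch-style loop, the LLM proposes edits to the body of `candidate_model` and the scorer keeps whichever variants best explain the behavioral data, yielding interpretable symbolic models rather than opaque weights.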
@pcastr
Pablo Samuel Castro
17 days
I will continue to proudly say I never use LLMs. Not for writing, not for coding, and certainly not for "art". Just haven't felt the need at all.
@rohanpaul_ai
Rohan Paul
17 days
It’s a hefty 206-page research paper, and the findings are concerning. "LLM users consistently underperformed at neural, linguistic, and behavioral levels". This study finds LLM dependence weakens the writer’s own neural and linguistic fingerprints. 🤔🤔 Relying only on EEG,
@pcastr
Pablo Samuel Castro
28 days
Super proud of both @WalterMayor_T & @johanobandoc. This is Walter's 1st major publication as 1st author, & Johan acted as a mentor to him. I did something similar with Johan 4 years ago: Hoping to continue spreading this type of mentorship! 12/
@pcastr
Pablo Samuel Castro
4 years
Happy to share our #ICML2021 "Revisiting Rainbow" paper, w/ @JS_Obando! We argue for small- to mid-scale envs in deep RL for increasing scientific insight & inclusivity. 📜Paper: ✍️🏾Blog: 🐍Code: Thread 1/🧵
@pcastr
Pablo Samuel Castro
28 days
Read the paper here: And if you'll be in Vancouver for #ICML2025 next month, come chat with us at our poster! 11/
@pcastr
Pablo Samuel Castro
28 days
In brief, we find on-policy agents like PPO can greatly benefit from larger batch sizes, especially when increasing the number of parallel environments. One interesting avenue for future work is to adjust N_{RO} and N_{envs} dynamically throughout training. 10/
@pcastr
Pablo Samuel Castro
28 days
PQN, a recently introduced value-based method, has a similar data-collection scheme to PPO. We see a similar trend as with PPO, but much less pronounced; it is possible our findings are more correlated with policy-based methods. 9/
@MatteoGallici
Matteo Gallici
1 year
🚀 We're very excited to introduce Parallelised Q-Network (PQN), the result of an effort to bring Q-Learning into the world of pure-GPU training based on JAX! What’s the issue? Pure-GPU training can accelerate RL by orders of magnitude. However, Q-Learning heavily relies on
@pcastr
Pablo Samuel Castro
28 days
We see similar results when evaluated on other benchmarks, namely IsaacGym and ProcGen, suggesting that our findings generalize beyond Atari results. 8/
@pcastr
Pablo Samuel Castro
28 days
We also observe substantial performance gains when using decoupled architectures, which otherwise often lead to sub-par performance. This insight opens the door for more exploration of asymmetric architectures; for example, see: 7/
@pcastr
Pablo Samuel Castro
1 month
📢optimistic critics can empower small actors📢 new @RL_Conference paper led by my students olya and dhruv! we study actor-critic agents with asymmetric setups and find that, while smaller actors have advantages, they can degrade performance and result in overfit critics. 1/
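For a sense of scale, here is what an asymmetric (decoupled) actor-critic can look like in raw parameter counts; the widths and input sizes below are invented for illustration and are not taken from the paper.

```python
def mlp_params(sizes):
    """Parameter count of a dense MLP with the given layer sizes (weights + biases)."""
    return sum(i * o + o for i, o in zip(sizes[:-1], sizes[1:]))

obs_dim, n_actions = 84, 6
# Decoupled networks: a deliberately small actor paired with a much larger critic.
actor_params = mlp_params([obs_dim, 64, 64, n_actions])
critic_params = mlp_params([obs_dim, 512, 512, 1])
print(f"actor: {actor_params:,}  critic: {critic_params:,}")
```

Decoupling means the two networks share no trunk, so their capacities can be chosen independently; the question studied in the paper is how small the actor can get before the critic overfits or performance degrades.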
@pcastr
Pablo Samuel Castro
28 days
What about training dynamics? We consistently find that increasing the number of environments not only improves performance but also improves learning dynamics, as measured by feature rank, neuron dormancy, and overall optimization stability. 6/
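The two learning-dynamics probes mentioned here, feature rank and neuron dormancy, can be sketched numerically. Exact definitions vary across papers; the versions below are one reasonable choice (a spectral-mass rank and a relative-activation dormancy score), not necessarily the ones used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

def srank(features, delta=0.01):
    """Approximate feature rank: smallest number of singular values whose
    cumulative mass reaches (1 - delta) of the spectrum."""
    s = np.linalg.svd(features, compute_uv=False)
    cum = np.cumsum(s) / s.sum()
    return int(np.searchsorted(cum, 1.0 - delta) + 1)

def dormant_fraction(activations, tau=0.025):
    """Fraction of units whose mean (post-ReLU) activation is tiny relative
    to the layer average, a dormant-neuron-style score."""
    score = activations.mean(axis=0)
    score = score / (score.mean() + 1e-8)
    return float((score <= tau).mean())

features = rng.normal(size=(256, 64))                             # healthy features
collapsed = np.outer(rng.normal(size=256), rng.normal(size=64))   # rank-1 collapse

acts = np.maximum(rng.normal(size=(256, 64)), 0)  # ReLU activations for 64 units
acts[:, :16] = 0.0                                # 16 of the 64 units never fire
print(srank(features), srank(collapsed), dormant_fraction(acts))
```

A healthy feature matrix keeps a high effective rank and few dormant units; rank collapse and growing dormancy are the degradation signatures that more parallel environments appear to mitigate.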
@pcastr
Pablo Samuel Castro
28 days
These findings lead us to our first insight: under a fixed data budget, prefer to increase environments over rollout length! 5/
@pcastr
Pablo Samuel Castro
28 days
Under a fixed data budget, having more environments also leads to better state coverage (as seen in coverage visualizations), which helps with performance and learning dynamics in general. This is somewhat expected, but is likely affected by the initial state distribution. 4/