
Dane Malenfant
@dvnxmvl_hdf5
Followers: 127 · Following: 1K · Media: 33 · Statuses: 247
MSc. Computer Science @Mila_Quebec & @mcgillu in the LiNC lab | Currently distracted with multi-agent RL and neuroAI | Restless | Ēka ē-akimiht
Montréal, Québec
Joined December 2023
I am presenting this work at the @cocomarl2024 workshop, part of @RL_Conference, on Tuesday (: I also have a generalized correction term for an arbitrary number of agents (it is like walking a tree over the order of gradients) that I am looking for thoughts, validation, or critiques on.
Preprint Alert 🚀 Multi-agent reinforcement learning (MARL) often assumes that agents know when other agents cooperate with them. But for humans, this isn’t always true. For example, Plains Indigenous groups used to leave resources for others to use at effigies called Manitokan. 1/8
2
0
10
Conspiracy theorist Romana Didulo, the self-proclaimed QAnon "Queen of Canada", has been arrested by police in a village, where she's been leading a cult of followers for two years. https://t.co/krdxDXuA7K
cbc.ca
For the past two years, a former school in Richmound, Sask., has been the home of Romana Didulo and her followers.
14
106
323
Nvm, confused myself with a finite difference implementation
0
0
0
Are there any works on the approximation error of Pearlmutter's trick, especially for higher-order gradients like the 5th? If so, are there better approximations?
1
0
1
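(Context for the question above, not part of the thread: Pearlmutter's trick computes an exact Hessian-vector product as the directional derivative of the gradient, so in an autodiff framework it carries no truncation error, unlike a finite-difference version of the same quantity. A minimal JAX sketch; `f`, `hvp_pearlmutter`, and `hvp_finite_diff` are hypothetical names for illustration:)

```python
# Sketch: exact HVP via Pearlmutter's trick (forward-over-reverse) vs. a
# central finite-difference approximation of the same directional derivative.
import jax
import jax.numpy as jnp

def f(x):
    # toy smooth scalar objective
    return jnp.sum(jnp.sin(x) * x**2)

def hvp_pearlmutter(x, v):
    # H v = d/deps grad f(x + eps v) |_{eps=0}: JVP of the gradient in direction v
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

def hvp_finite_diff(x, v, eps=1e-4):
    # central difference on the gradient: O(eps^2) truncation error,
    # plus roundoff that grows as eps shrinks
    g = jax.grad(f)
    return (g(x + eps * v) - g(x - eps * v)) / (2 * eps)

x = jax.random.normal(jax.random.PRNGKey(0), (5,))
v = jax.random.normal(jax.random.PRNGKey(1), (5,))

exact = hvp_pearlmutter(x, v)
approx = hvp_finite_diff(x, v)
print(jnp.max(jnp.abs(exact - approx)))  # truncation + roundoff gap

# Nesting jvp-of-grad gives higher-order directional derivatives (3rd order
# below); each nesting stays exact, whereas stacked finite differences
# compound their error -- which is what the 5th-order question is about.
third_order = jax.jvp(lambda y: hvp_pearlmutter(y, v), (x,), (v,))[1]
```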
ok i accept the Mile End is gentrified now that there is a Maxi
0
0
2
In the era of experience, we're training LLM agents with RL — but something's missing... We miss the good old Gym! So we built 💎GEM: a suite of environments for training LLM 𝚐𝚎𝚗𝚎𝚛𝚊𝚕𝚒𝚜𝚝𝚜. Let’s build the Gym for LLMs, together: https://t.co/QJLRTm2zZA
5
35
274
What if I told you how to outperform DiLoCo while only communicating 1-3% of the pseudogradient? https://t.co/ltNtZjxpfn
Introducing SparseLoCo: a communication-efficient method for LLM pre-training. TL;DR: We leverage Top-k sparsification + error feedback with DiLoCo’s infrequent outer steps—communicating only 1–3% gradients with 2-bit quantization—outperforming DiLoCo and DeMo. 1/N, ArXiv:
0
5
23
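(For readers unfamiliar with the compression pattern the tweet mentions, here is a generic top-k sparsification with error-feedback sketch. This is my illustration of the standard technique, not SparseLoCo's actual code; the function name and shapes are made up:)

```python
# Sketch: each worker sends only the k largest-magnitude entries of its
# (pseudo)gradient and keeps the unsent remainder locally, adding it back
# before the next compression step so nothing is silently dropped.
import numpy as np

def topk_with_error_feedback(pseudograd, error_buffer, k):
    """Compress `pseudograd + error_buffer`; return the sparse message to
    communicate and the updated local error buffer."""
    corrected = pseudograd + error_buffer          # re-inject past residual
    idx = np.argsort(np.abs(corrected))[-k:]       # indices of top-k magnitudes
    message = np.zeros_like(corrected)
    message[idx] = corrected[idx]                  # sparse tensor to transmit
    new_error = corrected - message                # residual kept locally
    return message, new_error

# toy usage: 1% of a 10k-dim pseudogradient is communicated per outer step
dim, k = 10_000, 100
error = np.zeros(dim)
for step in range(3):
    g = np.random.randn(dim)                       # stand-in pseudogradient
    msg, error = topk_with_error_feedback(g, error, k)
    # `msg` is what would be all-reduced / sent to peers; quantization
    # (e.g. 2-bit, per the tweet) would be applied to msg[idx] before sending.
```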
Very excited to release a new blog post that formalizes what it means for data to be compositional, and shows how compositionality can exist at multiple scales. Early days, but I think there may be significant implications for AI. Check it out!
ericelmoznino.github.io
What is compositionality? For those of us working in AI or cognitive neuroscience this question can appear easy at first, but becomes increasingly perplexing the more we think about it. We aren’t...
0
9
29
E69: Outstanding Paper Award Winners 1/2 @RL_Conference 2025 @AlexDGoldie : How Should We Meta-Learn Reinforcement Learning Algorithms? @RyanSullyvan : Syllabus: Portable Curricula for Reinforcement Learning Agents @jsuarez5341 : PufferLib 2.0: Reinforcement Learning at 1M
1
2
38
But with self-correction, not all combinations of agents need to be calculated. Only the coefficients for each level of the tree are needed, yielding O(log V) complexity
0
0
2
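(A toy sketch of my reading of the counting claim above, not code from the paper: enumerating every subset of agents walks the whole tree, while the self-corrected form only needs one binomial coefficient per level. If V is taken to be the number of nodes in the tree, the number of levels is O(log V).)

```python
# Sketch: subset enumeration vs. one coefficient per tree level.
from math import comb

def count_full_walk(n_agents):
    # one term per non-empty subset of the other agents: exponential in n
    return 2 ** n_agents - 1

def per_level_coefficients(n_agents):
    # one binomial coefficient per level / gradient order: only n + 1 terms,
    # i.e. O(log V) if V is the number of nodes in the full tree
    return [comb(n_agents, j) for j in range(n_agents + 1)]

print(count_full_walk(10))          # 1023 subset terms
print(per_level_coefficients(10))   # [1, 10, 45, ..., 1] -- 11 coefficients
```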
Without self-correction, this is like walking through an n-ary tree where the highest-order gradient is at the root
1
0
1
Then, since the sum converges, we get a distribution over gradient operators, with I as the identity operator.
1
0
1
Now, for clarity, let f denote the correction-term function
1
0
1
This would continue with more and more agents, and the orders of the gradients neatly follow a binomial distribution
1
0
1
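(One way to read the last two tweets, the converging sum with identity operator I and the binomial distribution of gradient orders, is as a binomial expansion of operators. This is my reconstruction, not a formula quoted from the preprint:)

```latex
% Reconstruction (not from the preprint): with I the identity operator and
% \nabla the gradient operator, applying a correction once per agent over
% n agents expands as
\[
  (I + \nabla)^{n} \;=\; \sum_{j=0}^{n} \binom{n}{j}\,\nabla^{j},
\]
% so an order-$j$ gradient appears with coefficient $\binom{n}{j}$, matching
% the "binomial distribution of the order of gradients" above.
```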
These objectives will come out cleanly in the global optimization, but since there are 2 ways to leave the key with 1 agent and 1 way to leave the key with all 3 agents out of the three agents, we include a coefficient in front of the second term.
1
0
1
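(The coefficients in the tweet above, as I read them; this is my interpretation, not the preprint's notation:)

```latex
% From one agent's point of view there are two other agents, so
\[
  \binom{2}{1} = 2 \ \text{ways to leave the key with exactly one other agent},
  \qquad
  \binom{2}{2} = 1 \ \text{way to leave it for both},
\]
% hence the factor of 2 in front of the second term of the objective.
```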
Then with policy independence (there isn't a better action to take since success now requires another agent), we have another correction term.
1
0
1
Using agent k's value approximation as a surrogate for the expected collection reward is the same as before, but it leads to a higher-order gradient.
1
0
1
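(A hedged illustration of why a higher-order gradient shows up, written in the style of LOLA-type derivations; the symbols $\hat V_k$, $L_k$, and $\alpha$ are mine, and the preprint may set this up differently:)

```latex
% Suppose agent i uses agent k's value estimate \hat V_k as a surrogate for
% the reward k would collect, and \theta_k' is the result of one gradient
% step on k's loss L_k.  Then, by the chain rule (up to transposes),
\[
  \nabla_{\theta_i}\,
  \hat V_k\bigl(\underbrace{\theta_k - \alpha\,\nabla_{\theta_k} L_k(\theta_i,\theta_k)}_{\theta_k'}\bigr)
  \;=\;
  -\,\alpha\,\bigl(\nabla_{\theta_i}\nabla_{\theta_k} L_k(\theta_i,\theta_k)\bigr)^{\!\top}
  \nabla_{\theta_k'}\hat V_k(\theta_k'),
\]
% a mixed second-order term; chaining the same substitution through more
% agents raises the gradient order again, which is where the tree over
% gradient orders comes from.
```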