Dane Malenfant

@dvnxmvl_hdf5

127 Followers · 1K Following · 33 Media · 247 Statuses

MSc. Computer Science @Mila_Quebec & @mcgillu in the LiNC lab | Currently distracted with multi-agent RL and neuroAI | Restless | Ēka ē-akimiht

Montréal, Québec
Joined December 2023
@dvnxmvl_hdf5
Dane Malenfant
1 month
I am presenting this work at the @cocomarl2024 workshop, part of @RL_Conference, on Tuesday (: I also have a generalized correction term for n arbitrary agents (it is like walking a tree over the orders of gradients) for which I am looking for thoughts, validation, or critiques.
@dvnxmvl_hdf5
Dane Malenfant
3 months
Preprint Alert 🚀 Multi-agent reinforcement learning (MARL) often assumes that agents know when other agents cooperate with them. But for humans, this isn’t always true. For example, Plains Indigenous groups used to leave resources for others to use at effigies called Manitokan. 1/8
[image]
2 replies · 0 reposts · 10 likes
@willccbb
will brown
5 days
there are equilibria everywhere for those with the eyes to see
9 replies · 6 reposts · 107 likes
@Shayan86
Shayan Sardarizadeh
5 days
Conspiracy theorist Romana Didulo, the self-proclaimed QAnon "Queen of Canada", has been arrested by police in a village, where she's been leading a cult of followers for two years. https://t.co/krdxDXuA7K
[link card from cbc.ca: For the past two years a former school in Richmound, Sask., has been the home of Romana Didulo and her followers.]
14 replies · 106 reposts · 323 likes
@dvnxmvl_hdf5
Dane Malenfant
10 days
Nvm, confused myself with a finite difference implementation
0 replies · 0 reposts · 0 likes
@dvnxmvl_hdf5
Dane Malenfant
11 days
Are there any works on the approximation error for Pearlmutter's trick, especially for higher-order gradients like the 5th? If so, are there better approximations?
1 reply · 0 reposts · 1 like
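A minimal sketch of the distinction between the two tweets above, in JAX (the toy objective and names are illustrative, not from any particular codebase): Pearlmutter's trick gives the Hessian-vector product exactly, as the directional derivative of the gradient, whereas a finite-difference implementation of the same product carries an O(eps^2) truncation error that compounds when nested for higher orders.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Toy scalar objective standing in for a loss.
    return jnp.sum(jnp.sin(x) * x ** 2)

def hvp_pearlmutter(f, x, v):
    # Pearlmutter's trick: the Hessian-vector product is the directional
    # derivative of grad(f) along v (forward-over-reverse). Exact up to
    # floating-point error.
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

def hvp_finite_difference(f, x, v, eps=1e-4):
    # Central finite difference of the gradient: O(eps**2) truncation error,
    # plus round-off error that grows as eps shrinks.
    g = jax.grad(f)
    return (g(x + eps * v) - g(x - eps * v)) / (2.0 * eps)

x = jnp.linspace(-1.0, 1.0, 5)
v = jnp.ones_like(x)
print(hvp_pearlmutter(f, x, v))
print(hvp_finite_difference(f, x, v))
```

Nesting jax.jvp over the gradient more times gives exact higher-order directional derivatives, while nesting the finite-difference version compounds truncation and round-off error at every level, which is presumably where a 5th-order implementation gets into trouble.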
@dvnxmvl_hdf5
Dane Malenfant
11 days
ok i accept the Mile End is gentrified now that there is a Maxi
0 replies · 0 reposts · 2 likes
@zzlccc
Zichen Liu
1 month
In the era of experience, we're training LLM agents with RL — but something's missing... We miss the good old Gym! So we built 💎GEM: a suite of environments for training LLM 𝚐𝚎𝚗𝚎𝚛𝚊𝚕𝚒𝚜𝚝𝚜. Let’s build the Gym for LLMs, together: https://t.co/QJLRTm2zZA
[image]
5 replies · 35 reposts · 274 likes
@benjamintherien
Benjamin Thérien
18 days
What if I told you how to outperform DiLoCo while only communicating 1-3% of the pseudogradient? https://t.co/ltNtZjxpfn
@amir_sarfi
Amir Sarfi
18 days
Introducing SparseLoCo: a communication-efficient method for LLM pre-training. TL;DR: We leverage Top-k sparsification + error feedback with DiLoCo’s infrequent outer steps—communicating only 1–3% gradients with 2-bit quantization—outperforming DiLoCo and DeMo. 1/N, ArXiv:
[image]
0 replies · 5 reposts · 23 likes
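A generic sketch of the two ingredients named above, top-k sparsification plus error feedback (plain NumPy, toy shapes and names of my own, not the SparseLoCo code): each round, the worker adds the residual carried over from previous rounds to the fresh pseudogradient, transmits only the k largest-magnitude entries, and keeps everything it dropped as the residual for next time.

```python
import numpy as np

def topk_with_error_feedback(pseudograd, residual, k):
    # Error feedback: fold in the residual from earlier rounds, keep only the
    # k largest-magnitude entries for communication, and store what was
    # dropped as the new residual.
    corrected = (pseudograd + residual).ravel()
    keep = np.argpartition(np.abs(corrected), -k)[-k:]
    message = np.zeros_like(corrected)
    message[keep] = corrected[keep]
    new_residual = corrected - message
    return message.reshape(pseudograd.shape), new_residual.reshape(pseudograd.shape)

# Toy usage: communicate ~2% of a 10,000-entry pseudogradient per outer step.
rng = np.random.default_rng(0)
residual = np.zeros(10_000)
for _ in range(3):
    pseudograd = rng.normal(size=10_000)
    message, residual = topk_with_error_feedback(pseudograd, residual, k=200)
```

The DiLoCo-style infrequent outer steps and the 2-bit quantization mentioned in the tweet would sit on top of this (run the exchange only every H inner steps and quantize `message` before sending); both are omitted here.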
@dvnxmvl_hdf5
Dane Malenfant
19 days
ฅ^•ﻌ•^ฅ
[image]
0 replies · 0 reposts · 0 likes
@dvnxmvl_hdf5
Dane Malenfant
20 days
The “proof” in one page to this thread
[image]
@dvnxmvl_hdf5
Dane Malenfant
24 days
To begin to see the generalized correction term for n arbitrary agents: the self-correction term permits an expected symmetry between agents, allowing decentralized parameters.
[image]
0 replies · 0 reposts · 2 likes
@EricElmoznino
Eric Elmoznino
21 days
Very excited to release a new blog post that formalizes what it means for data to be compositional, and shows how compositionality can exist at multiple scales. Early days, but I think there may be significant implications for AI. Check it out!
[link card from ericelmoznino.github.io: What is compositionality? For those of us working in AI or cognitive neuroscience this question can appear easy at first, but becomes increasingly perplexing the more we think about it. We aren’t...]
0 replies · 9 reposts · 29 likes
@TalkRLPodcast
TalkRL Podcast
24 days
E69: Outstanding Paper Award Winners 1/2 @RL_Conference 2025 @AlexDGoldie : How Should We Meta-Learn Reinforcement Learning Algorithms? @RyanSullyvan : Syllabus: Portable Curricula for Reinforcement Learning Agents @jsuarez5341 : PufferLib 2.0: Reinforcement Learning at 1M
1 reply · 2 reposts · 38 likes
@dvnxmvl_hdf5
Dane Malenfant
24 days
But with self-correction, not all combinations of agents need to be calculated. Only the coefficients for each level of the tree are needed, yielding O(log V) complexity.
[image]
0 replies · 0 reposts · 2 likes
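A tiny sketch of how I read the complexity claim (illustrative only; the actual terms are in the attached images): rather than enumerating every combination of agents in the tree, one closed-form coefficient per level suffices, so the work scales with the depth of the tree rather than with the number of nodes.

```python
from math import comb

def per_level_coefficients(num_other_agents):
    # One binomial coefficient per level of the gradient-order tree:
    # level j counts the ways j of the other agents contribute a correction.
    # O(depth) coefficients instead of visiting every node/combination.
    return [comb(num_other_agents, j) for j in range(num_other_agents + 1)]

print(per_level_coefficients(2))  # [1, 2, 1], matching the 2-and-1 counting
                                  # in the key example further down the thread
```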
@dvnxmvl_hdf5
Dane Malenfant
24 days
Without self-correction, this is like walking through an n-ary tree where the highest-order gradient is at the root.
[image]
1 reply · 0 reposts · 1 like
@dvnxmvl_hdf5
Dane Malenfant
24 days
So the full update is:
[image]
1 reply · 0 reposts · 1 like
@dvnxmvl_hdf5
Dane Malenfant
24 days
Then, since the sum converges to a distribution over gradient operators (with I as the identity operator):
[image]
1 reply · 0 reposts · 1 like
@dvnxmvl_hdf5
Dane Malenfant
24 days
Now, for clarity, let f denote the correction-term function:
[image]
1 reply · 0 reposts · 1 like
@dvnxmvl_hdf5
Dane Malenfant
24 days
This would continue with more and more agents, and the coefficients neatly form a binomial distribution over the orders of gradients.
[image]
1 reply · 0 reposts · 1 like
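One way to read "a binomial distribution over the orders of gradients", using the identity operator I from a few tweets up (a guess at the structure, not taken from the images): if each of m contributing agents either adds a correction (a ∇) or does not (an I), expanding the product puts a binomial coefficient in front of the j-th order gradient,

\[
(\mathrm{I} + \nabla)^{m} \;=\; \sum_{j=0}^{m} \binom{m}{j}\, \nabla^{j}.
\]

Whether m counts all agents or only the other agents depends on the setup in the images; the shape of the coefficients is the same either way.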
@dvnxmvl_hdf5
Dane Malenfant
24 days
These objectives will come out cleanly in the global optimization, but since there are 2 ways to leave the key with 1 agent and 1 way to leave the key with all 3 agents out of three agents, we include a coefficient in front of the second term.
[image]
1 reply · 0 reposts · 1 like
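The counting in that tweet, spelled out as an enumeration (assuming it is over subsets of the two agents other than the one leaving the key):

```python
from itertools import combinations

others = ("agent_2", "agent_3")             # the two agents besides the key-leaver
one_agent = list(combinations(others, 1))   # 2 ways to leave the key for one agent
all_agents = list(combinations(others, 2))  # 1 way to leave it for both
print(len(one_agent), len(all_agents))      # -> 2 1
```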
@dvnxmvl_hdf5
Dane Malenfant
24 days
Then with policy independence (there isn't a better action to take since success now requires another agent), we have another correction term.
[image]
1 reply · 0 reposts · 1 like
@dvnxmvl_hdf5
Dane Malenfant
24 days
Using agent k's value approximation as a surrogate for the expected collection reward is the same as before but leads to a higher order gradient.
[image]
1 reply · 0 reposts · 1 like
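A sketch of why the surrogate raises the gradient order (my reading, in the spirit of opponent-shaping formulations such as LOLA, not necessarily the exact term in the image): if the expected collection reward is replaced by agent k's value estimate evaluated after a gradient step of agent k, then differentiating with respect to another agent's parameters goes through that step. Writing \(\theta_k' = \theta_k + \alpha \nabla_{\theta_k} J_k(\theta_j, \theta_k)\) for the updated parameters,

\[
\nabla_{\theta_j}\, \hat{V}_k(s;\, \theta_k') \;=\; \alpha\, \big(\nabla_{\theta_j} \nabla_{\theta_k} J_k\big)^{\top} \nabla_{\theta_k'} \hat{V}_k(s;\, \theta_k'),
\]

so a mixed second derivative of \(J_k\) appears even though the surrogate is only evaluated once, and each further substitution of this kind raises the order by one.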