Dane Malenfant

@dvnxmvl_hdf5

127 Followers · 1K Following · 33 Media · 247 Statuses

MSc. Computer Science @Mila_Quebec & @mcgillu in the LiNC lab | Currently distracted with multi-agent RL and neuroAI | Restless | Ēka ē-akimiht

Montréal, Québec
Joined December 2023
@dvnxmvl_hdf5
Dane Malenfant
1 month
I am presenting this work at the @cocomarl2024 workshop, part of @RL_Conference, on Tuesday (: I also have a generalized correction term for n arbitrary agents (it is like walking a tree over the orders of gradients) for which I am looking for thoughts, validation, or critiques.
@dvnxmvl_hdf5
Dane Malenfant
3 months
Preprint Alert 🚀 Multi-agent reinforcement learning (MARL) often assumes that agents know when other agents cooperate with them. But for humans, this isn’t always true. For example, Plains Indigenous groups used to leave resources for others to use at effigies called Manitokan. 1/8
[image]
2 replies · 0 reposts · 10 likes
@willccbb
will brown
5 days
there are equilibria everywhere for those with the eyes to see
9 replies · 6 reposts · 107 likes
@Shayan86
Shayan Sardarizadeh
5 days
Conspiracy theorist Romana Didulo, the self-proclaimed QAnon "Queen of Canada", has been arrested by police in a village, where she's been leading a cult of followers for two years. https://t.co/krdxDXuA7K
[link card from cbc.ca: For the past two years a former school in Richmound, Sask., has been the home of Romana Didulo and her followers.]
14 replies · 106 reposts · 323 likes
@dvnxmvl_hdf5
Dane Malenfant
10 days
Nvm, confused myself with a finite difference implementation
0 replies · 0 reposts · 0 likes
@dvnxmvl_hdf5
Dane Malenfant
11 days
Are there any works on the approximation error for Pearlmutter's trick, especially for higher-order gradients like the 5th? If so, are there better approximations?
1 reply · 0 reposts · 1 like
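A minimal sketch of the distinction between the two tweets above, in JAX (the toy objective and names are illustrative, not from any particular codebase): Pearlmutter's trick gives the Hessian-vector product exactly, as the directional derivative of the gradient, whereas a finite-difference implementation of the same product carries an O(eps^2) truncation error that compounds when nested for higher orders.

```python
import jax
import jax.numpy as jnp

def f(x):
    # Toy scalar objective standing in for a loss.
    return jnp.sum(jnp.sin(x) * x ** 2)

def hvp_pearlmutter(f, x, v):
    # Pearlmutter's trick: the Hessian-vector product is the directional
    # derivative of grad(f) along v (forward-over-reverse). Exact up to
    # floating-point error.
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

def hvp_finite_difference(f, x, v, eps=1e-4):
    # Central finite difference of the gradient: O(eps**2) truncation error,
    # plus round-off error that grows as eps shrinks.
    g = jax.grad(f)
    return (g(x + eps * v) - g(x - eps * v)) / (2.0 * eps)

x = jnp.linspace(-1.0, 1.0, 5)
v = jnp.ones_like(x)
print(hvp_pearlmutter(f, x, v))
print(hvp_finite_difference(f, x, v))
```

Nesting jax.jvp over the gradient more times gives exact higher-order directional derivatives, while nesting the finite-difference version compounds truncation and round-off error at every level, which is presumably where a 5th-order implementation gets into trouble.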
@dvnxmvl_hdf5
Dane Malenfant
11 days
ok i accept the Mile End is gentrified now that there is a Maxi
0 replies · 0 reposts · 2 likes
@zzlccc
Zichen Liu
1 month
In the era of experience, we're training LLM agents with RL — but something's missing... We miss the good old Gym! So we built 💎GEM: a suite of environments for training LLM 𝚐𝚎𝚗𝚎𝚛𝚊𝚕𝚒𝚜𝚝𝚜. Let’s build the Gym for LLMs, together: https://t.co/QJLRTm2zZA
[image]
5 replies · 35 reposts · 274 likes
@benjamintherien
Benjamin Thérien
18 days
What if I told you how to outperform DiLoCo while only communicating 1-3% of the pseudogradient? https://t.co/ltNtZjxpfn
@amir_sarfi
Amir Sarfi
18 days
Introducing SparseLoCo: a communication-efficient method for LLM pre-training. TL;DR: We leverage Top-k sparsification + error feedback with DiLoCo’s infrequent outer steps—communicating only 1–3% gradients with 2-bit quantization—outperforming DiLoCo and DeMo. 1/N, ArXiv:
[image]
0 replies · 5 reposts · 23 likes
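A generic sketch of the two ingredients named above, top-k sparsification plus error feedback (plain NumPy, toy shapes and names of my own, not the SparseLoCo code): each round, the worker adds the residual carried over from previous rounds to the fresh pseudogradient, transmits only the k largest-magnitude entries, and keeps everything it dropped as the residual for next time.

```python
import numpy as np

def topk_with_error_feedback(pseudograd, residual, k):
    # Error feedback: fold in the residual from earlier rounds, keep only the
    # k largest-magnitude entries for communication, and store what was
    # dropped as the new residual.
    corrected = (pseudograd + residual).ravel()
    keep = np.argpartition(np.abs(corrected), -k)[-k:]
    message = np.zeros_like(corrected)
    message[keep] = corrected[keep]
    new_residual = corrected - message
    return message.reshape(pseudograd.shape), new_residual.reshape(pseudograd.shape)

# Toy usage: communicate ~2% of a 10,000-entry pseudogradient per outer step.
rng = np.random.default_rng(0)
residual = np.zeros(10_000)
for _ in range(3):
    pseudograd = rng.normal(size=10_000)
    message, residual = topk_with_error_feedback(pseudograd, residual, k=200)
```

The DiLoCo-style infrequent outer steps and the 2-bit quantization mentioned in the tweet would sit on top of this (run the exchange only every H inner steps and quantize `message` before sending); both are omitted here.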
@dvnxmvl_hdf5
Dane Malenfant
19 days
ฅ^•ﻌ•^ฅ
[image]
0 replies · 0 reposts · 0 likes
@dvnxmvl_hdf5
Dane Malenfant
20 days
The “proof” in one page to this thread
[image]
@dvnxmvl_hdf5
Dane Malenfant
24 days
To begin to see the generalized correction term for n arbitrary agents: the self-correction term permits an expected symmetry between agents, allowing decentralized parameters.
[image]
0 replies · 0 reposts · 2 likes
@EricElmoznino
Eric Elmoznino
21 days
Very excited to release a new blog post that formalizes what it means for data to be compositional, and shows how compositionality can exist at multiple scales. Early days, but I think there may be significant implications for AI. Check it out!
[link card from ericelmoznino.github.io: What is compositionality? For those of us working in AI or cognitive neuroscience this question can appear easy at first, but becomes increasingly perplexing the more we think about it. We aren’t...]
0 replies · 9 reposts · 29 likes
@TalkRLPodcast
TalkRL Podcast
24 days
E69: Outstanding Paper Award Winners 1/2 @RL_Conference 2025 @AlexDGoldie : How Should We Meta-Learn Reinforcement Learning Algorithms? @RyanSullyvan : Syllabus: Portable Curricula for Reinforcement Learning Agents @jsuarez5341 : PufferLib 2.0: Reinforcement Learning at 1M
1 reply · 2 reposts · 38 likes
@dvnxmvl_hdf5
Dane Malenfant
24 days
But with self-correction, not all combinations of agents need to be calculated. Only the coefficients for each level of the tree are needed, yielding O(log V) complexity.
[image]
0 replies · 0 reposts · 2 likes
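A tiny sketch of how I read the complexity claim (illustrative only; the actual terms are in the attached images): rather than enumerating every combination of agents in the tree, one closed-form coefficient per level suffices, so the work scales with the depth of the tree rather than with the number of nodes.

```python
from math import comb

def per_level_coefficients(num_other_agents):
    # One binomial coefficient per level of the gradient-order tree:
    # level j counts the ways j of the other agents contribute a correction.
    # O(depth) coefficients instead of visiting every node/combination.
    return [comb(num_other_agents, j) for j in range(num_other_agents + 1)]

print(per_level_coefficients(2))  # [1, 2, 1], matching the 2-and-1 counting
                                  # in the key example further down the thread
```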
@dvnxmvl_hdf5
Dane Malenfant
24 days
Without self-correction, this is like walking through an n-ary tree where the highest-order gradient is at the root.
[image]
1 reply · 0 reposts · 1 like
@dvnxmvl_hdf5
Dane Malenfant
24 days
So the full update is:
[image]
1 reply · 0 reposts · 1 like
@dvnxmvl_hdf5
Dane Malenfant
24 days
Then, since the sum converges to a distribution over gradient operators (with I as the identity operator):
[image]
1 reply · 0 reposts · 1 like
@dvnxmvl_hdf5
Dane Malenfant
24 days
Now, for clarity, let f denote the correction-term function:
[image]
1 reply · 0 reposts · 1 like
@dvnxmvl_hdf5
Dane Malenfant
24 days
This would continue with more and more agents, and the coefficients neatly form a binomial distribution over the orders of gradients.
[image]
1 reply · 0 reposts · 1 like
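One way to read "a binomial distribution over the orders of gradients", using the identity operator I from a few tweets up (a guess at the structure, not taken from the images): if each of m contributing agents either adds a correction (a ∇) or does not (an I), expanding the product puts a binomial coefficient in front of the j-th order gradient,

\[
(\mathrm{I} + \nabla)^{m} \;=\; \sum_{j=0}^{m} \binom{m}{j}\, \nabla^{j}.
\]

Whether m counts all agents or only the other agents depends on the setup in the images; the shape of the coefficients is the same either way.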
@dvnxmvl_hdf5
Dane Malenfant
24 days
These objectives will come out cleanly in the global optimization, but since there are 2 ways to leave the key with 1 agent and 1 way to leave the key with all 3 agents out of three agents, we include a coefficient in front of the second term.
[image]
1 reply · 0 reposts · 1 like
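The counting in that tweet, spelled out as an enumeration (assuming it is over subsets of the two agents other than the one leaving the key):

```python
from itertools import combinations

others = ("agent_2", "agent_3")             # the two agents besides the key-leaver
one_agent = list(combinations(others, 1))   # 2 ways to leave the key for one agent
all_agents = list(combinations(others, 2))  # 1 way to leave it for both
print(len(one_agent), len(all_agents))      # -> 2 1
```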
@dvnxmvl_hdf5
Dane Malenfant
24 days
Then with policy independence (there isn't a better action to take since success now requires another agent), we have another correction term.
[image]
1 reply · 0 reposts · 1 like
@dvnxmvl_hdf5
Dane Malenfant
24 days
Using agent k's value approximation as a surrogate for the expected collection reward is the same as before but leads to a higher order gradient.
[image]
1 reply · 0 reposts · 1 like
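A sketch of why the surrogate raises the gradient order (my reading, in the spirit of opponent-shaping formulations such as LOLA, not necessarily the exact term in the image): if the expected collection reward is replaced by agent k's value estimate evaluated after a gradient step of agent k, then differentiating with respect to another agent's parameters goes through that step. Writing \(\theta_k' = \theta_k + \alpha \nabla_{\theta_k} J_k(\theta_j, \theta_k)\) for the updated parameters,

\[
\nabla_{\theta_j}\, \hat{V}_k(s;\, \theta_k') \;=\; \alpha\, \big(\nabla_{\theta_j} \nabla_{\theta_k} J_k\big)^{\top} \nabla_{\theta_k'} \hat{V}_k(s;\, \theta_k'),
\]

so a mixed second derivative of \(J_k\) appears even though the surrogate is only evaluated once, and each further substitution of this kind raises the order by one.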