Gabriel P. Andrade

@gab_p_andrade

Followers: 250
Following: 84
Media: 4
Statuses: 38

Researcher @GensynAI. Working on multi-agent RL, game theory, alg econ, and decentralized learning.

Joined April 2025
@gab_p_andrade
Gabriel P. Andrade
3 days
I've always been a sucker for reductions from one learning setting to another. It just feels elegant to know that insights in one will trickle over to the other "for free"
@damekdavis
Damek
4 days
In this note w/ @beenwrekt we look at RL problems with 0/1 rewards, showing that popular methods maximize the average (transformed) probability of correctly answering a prompt x: max_θ 𝔼ₓ h(Prob(correct ∣ x; θ)) for certain functions h. Weirdly, h is arcsin(√t) in GRPO.
0
0
4
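A minimal numeric sketch of the claim above (editor's illustration, not code from the note; the toy probabilities are made up): with 0/1 rewards the objective is 𝔼ₓ h(p(x; θ)) with p(x; θ) = Prob(correct ∣ x; θ), and the arcsin(√t) transform tied to GRPO re-weights prompts by 1/(2√(p(1−p))).

import numpy as np

# Editor's toy sketch, assuming the objective E_x[ h(p(x; theta)) ] from the note,
# where p(x; theta) = Prob(correct | prompt x; theta).

def h_mean(p):   # h(t) = t: plain average accuracy
    return p

def h_grpo(p):   # h(t) = arcsin(sqrt(t)): the transform the note associates with GRPO
    return np.arcsin(np.sqrt(p))

p = np.array([0.05, 0.30, 0.55, 0.90])   # made-up per-prompt success probabilities

print("E_x[ p(x) ]              =", h_mean(p).mean())
print("E_x[ arcsin(sqrt p(x)) ] =", h_grpo(p).mean())

# d/dt arcsin(sqrt(t)) = 1 / (2 * sqrt(t * (1 - t))), so relative to plain accuracy
# this objective puts more gradient weight on prompts with p near 0 or 1.
w = 1.0 / (2.0 * np.sqrt(p * (1.0 - p)))
print("relative per-prompt weight:", w / w.sum())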
@tareknaous
Tarek Naous
9 days
Simulating user–AI conversations helps us understand how LMs work in multi-turn settings. Prompting LMs like GPT-4o to simulate users is common, but their assistant nature makes it hard to replicate user behavior. We introduce User LMs - trained to be users, not assistants.
2
27
146
@sumeetrm
Sumeet Motwani
10 days
🚨How do we improve long-horizon reasoning capabilities by scaling RL with only existing data? Introducing our new paper: "h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning"🫡 > RL on existing datasets saturates very quickly > Reasoning over
10
47
279
@canondetortugas
Dylan Foster 🐢
8 days
New paper we're excited to get online! Taming Imperfect Process Verifiers: A Sampling Perspective on Backtracking. A totally new framework based on ~backtracking~ for using process verifiers to guide inference, w/ connections to approximate counting/sampling in theoretical CS.
8
41
245
@sirbayes
Kevin Patrick Murphy
11 days
For more details (including how to train an RNN-based policy using PPO inside the "imagination" of the learned CWM), please see our paper: https://t.co/L9N7CeAB9s Joint work with @WLehrach, Daniel Hennes, @lazarox8, @heroesneverdie, Carter Wendelken, Zun Li, @antoine_dedieu,
0
8
37
@SplezzzK
Zang
23 days
🌟 Excited to share that our paper, “From Self-Check to Consensus: Bayesian Strategic Decoding in Large Language Models”, has been accepted by #NeurIPS2025! Huge thanks to my coauthors @BernhardKainz1 and Weitong Zhang on this wonderful work!
11
8
43
@connordavis_ai
Connor Davis
26 days
Multi-agent AI is a $50B lie. 99% of "multi-agent" systems are just single agents with fancy marketing. I just read the paper that exposes what real multi-agent intelligence actually looks like. Most people think multi-agent AI is just "multiple ChatGPTs in a room." That's
65
209
1K
@nanjiang_cs
Nan Jiang
28 days
I was surprised by how many didn't know that (1) per-token MLE is the same as whole-sequence MLE, and (2) PG at the token level is the same as PG at the sequence level (optimizing one big combinatorial action). The story is different if you introduce a fitted critic/Q-values or intermediate resets.
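A one-line check of both points, in standard autoregressive notation (editor's illustration, not from the thread): the sequence log-likelihood factorizes as

\log p_\theta(y_{1:T} \mid x) = \sum_{t=1}^{T} \log p_\theta(y_t \mid x, y_{<t}),

so maximizing the summed per-token log-likelihoods is exactly whole-sequence MLE. Likewise, with a sequence-level return R(x, y_{1:T}),

\nabla_\theta \log \pi_\theta(y_{1:T} \mid x)\, R = \Big( \sum_{t=1}^{T} \nabla_\theta \log \pi_\theta(y_t \mid x, y_{<t}) \Big) R,

i.e. the token-level policy gradient with a shared return equals the sequence-level policy gradient over one big combinatorial action.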
@NandoDF
Nando de Freitas
28 days
Most RL for LLMs involves only 1 step of RL. It’s a contextual bandit problem and there’s no covariate shift because the state (question, instruction) is given. This has many implications, eg DAgger becomes SFT, and it is trivial to design Expectation Maximisation (EM) maximum
9
39
362
@gab_p_andrade
Gabriel P. Andrade
1 month
Cool work laying rigorous foundations with practical insights for building well-aligned AI systems: Should we trust that individual orgs will produce well-aligned AI that isn't self-serving? No. Let them compete, don't assume their altruism, and design mechanisms accordingly.
@Aaroth
Aaron Roth
1 month
Aligning an AI with human preferences might be hard. But there is more than one AI out there, and users can choose which to use. Can we get the benefits of a fully aligned AI without solving the alignment problem? In a new paper we study a setting in which the answer is yes.
0
1
1
@OkhayIea
Kaiyan Zhang
1 month
🚀 Excited to share our new survey paper on RL for Large Reasoning Models (LRMs)! Since early this year, our team has released several RL+LLMs works (PRIME, TTRL, SimpleVLA, MARTI, SSRL, HPT), covering dense rewards, self-evolution, embodied AI, multi-agent, tool learning, and
6
76
343
@gab_p_andrade
Gabriel P. Andrade
1 month
Thanks to all my wonderful collaborators and the awesome @gensynai community members who have contributed to our testnet! Your continued support makes it possible for us to iterate and experiment at unprecedented scales!
1
0
4
@gab_p_andrade
Gabriel P. Andrade
1 month
In our open-source demo, thousands of @gensynai community members trained a range of models on a range of devices. After approximately 175 training rounds, models in the swarm significantly outperform models trained in silo. Below, red == adjusted p-value > 0.05.
2
1
8
@gab_p_andrade
Gabriel P. Andrade
1 month
In controlled experiments, models trained with SAPO show ~94% improvement in cumulative reward over models trained in silo. We compared models trained with batches of X local vs Y swarm samples; there was a clear trend.
3
3
15
@gab_p_andrade
Gabriel P. Andrade
1 month
🐸 SAPO is a meta-algorithm that wraps around your preferred policy gradient algorithm → Generate rollouts on a local batch of data, share with + sample from the swarm, update your policy, repeat.
1
0
3
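A minimal sketch of that loop (editor's illustration; the Swarm class and helper functions are hypothetical stand-ins, not the Gensyn / SAPO implementation):

import random

class Swarm:
    """Toy stand-in for the decentralized pool of shared rollouts."""
    def __init__(self):
        self.pool = []
    def share(self, rollouts):
        self.pool.extend(rollouts)
    def sample(self, k):
        return random.sample(self.pool, min(k, len(self.pool)))

def generate_rollouts(policy, prompts):
    # Placeholder: roll out the local policy on a batch of prompts.
    return [(x, policy(x)) for x in prompts]

def policy_gradient_update(policy, rollouts):
    # Placeholder for your preferred policy-gradient step (PPO, GRPO, ...).
    return policy

def sapo_round(policy, local_prompts, swarm, n_swarm):
    local = generate_rollouts(policy, local_prompts)        # 1. rollouts on local data
    swarm.share(local)                                      # 2. share with the swarm ...
    shared = swarm.sample(n_swarm)                          #    ... and sample from it
    return policy_gradient_update(policy, local + shared)   # 3. update, then repeat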
@gab_p_andrade
Gabriel P. Andrade
1 month
Problem: Scaling RL is non-trivial — it's expensive, latency-sensitive, memory-intensive, and failure-sensitive
🐸 SAPO sidesteps these hurdles → fully decentralized, asynchronous, and designed for heterogeneous devices + heterogeneous models
🐸 SAPO is highly customizable →
1
0
2
@gab_p_andrade
Gabriel P. Andrade
1 month
TLDR
• Heterogeneous devices & heterogeneous models collectively train
• Models generate rollouts locally and share, then sample rollouts shared by others
• With SAPO, models can train faster with less compute per node
1
0
5
@gab_p_andrade
Gabriel P. Andrade
1 month
Is the whole greater than the sum of its parts? In decentralized RL post-training the answer is YES. 🐸🐸🐸 Swarm sAmpling Policy Optimization (SAPO) 🐸🐸🐸
9
23
79
@gensynai
gensyn
1 month
Something new for the community 🐝 https://t.co/R1qM2UxDT1
88
45
374