Anirudh Buvanesh
@AnirudhBuvanesh
Followers: 69 · Following: 33 · Media: 1 · Statuses: 18
Ph.D. student @Mila_Quebec. Ex @MSFTResearch, @salesforce
Montreal, Quebec
Joined October 2022
New paper alert 🚨 What if I told you there is an architecture that provides a _knob_ to control quality-efficiency trade-offs directly at test time? Introducing Compress & Attend Transformers (CATs), which give you exactly this! 🧵(1/n) 👇
1/3 🥳 Excited to share our new paper "Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents"! Project your features onto a product of simplices: sparse, stable reps, stronger grads, faster learning. 🧵 For more details, check out Pablo's thread 👇
Simplicial Embeddings (SEMs) Improve Sample Efficiency in Actor-Critic Agents: In our recent preprint we demonstrate that the use of well-structured representations (SEMs) can dramatically improve sample efficiency in RL agents. 1/X
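As a rough, unofficial illustration of the "product of simplices" idea from the tweets above: split a feature vector into L groups of V dimensions and apply a softmax within each group, so every group lies on a probability simplex. The PyTorch sketch below assumes exactly that structure; the class name, arguments, and temperature default are mine, not the paper's implementation.

```python
# Minimal sketch of a simplicial embedding (SEM) layer, assuming the
# "product of simplices" structure described in the tweet: split a feature
# vector into L groups of V dimensions and apply a softmax within each group,
# so every group lies on a probability simplex (sparse, bounded activations).
# Names and defaults here are illustrative, not the paper's implementation.
import torch
import torch.nn as nn


class SimplicialEmbedding(nn.Module):
    def __init__(self, feature_dim: int, num_simplices: int, temperature: float = 1.0):
        super().__init__()
        assert feature_dim % num_simplices == 0, "feature_dim must split evenly into groups"
        self.num_simplices = num_simplices
        self.group_dim = feature_dim // num_simplices
        self.temperature = temperature

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # (batch, feature_dim) -> (batch, L, V): one row per simplex
        batch = features.shape[0]
        groups = features.view(batch, self.num_simplices, self.group_dim)
        # Softmax within each group projects it onto the probability simplex.
        simplicial = torch.softmax(groups / self.temperature, dim=-1)
        return simplicial.view(batch, -1)


# Usage: drop it between the encoder and the actor/critic heads.
encoder_out = torch.randn(8, 256)                       # hypothetical encoder features
sem = SimplicialEmbedding(feature_dim=256, num_simplices=32)
z = sem(encoder_out)                                    # same size, but each 8-dim group sums to 1
```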
Introducing linear scaling of reasoning: The Markovian Thinker. Reformulate RL so thinking scales with O(n) compute, not O(n^2), and O(1) memory, architecture-agnostic. Train R1-1.5B into a Markovian thinker with a 96K thought budget, ~2X accuracy 🧵
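A back-of-the-envelope reading of the O(n) vs O(n^2) claim (my own sketch, not from the thread): attention cost per generated token grows with the current context, so letting the context grow with the whole chain of thought gives quadratic total compute, while a bounded, fixed-size context gives linear. The context cap below is a made-up number purely for illustration.

```python
# Rough compute comparison (my own illustration, not from the paper): attention
# cost per generated token is proportional to the current context length.
# Full-context thinking attends over everything generated so far -> O(n^2) total.
# Bounded-context ("Markovian") thinking keeps the context capped -> O(n) total.
def full_context_cost(n_tokens: int) -> int:
    return sum(t for t in range(1, n_tokens + 1))                       # ~ n^2 / 2

def bounded_context_cost(n_tokens: int, context_cap: int) -> int:
    return sum(min(t, context_cap) for t in range(1, n_tokens + 1))     # ~ n * cap

n, cap = 96_000, 8_000   # 96K thought budget; the 8K cap is a hypothetical choice
print(full_context_cost(n) / bounded_context_cost(n, cap))   # ~6x fewer attention ops at this length
```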
Work done with my amazing collaborator @bicycleman15. Excited to hear your thoughts! (4/4)
Which easy examples you add matters. Trivial ones don't help much. But you don't need to hunt for "perfect difficulty." Mixing all the easier instances you have usually works fine. We're releasing our hackable implementations at https://t.co/Mwge7elqnC. Check it out 👇 (3/n)
github.com · rl4reasoning/rl-baselines: RL reasoning baselines
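For concreteness, a hypothetical sketch of the data change this thread describes: keep the hard target instances and mix in easier instances of the same task, so that early rollouts occasionally earn nonzero reward. Function and field names here are made up for illustration; the released rl-baselines repo is the actual reference.

```python
# Hypothetical sketch of "mix in easier instances" for RL training data.
# The idea from the thread: if the policy never samples a correct answer on the
# hard instances, rewards stay at zero and there is no learning signal; adding
# easier instances of the same task provides early nonzero rewards.
# Function and field names below are illustrative, not from the released repo.
import random

def build_training_pool(hard_instances, easy_instances, easy_fraction=0.5, seed=0):
    """Return a shuffled pool with roughly `easy_fraction` easier instances."""
    rng = random.Random(seed)
    n_easy = int(len(hard_instances) * easy_fraction / (1 - easy_fraction))
    pool = list(hard_instances) + rng.sample(easy_instances, min(n_easy, len(easy_instances)))
    rng.shuffle(pool)
    return pool

# e.g. graph-search problems of varying path length (shorter = easier)
hard = [{"graph": g, "path_len": 8} for g in range(1000)]
easy = [{"graph": g, "path_len": 3} for g in range(5000)]
train_pool = build_training_pool(hard, easy, easy_fraction=0.5)
```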
We test this on the graph-search task from Bachmann et al. (2024). Dense rewards, diversity incentives, and improved credit assignment all underperform in our setting when the base model fails to sample correct answers. Mixing in easier instances helps unlock RL training. (2/n)
Zero rewards after tons of RL training? Before using dense rewards or incentivizing exploration, try changing the data. Adding easier instances of the task can unlock RL training. To know more, check out our blog post here: https://t.co/BPErVcLmP8. Keep reading 🧵(1/n)
spiffy-airbus-472.notion.site
Jatin Prakash* (NYU), Anirudh Buvanesh* (MILA) (* order decided through np.random.randint(2))
Thrilled to share our new work EARL! 1️⃣ An AR + RL image editing model that outperforms diffusion baselines w/ 5x less data. 2️⃣ First systematic SFT vs RL study in image editing: RL post-training shines on complex edits where paired data is scarce. See thread for details 👇
We built a new autoregressive + RL image editing model using a strong verifier, and it beats SOTA diffusion baselines using 5× less data. 🔥 EARL: a simple, scalable RL pipeline for high-quality, controllable edits. 🧵1/
Introducing a framework for end-to-end discovery of data structures: no predefined algorithms or hand-tuning needed. Work led by Omar Salemohamed. More details below. https://t.co/lFb2kn2NpE