Anirudh Buvanesh Profile
Anirudh Buvanesh

@AnirudhBuvanesh

Followers: 69 · Following: 33 · Media: 1 · Statuses: 18

Ph.D. student @Mila_Quebec. Ex @MSFTResearch, @salesforce

Montreal, Quebec
Joined October 2022
@bicycleman15
Jatin Prakash
3 days
New paper alert 🚨 What if I told you there is an architecture that provides a _knob_ to control quality-efficiency trade-offs directly at test-time? Introducing Compress & Attend Transformers (CATs) that provide you exactly this! 🧵(1/n) 👇
1
10
21
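(The CAT architecture itself isn't excerpted in this thread, so the following is only a rough sketch of the generic "compress, then attend" idea the tweet describes: pool the history into chunk summaries and treat the chunk size as a test-time quality-efficiency knob. The mean-pooling compressor and the chunk_size knob are assumptions for illustration, not the paper's design.)

```python
import torch

def compress_and_attend(h, chunk_size):
    """Toy 'compress & attend': mean-pool the history into chunk summaries,
    then let the current position attend over those summaries. chunk_size is
    the test-time knob: larger chunks -> fewer summaries to attend over
    (cheaper), at some cost in fidelity. Assumed design, not the paper's."""
    T, d = h.shape
    n_chunks = (T + chunk_size - 1) // chunk_size
    # Compress: one summary vector per chunk of hidden states.
    summaries = torch.stack(
        [h[i * chunk_size:(i + 1) * chunk_size].mean(dim=0) for i in range(n_chunks)]
    )
    # Attend: the last state queries the compressed history.
    q = h[-1:]                                                 # (1, d)
    attn = torch.softmax(q @ summaries.T / d ** 0.5, dim=-1)   # (1, n_chunks)
    return attn @ summaries                                    # (1, d)

h = torch.randn(512, 64)
coarse = compress_and_attend(h, chunk_size=32)  # cheaper, coarser
fine = compress_and_attend(h, chunk_size=4)     # costlier, finer
```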
@johanobandoc
Johan Obando-Ceron 👏🏽
24 days
1/3 🥳Excited to share our new paper 'Simplicial Embeddings Improve Sample Efficiency in Actor–Critic Agents'! Project your features onto a product of simplices → sparse, stable reps, stronger grads, faster learning. 🧵For more details, check out Pablo's thread 👇
@pcastr
Pablo Samuel Castro
24 days
🔊Simplicial Embeddings (SEMs) Improve Sample Efficiency in Actor-Critic Agents🔊 In our recent preprint we demonstrate that the use of well-structured representations (SEMs) can dramatically improve sample efficiency in RL agents. 1/X
2
14
43
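(A minimal sketch of the construction the tweets describe: a simplicial embedding splits a feature vector into groups and applies a softmax within each group, so the representation lands on a product of simplices, giving sparse, non-negative, bounded features. The shapes and temperature below are illustrative choices, not the paper's settings.)

```python
import torch

def simplicial_embedding(z, n_simplices, temperature=1.0):
    """Project features onto a product of simplices: split z into
    n_simplices groups and softmax within each group, so every group is
    a point on a probability simplex (non-negative, sums to 1)."""
    batch, dim = z.shape
    assert dim % n_simplices == 0, "feature dim must split evenly into groups"
    groups = z.view(batch, n_simplices, dim // n_simplices)
    return torch.softmax(groups / temperature, dim=-1).view(batch, dim)

z = torch.randn(8, 256)                           # e.g. an encoder's output features
sem = simplicial_embedding(z, n_simplices=32)     # 32 simplices of 8 vertices each
```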
@MAghajohari
Milad Aghajohari
1 month
Introducing linear scaling of reasoning: The Markovian Thinker. Reformulate RL so thinking scales O(n) compute, not O(n^2), with O(1) memory, architecture-agnostic. Train R1-1.5B into a Markovian thinker with 96K thought budget, ~2X accuracy 🧵
14
203
922
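(A toy reading of the linear-scaling claim: if reasoning proceeds in fixed-size chunks and each chunk conditions only on the question plus a bounded carryover from the previous chunk, per-chunk attention cost is constant, so total compute is O(n) in the thinking budget and the carried state is O(1). The loop below is a hedged sketch with assumed interfaces and an assumed carryover rule, not the paper's algorithm.)

```python
def markovian_think(model, question, chunk_tokens=8192, carryover=512, budget=96_000):
    """Toy Markovian-thinking loop: each chunk sees only the question plus a
    bounded tail of the previous chunk, never the full trace, so per-chunk
    cost is constant and total compute grows linearly with the budget.
    model.generate and the 'FINAL ANSWER' stop marker are assumptions."""
    state = ""          # bounded carryover, not the full reasoning trace
    trace = []
    used = 0
    while used < budget:
        chunk = model.generate(prompt=question + state, max_tokens=chunk_tokens)
        trace.append(chunk)
        used += chunk_tokens
        if "FINAL ANSWER" in chunk:
            break
        state = chunk[-carryover:]   # keep only the tail as the Markovian state
    return "".join(trace)
```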
@AnirudhBuvanesh
Anirudh Buvanesh
2 months
Work done with my amazing collaborator @bicycleman15. Excited to hear your thoughts! (4/4)
1
1
6
@AnirudhBuvanesh
Anirudh Buvanesh
2 months
Which easy examples you add matters. Trivial ones don't help much. But you don't need to hunt for "perfect difficulty." Mixing all the easier instances you have usually works fine. We're releasing our hackable implementations at https://t.co/Mwge7elqnC. Check it out 🙂 (3/n)
github.com
rl4reasoning/rl-baselines: RL reasoning baselines
1
2
7
@AnirudhBuvanesh
Anirudh Buvanesh
2 months
We test this on the graph-search task from Bachmann et al. (2024). Dense rewards, diversity incentives, and improved credit assignment all underperform in our setting when the base model fails to sample correct answers. Mixing in easier instances helps unlock RL training. (2/n)
1
1
6
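(For context, the Bachmann et al. (2024) task is a path-star graph search: several paths fan out from a start node and the model must produce the one reaching the goal. Below is a hedged instance generator where difficulty is controlled by the number and length of arms; the exact task format here is an assumption, not the authors'.)

```python
import random

def path_star_instance(n_arms, arm_len, n_nodes=1000, rng=random):
    """Toy path-star graph-search instance: n_arms paths of length arm_len
    fan out from a start node; the answer is the path reaching the goal.
    Smaller n_arms / arm_len gives the 'easier instances' the thread mixes in."""
    nodes = rng.sample(range(n_nodes), 1 + n_arms * arm_len)
    start, rest = nodes[0], nodes[1:]
    arms = [rest[i * arm_len:(i + 1) * arm_len] for i in range(n_arms)]
    goal_arm = rng.randrange(n_arms)
    edges = [(start, arm[0]) for arm in arms]
    for arm in arms:
        edges += list(zip(arm, arm[1:]))
    return {
        "edges": edges,
        "start": start,
        "goal": arms[goal_arm][-1],
        "answer": [start] + arms[goal_arm],
    }

easy = path_star_instance(n_arms=2, arm_len=3)    # easier: few, short arms
hard = path_star_instance(n_arms=8, arm_len=12)   # harder: many, long arms
```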
@AnirudhBuvanesh
Anirudh Buvanesh
2 months
Zero rewards after tons of RL training? 😞 Before using dense rewards or incentivizing exploration, try changing the data. Adding easier instances of the task can unlock RL training. 🔓📈 To learn more, check out our blog post here: https://t.co/BPErVcLmP8. Keep reading 🧵(1/n)
spiffy-airbus-472.notion.site
Jatin Prakash* (NYU), Anirudh Buvanesh* (MILA) (* order decided through np.random.randint(2))
2
31
101
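(The fix the thread proposes is purely data-side: leave the RL algorithm unchanged and draw each training batch from a mixture of the hard target instances and easier ones, so some rollouts earn nonzero reward from the start. A minimal sketch; the 50/50 ratio is an assumed default, and the released rl-baselines repo has the actual recipes.)

```python
import random

def mixed_batch(hard_pool, easy_pool, batch_size, easy_frac=0.5, rng=random):
    """Data-side unlock: instead of reshaping rewards or adding exploration
    bonuses, sample each RL batch from a mix of hard target-task instances
    and easier ones. easy_frac=0.5 is an assumed ratio, not the paper's."""
    n_easy = int(batch_size * easy_frac)
    batch = (rng.choices(easy_pool, k=n_easy)
             + rng.choices(hard_pool, k=batch_size - n_easy))
    rng.shuffle(batch)
    return batch
```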
@sikarwar_ank
Ankur Sikarwar
3 months
Thrilled to share our new work EARL 🚀 1⃣ An AR + RL image editing model that outperforms diffusion baselines w/ 5x less data. 2⃣ First systematic SFT vs RL study in image editing → RL post-training shines on complex edits where paired data is scarce. See thread for details 👇
@Saba_A96
Saba
3 months
We built a new autoregressive + RL image editing model using a strong verifier, and it beats SOTA diffusion baselines using 5× less data. 🔥 EARL: a simple, scalable RL pipeline for high-quality, controllable edits. 🧵1/
0
3
8
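(At the level of detail in these tweets, the pipeline samples edits from the autoregressive editor, scores them with a strong verifier, and reinforces high-scoring samples. A minimal REINFORCE-style sketch with assumed editor/verifier interfaces; the actual EARL objective may differ.)

```python
import torch

def verifier_rl_step(editor, verifier, image, instruction, optimizer, n_samples=4):
    """Minimal REINFORCE-style step for verifier-driven RL post-training of an
    autoregressive editor: sample edits, score them with a verifier, and push
    up the log-probability of above-average samples. editor.sample and
    verifier.score are assumed interfaces for illustration."""
    rewards, logps = [], []
    for _ in range(n_samples):
        edit, logp = editor.sample(image, instruction)   # edit tokens + log-prob
        rewards.append(verifier.score(image, edit, instruction))
        logps.append(logp)
    rewards = torch.tensor(rewards)
    baseline = rewards.mean()                            # simple variance reduction
    loss = -((rewards - baseline) * torch.stack(logps)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```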
@lcharlin
Laurent Charlin
1 year
Introducing a framework for end-to-end discovery of data structures: no predefined algorithms or hand-tuning needed. Work led by Omar Salemohamed. More details below. https://t.co/lFb2kn2NpE
1
8
17