
Rishabh Agarwal
@agarwl_
Followers: 11K · Following: 8K · Media: 135 · Statuses: 1K
Reinforcement Learner @AIatMeta, Adjunct Prof at McGill. Ex DeepMind, Brain, Mila, IIT Bombay. NeurIPS Best Paper.
Montréal, Canada · Joined May 2016
I recently gave a tutorial on knowledge distillation for LLMs, explaining the mathematical derivations behind the commonly used methods. Sharing the slides here given the recent interest in this topic.
Replies: 18 · Reposts: 180 · Likes: 1K
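The slides themselves aren't reproduced in this scrape, but for context on the kind of derivation such a tutorial covers: the usual baseline is token-level distillation with a forward KL between the teacher's and student's next-token distributions. A minimal PyTorch sketch, assuming standard (batch, seq, vocab) logits; the function name and temperature argument are illustrative, not taken from the slides:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, temperature=1.0):
    """Token-level distillation loss: forward KL(teacher || student).

    Both logit tensors have shape (batch, seq_len, vocab_size).
    Minimizing this matches the student's next-token distribution
    to the teacher's temperature-softened one.
    """
    t_logp = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    # KL(p_t || p_s) = sum_v p_t(v) * (log p_t(v) - log p_s(v))
    kl = (t_logp.exp() * (t_logp - s_logp)).sum(dim=-1)
    return kl.mean()
```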
RT @GoogleDeepMind: An advanced version of Gemini with Deep Think has officially achieved gold medal-level performance at the International…
Replies: 0 · Reposts: 784 · Likes: 0
RT @yong_zhengxin: I wrote up this post about how we should **unify RL and next-token-prediction**, based on my perspective on how humans learn…
Replies: 0 · Reposts: 50 · Likes: 0
RT @RotekSong: Kimi K2 is here! The first big beautiful model purpose-built for agentic capabilities is now open-source! Agent RL, ready fo…
Replies: 0 · Reposts: 18 · Likes: 0
RT @jacobmbuckman: The age of transformers is ending. The dawn of linear-cost architectures is upon us. Power Attention replaces Flash Att…
Replies: 0 · Reposts: 17 · Likes: 0
RT @Happylemon56775: Excited to share what I worked on during my time at Meta. - We introduce a Triton-accelerated Transformer with *2-sim…
Replies: 0 · Reposts: 96 · Likes: 0
RT @AleksandraFaust: Join my team at @genesistxai! 🧬 We're forging AI foundation models to unlock groundbreaking therapies for patients wi…
Replies: 0 · Reposts: 9 · Likes: 0
RT @setlur_amrith: Since R1 there has been a lot of chatter 💬 on post-training LLMs with RL. Is RL only sharpening the distribution over co…
Link: pinnate-flare-8f3.notion.site (Amrith Setlur and Aviral Kumar, Carnegie Mellon University)
Replies: 0 · Reposts: 28 · Likes: 0
RT @MiniMax__AI: Day 1/5 of #MiniMaxWeek: We’re open-sourcing MiniMax-M1, our latest LLM — setting new standards in long-context reasoning…
Replies: 0 · Reposts: 307 · Likes: 0
RT @setlur_amrith: Introducing e3 🔥 Best <2B model on math 💪. Are LLMs implementing algos ⚒️ OR is thinking an illusion 🎩? Is RL only sharp…
Replies: 0 · Reposts: 26 · Likes: 0
RT @jacobandreas: 👉 New preprint on a new family of Transformer-type models whose depth scales logarithmically with sequence length. Enable…
Replies: 0 · Reposts: 10 · Likes: 0
RT @KhurramJaved_96: Learning to play Atari from pixels from scratch in 30 minutes, all locally on an Apple Watch!
Replies: 0 · Reposts: 11 · Likes: 0
RT @scaling01: A few more observations after replicating the Tower of Hanoi game with their exact prompts: - You need AT LEAST 2^N - 1 mov…
Replies: 0 · Reposts: 255 · Likes: 0
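For context on the 2^N - 1 figure: the optimal Tower of Hanoi solution obeys the recurrence T(N) = 2T(N-1) + 1 with T(1) = 1, which solves to 2^N - 1 moves. A minimal sketch of the standard recursive solver (illustrative, not from the thread):

```python
def hanoi(n, src="A", aux="B", dst="C", moves=None):
    """Optimal Tower of Hanoi move list for n disks; its length is 2**n - 1."""
    if moves is None:
        moves = []
    if n == 0:
        return moves
    hanoi(n - 1, src, dst, aux, moves)  # park the top n-1 disks on the spare peg
    moves.append((src, dst))            # move the largest disk to the target
    hanoi(n - 1, aux, src, dst, moves)  # stack the n-1 disks back on top of it
    return moves

for n in range(1, 6):
    assert len(hanoi(n)) == 2**n - 1  # 1, 3, 7, 15, 31 moves
```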
RT @InfiniAILab: 🥳 Happy to share our new work – Kinetics: Rethinking Test-Time Scaling Laws. 🤔 How to effectively build a powerful reasoni…
Replies: 0 · Reposts: 70 · Likes: 0
Good take -- it's a useful benchmark for developing better training algorithms / inference-time scaling, which you can then validate on other domains. Random / incorrect rewards won't work on this one. The main gotcha is not to overfit to just ARC-like puzzles.
Replies: 3 · Reposts: 5 · Likes: 73
RT @charlesfornlp: So many works talking about entropy, but what is the **mechanism** of entropy in RL for LLMs? 🤔 Our work gives a princi…
Replies: 0 · Reposts: 18 · Likes: 0