
Shashank Gupta (@shashank27392)
1K Followers · 51K Following · 98 Media · 4K Statuses
PhD at @irlab_amsterdam | Prev. @AIatMeta (NYC '24, London '23), @Flipkart | Interested in ML & IR.
New York · Joined March 2015
Interested in learning about the latest advancements in the area of Counterfactual Learning to Rank? Philipp Hager and I will present a tutorial on the topic next Monday at @WSDMSocial. Joint work with @_Jin_Huang_, @AliVardasbi, and @HarrieOos.
(0 replies · 2 reposts · 14 likes)
RT @CharlieLondon02: I believe that policy gradient methods with only terminal rewards will have to break down at some level of task ood-ne…
(0 replies · 1 repost · 0 likes)
RT @tianyuanzhang99: Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper…
(0 replies · 75 reposts · 0 likes)
RT @SimonShaoleiDu: PPO vs. DPO? 🤔 Our new paper proves that it depends on whether your models can represent the optimal policy and/or rewa…
(0 replies · 18 reposts · 0 likes)
RT @abeirami: As we go through a lot of excitement about RL recently with lots of cool work/results, here is a reminder that RL with a reve…
(0 replies · 52 reposts · 0 likes)
RT @sirbayes: I am pleased to announce a new version of my RL tutorial. Major update to the LLM chapter (e.g. DPO, GRPO, thinking), minor upd…
(0 replies · 452 reposts · 0 likes)
RT @Ji_Ha_Kim: I got recommended Terence Tao's YouTube channel created in 2010, where he uploaded his first video just yesterday! He showca…
(0 replies · 46 reposts · 0 likes)
RT @bremen79: Found this paper in the RL slides of @alexolshevsky1: Most policy gradient methods drop the discount…
(0 replies · 31 reposts · 0 likes)
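The "dropped discount" @bremen79 mentions can be sketched in a few lines. This is a minimal illustration, not code from the paper or slides; `pg_weights` is a hypothetical helper. The unbiased discounted policy-gradient estimator weights timestep t by gamma**t times the return-to-go G_t, but most implementations keep only G_t.

```python
def pg_weights(rewards, gamma, keep_discount=True):
    """Per-timestep weights for a policy-gradient estimator.

    The unbiased discounted estimator weights step t by gamma**t * G_t,
    where G_t is the discounted return-to-go from step t; many codebases
    silently drop the leading gamma**t factor.
    """
    T = len(rewards)
    # Discounted return-to-go G_t = sum_{k>=t} gamma**(k-t) * r_k
    G = [sum(gamma ** (k - t) * rewards[k] for k in range(t, T)) for t in range(T)]
    if keep_discount:
        return [gamma ** t * g for t, g in enumerate(G)]  # unbiased weighting
    return G  # the common "discount-dropped" variant

print(pg_weights([1.0, 1.0, 1.0], gamma=0.5))                       # -> [1.75, 0.75, 0.25]
print(pg_weights([1.0, 1.0, 1.0], gamma=0.5, keep_discount=False))  # -> [1.75, 1.5, 1.0]
```

The two outputs differ for every step after the first, which is exactly the bias the slides point at.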
RT @qingfeng_lan: 🚀RL algorithms are shaping the post-training of LLMs, but how do their objectives connect? In this blog, I explore their…
(0 replies · 54 reposts · 0 likes)
RT @leloykun: I'm not sure if someone has already pointed this out, but Dr. GRPO still has a bias that is more pronounced the smaller the g…
(0 replies · 63 reposts · 0 likes)
🚀 Very excited to share the latest research from my internship at @AIatMeta, NYC last summer: "A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning".
(1 reply · 1 repost · 9 likes)
RT @tomgoldsteincs: New open source reasoning model! Huginn-3.5B reasons implicitly in latent space 🧠. Unlike O1 and R1, latent reasoning…
(0 replies · 273 reposts · 0 likes)
RT @ZiniuLi: 🚀 Efficient RL Training for LLMs: ReMax, built on the Verl distributed framework, is now available! 🛠️ 🔑 Key Features: - Hi…
(0 replies · 31 reposts · 0 likes)
RT @abeirami: 𝐛𝐞𝐬𝐭-𝐨𝐟-𝐧 is a strong baseline for
- improving agents
- scaling inference-time compute
- preference alignment
- jailbreakin…
(0 replies · 55 reposts · 0 likes)
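The best-of-n baseline in the tweet above fits in a few lines: sample n candidates, score each with a reward model, keep the best. A minimal sketch with toy stand-ins; `best_of_n` and the length-based "reward" are illustrative choices of mine, not from the thread.

```python
def best_of_n(prompt, generate, reward, n=8):
    """Sample n candidate responses for a prompt and keep the highest-reward one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins: a "generator" cycling through canned responses and a
# "reward model" that simply prefers longer answers.
_responses = iter(["ok", "a longer answer", "the longest answer of all", "hm"])
best = best_of_n("q", lambda p: next(_responses), len, n=4)
print(best)  # -> "the longest answer of all"
```

In practice `generate` would sample from an LLM at nonzero temperature and `reward` would be a learned reward model; the selection logic stays this simple.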
RT @NandoDF: Perhaps a better name for the algorithm in deepseek-R1 is: clipped importance sampling, applied to optimisation of a simple on…
(0 replies · 23 reposts · 0 likes)
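The "clipped importance sampling" @NandoDF refers to is the PPO/GRPO-style clipped surrogate. A per-sample sketch under my own naming (`clipped_surrogate` is not from the tweet or the DeepSeek-R1 paper): the importance ratio between the new and old policy is clipped to [1 - eps, 1 + eps] before multiplying the advantage, and the pessimistic minimum of the two terms is kept.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

# Positive advantage: the gain is capped once the ratio exceeds 1 + eps.
print(clipped_surrogate(1.5, 1.0, eps=0.25))   # -> 1.25
# Negative advantage: the ratio cannot shrink the penalty below (1 - eps) * |A|.
print(clipped_surrogate(0.5, -1.0, eps=0.25))  # -> -0.75
```

The clip removes the incentive to move the policy far from the sampling policy in a single update, which is what makes plain importance sampling stable enough for on-policy-ish training.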