Shashank Gupta

@shashank27392

Followers
1K
Following
51K
Media
98
Statuses
4K

PhD at @irlab_amsterdam | Prev. @AIatMeta (NYC '24, London '23), @Flipkart | Interested in ML & IR.

New York
Joined March 2015
@shashank27392
Shashank Gupta
1 year
Interested in learning about the latest advancements in the area of Counterfactual Learning to Rank? Philipp Hager and I will present a tutorial on the topic at @WSDMSocial next Monday. Joint work with @_Jin_Huang_, @AliVardasbi, and @HarrieOos.
0
2
14
@shashank27392
Shashank Gupta
18 days
RT @CharlieLondon02: I believe that policy gradient methods with only terminal rewards will have to break down at some level of task ood-ne…
0
1
0
@shashank27392
Shashank Gupta
1 month
RT @y0b1byte: another good one!
0
43
0
@shashank27392
Shashank Gupta
1 month
RT @tianyuanzhang99: Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper…
0
75
0
@shashank27392
Shashank Gupta
1 month
RT @SimonShaoleiDu: PPO vs. DPO? 🤔 Our new paper proves that it depends on whether your models can represent the optimal policy and/or rewa…
0
18
0
@shashank27392
Shashank Gupta
2 months
RT @abeirami: As we go through a lot of excitement about RL recently with lots of cool work/results, here is a reminder that RL with a reve…
0
52
0
@shashank27392
Shashank Gupta
2 months
RT @sirbayes: I am pleased to announce a new version of my RL tutorial. Major update to the LLM chapter (eg DPO, GRPO, thinking), minor upd…
0
452
0
@shashank27392
Shashank Gupta
2 months
RT @probnstat: Lectures on Unbiased Estimation by Lester Mackey.
0
40
0
@shashank27392
Shashank Gupta
2 months
RT @goyal__pramod: Gpt-2 is just 174 lines of code. How crazy is that
0
195
0
@shashank27392
Shashank Gupta
2 months
RT @Ji_Ha_Kim: I got recommended Terence Tao's YouTube channel created in 2010, where he uploaded his first video just yesterday! He showca…
0
46
0
@shashank27392
Shashank Gupta
3 months
RT @bremen79: Found this paper in the RL slides of @alexolshevsky1: Most policy gradient methods drop the discount…
0
31
0
@shashank27392
Shashank Gupta
3 months
RT @qingfeng_lan: 🚀RL algorithms are shaping the post-training of LLMs, but how do their objectives connect? In this blog, I explore their…
0
54
0
@shashank27392
Shashank Gupta
4 months
RT @leloykun: I'm not sure if someone has already pointed this out, but Dr. GRPO still has a bias that is more pronounced the smaller the g…
0
63
0
@shashank27392
Shashank Gupta
4 months
This simple yet effective approach leads to improved fine-tuning of diffusion models on various downstream tasks. Preprint:
0
0
1
@shashank27392
Shashank Gupta
4 months
We explore how reinforcement learning can enhance text-to-image diffusion models. We first compare REINFORCE and PPO, and then introduce LOOP—a novel extension of PPO that samples multiple diffusion trajectories per prompt to reduce the PPO gradient variance.
1
1
2
@shashank27392
Shashank Gupta
4 months
🚀 Very excited to share the latest research from my internship at @AIatMeta, NYC last summer: "A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning".
1
1
9
@shashank27392
Shashank Gupta
5 months
RT @tomgoldsteincs: New open source reasoning model! Huginn-3.5B reasons implicitly in latent space 🧠. Unlike O1 and R1, latent reasoning…
0
273
0
@shashank27392
Shashank Gupta
5 months
RT @ZiniuLi: 🚀 Efficient RL Training for LLMs: ReMax, built on the Verl distributed framework, is now available! 🛠️
🔑 Key Features:
- Hi…
0
31
0
@shashank27392
Shashank Gupta
5 months
RT @fermatslibrary: Andrew Wiles on being smart
0
448
0
@shashank27392
Shashank Gupta
5 months
RT @abeirami: 𝐛𝐞𝐬𝐭-𝐨𝐟-𝐧 is a strong baseline for
- improving agents
- scaling inference-time compute
- preference alignment
- jailbreakin…
0
55
0
@shashank27392
Shashank Gupta
6 months
RT @NandoDF: Perhaps a better name for the algorithm in deepseek-R1 is: clipped importance sampling, applied to optimisation of a simple on…
0
23
0