
Shashank Gupta (@shashank27392)
1K Followers · 51K Following · 98 Media · 4K Statuses
PhD at @irlab_amsterdam | Prev. @AIatMeta (NYC '24, London '23), @Flipkart | Interested in ML & IR.
New York · Joined March 2015
Interested in learning about the latest advancements in the area of Counterfactual Learning to Rank? Philipp Hager and I will present a tutorial on the topic next Monday at @WSDMSocial. Joint work with @_Jin_Huang_, @AliVardasbi, and @HarrieOos.
(0 replies · 2 reposts · 14 likes)
RT @CharlieLondon02: I believe that policy gradient methods with only terminal rewards will have to break down at some level of task ood-ne…
(0 replies · 1 repost · 0 likes)
RT @tianyuanzhang99: Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative? Our new paper…
(0 replies · 75 reposts · 0 likes)
RT @SimonShaoleiDu: PPO vs. DPO? 🤔 Our new paper proves that it depends on whether your models can represent the optimal policy and/or rewa…
(0 replies · 18 reposts · 0 likes)
RT @abeirami: As we go through a lot of excitement about RL recently with lots of cool work/results, here is a reminder that RL with a reve…
(0 replies · 52 reposts · 0 likes)
RT @sirbayes: I am pleased to announce a new version of my RL tutorial. Major update to the LLM chapter (e.g. DPO, GRPO, thinking), minor upd…
(0 replies · 452 reposts · 0 likes)
RT @Ji_Ha_Kim: I got recommended Terence Tao's YouTube channel created in 2010, where he uploaded his first video just yesterday! He showca…
(0 replies · 46 reposts · 0 likes)
RT @bremen79: Found this paper in the RL slides of @alexolshevsky1: Most policy gradient methods drop the discount…
(0 replies · 31 reposts · 0 likes)
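The "dropped discount" @bremen79 mentions can be sketched in a few lines. This is a minimal illustration, not code from the paper or slides; `pg_weights` is a hypothetical helper. The unbiased discounted policy-gradient estimator weights timestep t by gamma**t times the return-to-go G_t, but most implementations keep only G_t.

```python
def pg_weights(rewards, gamma, keep_discount=True):
    """Per-timestep weights for a policy-gradient estimator.

    The unbiased discounted estimator weights step t by gamma**t * G_t,
    where G_t is the discounted return-to-go from step t; many codebases
    silently drop the leading gamma**t factor.
    """
    T = len(rewards)
    # Discounted return-to-go G_t = sum_{k>=t} gamma**(k-t) * r_k
    G = [sum(gamma ** (k - t) * rewards[k] for k in range(t, T)) for t in range(T)]
    if keep_discount:
        return [gamma ** t * g for t, g in enumerate(G)]  # unbiased weighting
    return G  # the common "discount-dropped" variant

print(pg_weights([1.0, 1.0, 1.0], gamma=0.5))                       # -> [1.75, 0.75, 0.25]
print(pg_weights([1.0, 1.0, 1.0], gamma=0.5, keep_discount=False))  # -> [1.75, 1.5, 1.0]
```

The two outputs differ for every step after the first, which is exactly the bias the slides point at.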
RT @qingfeng_lan: 🚀RL algorithms are shaping the post-training of LLMs, but how do their objectives connect? In this blog, I explore their…
(0 replies · 54 reposts · 0 likes)
RT @leloykun: I'm not sure if someone has already pointed this out, but Dr. GRPO still has a bias that is more pronounced the smaller the g…
(0 replies · 63 reposts · 0 likes)
🚀 Very excited to share the latest research from my internship at @AIatMeta, NYC last summer: "A Simple and Effective Reinforcement Learning Method for Text-to-Image Diffusion Fine-tuning".
(1 reply · 1 repost · 9 likes)
RT @tomgoldsteincs: New open source reasoning model! Huginn-3.5B reasons implicitly in latent space 🧠. Unlike O1 and R1, latent reasoning…
(0 replies · 273 reposts · 0 likes)
RT @ZiniuLi: 🚀 Efficient RL Training for LLMs: ReMax, built on the Verl distributed framework, is now available! 🛠️ 🔑 Key Features: - Hi…
(0 replies · 31 reposts · 0 likes)
RT @abeirami: 𝐛𝐞𝐬𝐭-𝐨𝐟-𝐧 is a strong baseline for
- improving agents
- scaling inference-time compute
- preference alignment
- jailbreakin…
(0 replies · 55 reposts · 0 likes)
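The best-of-n baseline in the tweet above fits in a few lines: sample n candidates, score each with a reward model, keep the best. A minimal sketch with toy stand-ins; `best_of_n` and the length-based "reward" are illustrative choices of mine, not from the thread.

```python
def best_of_n(prompt, generate, reward, n=8):
    """Sample n candidate responses for a prompt and keep the highest-reward one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=reward)

# Toy stand-ins: a "generator" cycling through canned responses and a
# "reward model" that simply prefers longer answers.
_responses = iter(["ok", "a longer answer", "the longest answer of all", "hm"])
best = best_of_n("q", lambda p: next(_responses), len, n=4)
print(best)  # -> "the longest answer of all"
```

In practice `generate` would sample from an LLM at nonzero temperature and `reward` would be a learned reward model; the selection logic stays this simple.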
RT @NandoDF: Perhaps a better name for the algorithm in deepseek-R1 is: clipped importance sampling, applied to optimisation of a simple on…
(0 replies · 23 reposts · 0 likes)
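The "clipped importance sampling" @NandoDF refers to is the PPO/GRPO-style clipped surrogate. A per-sample sketch under my own naming (`clipped_surrogate` is not from the tweet or the DeepSeek-R1 paper): the importance ratio between the new and old policy is clipped to [1 - eps, 1 + eps] before multiplying the advantage, and the pessimistic minimum of the two terms is kept.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Per-sample clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)

# Positive advantage: the gain is capped once the ratio exceeds 1 + eps.
print(clipped_surrogate(1.5, 1.0, eps=0.25))   # -> 1.25
# Negative advantage: the ratio cannot shrink the penalty below (1 - eps) * |A|.
print(clipped_surrogate(0.5, -1.0, eps=0.25))  # -> -0.75
```

The clip removes the incentive to move the policy far from the sampling policy in a single update, which is what makes plain importance sampling stable enough for on-policy-ish training.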