
Swarnadeep Saha (@swarnaNLP)
Followers: 1K · Following: 1K · Media: 57 · Statuses: 622
Research Scientist @AIatMeta (FAIR) working on Reasoning. Past: @Google PhD fellow @uncnlp. Gooner.
Seattle, Washington · Joined May 2014
Progress in AI is bottlenecked by the quality of evaluation, motivating the need for powerful and generalist LLM judges that can think and reason. Here's our latest paper, J1, on how to train such Thinking-LLM-Judges with RL. 🧵👇
🚨 New paper 🚨
J1: Incentivizing Thinking in LLM-as-a-Judge via RL
- Converts the judgment task into a verifiable one for both verifiable and non-verifiable prompts, using only synthetic pairwise data.
- Optimizes thoughts, scores, and judgments using GRPO.
- Outperforms all
2 replies · 3 reposts · 58 likes
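To make the recipe in the quoted tweet concrete, here is a minimal sketch of a verifiable pairwise-judgment reward feeding GRPO-style group-relative advantages. The verdict format, the function names, and the toy outputs are illustrative assumptions, not the paper's code: with synthetic pairs whose preferred side is known by construction, the judge's verdict can be checked exactly.

```python
# Hedged sketch: a verifiable pairwise-judgment reward for GRPO-style training.
# The 'Verdict: A/B' format and all names here are assumptions for illustration.
import re

def verdict_reward(judge_output: str, gold: str) -> float:
    """Binary, exactly-checkable reward: 1.0 if the judge's final verdict
    matches the known preferred response ('A' or 'B'), else 0.0."""
    verdicts = re.findall(r"Verdict:\s*([AB])", judge_output)
    return 1.0 if verdicts and verdicts[-1] == gold else 0.0

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: reward minus the group mean, divided by the
    group standard deviation (no learned value function)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5 or 1.0
    return [(r - mean) / std for r in rewards]

# Usage: sample a group of judge CoTs for one synthetic pair, score each
# verdict exactly, and weight the policy-gradient update by the advantages.
outputs = ["...thinking... Verdict: A", "...thinking... Verdict: B",
           "...thinking... Verdict: A", "no verdict at all"]
rewards = [verdict_reward(o, gold="A") for o in outputs]
advantages = group_relative_advantages(rewards)  # -> [1.0, -1.0, 1.0, -1.0]
```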
Check out our new paper, where we compared offline and (semi-)online DPO with GRPO for post-training LLMs. This led to some interesting findings! 👇
🌉 Bridging Offline & Online RL for LLMs 🌉
📝: New paper shows, on verifiable & non-verifiable tasks:
- Online DPO & GRPO give similar performance.
- Semi-online (iterative) DPO, which syncs every s steps (more efficient!), also works very well.
- Offline DPO
1 reply · 1 repost · 8 likes
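For intuition, here is a rough sketch of the offline/semi-online/online spectrum the quoted tweet describes, where the only knob is how often the generation model is synced to the trained policy. All helpers (`clone`, `generate_pairs`, `dpo_update`) are hypothetical placeholders, not the paper's implementation.

```python
# Hedged sketch of the offline / semi-online / online DPO spectrum.
# Every helper below is a placeholder standing in for real training machinery.
import copy
import random

def clone(model):
    # Stand-in for a weight sync (e.g., copying policy weights to samplers).
    return copy.deepcopy(model)

def generate_pairs(generator, batch):
    # Placeholder: in practice, sample responses from `generator` and label
    # chosen/rejected with a verifier or reward model.
    return [(prompt, "chosen response", "rejected response") for prompt in batch]

def dpo_update(policy, pairs):
    # Placeholder for one gradient step on the DPO objective.
    return policy

def train_dpo(policy, prompts, num_steps: int, s: int):
    """s = 1 -> fully online DPO; 1 < s < num_steps -> semi-online (iterative)
    DPO; s >= num_steps -> plain offline DPO (pairs from the initial model)."""
    generator = clone(policy)
    for step in range(num_steps):
        if step % s == 0:
            generator = clone(policy)  # periodic sync: the efficiency knob
        batch = random.sample(prompts, k=min(4, len(prompts)))
        policy = dpo_update(policy, generate_pairs(generator, batch))
    return policy
```

The semi-online regime is cheaper because generation from a frozen snapshot can be batched ahead of time between syncs, rather than after every gradient step.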
RT @dair_ai: 3. J1. Introduces a novel training approach for LLMs to act as evaluators (LLM-as-a-Judge) by explicitly incentivizing thought…
0 replies · 1 repost · 0 likes
RT @johnschulman2: For people who don't like Claude's behavior here (and I think it's totally valid to disagree with it), I encourage you t…
0 replies · 40 reposts · 0 likes
RT @rohanpaul_ai: Evaluation of LLMs is difficult due to judge models using limited reasoning and suffering from biases. This paper propos…
0 replies · 1 repost · 0 likes
RT @TheTuringPost: The freshest research of the week. Our top 9:
▪️ Beyond 'Aha!'
▪️ J1: Incentivizing Thinking in LLM-as-a-Judge via Rein…
0 replies · 11 reposts · 0 likes
We're organizing the RAM 2 workshop at COLM 2025 (10 years after the first edition at NeurIPS 2015). Check out our Call for Papers on topics in Reasoning, Attention, and Memory.
🚨 Announcing RAM 2 workshop @ COLM25 - call for papers 🚨
10 years on, we present the sequel to the classic RAM 🐏 (Reasoning, Attention, Memory) workshop that took place in 2015, at the cusp of major change in the area. Now in 2025 we reflect on what's happened and discuss the
0 replies · 0 reposts · 5 likes
RT @chenxi_jw: Presenting new work: Thinking LLM-as-a-Judge via RL! It’s been great fun working with @swarnaNLP, @jaseweston, @uralik1 and…
0 replies · 1 repost · 0 likes
RT @NathanThinks: excellent work by @jaseweston & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoni…
0 replies · 11 reposts · 0 likes
Excited to share that EvalPlanner has been accepted to #ICML2025! To make meaningful progress in AI, we need strong evaluators, and specifically ones that can reason. Stay tuned for more updates as we continue to make progress in this space! 😀
💭🔎 Introducing EvalPlanner – a method to train a Thinking-LLM-as-a-Judge that learns to generate planning & reasoning CoTs for evaluation. Strong performance on RewardBench, RM-Bench, JudgeBench & FollowBenchEval. Paper 📄:
5 replies · 11 reposts · 82 likes
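Reading the quoted tweet, the judge first drafts an evaluation plan and then executes it as reasoning before committing to a verdict. Below is a minimal prompt scaffold in that spirit; the template wording, the `llm` callable, and the verdict parsing are illustrative assumptions rather than the paper's actual prompts.

```python
# Hedged sketch of a plan-then-execute judging scaffold. The `llm` callable
# and the exact prompt wording are assumptions for illustration.
JUDGE_TEMPLATE = """You are evaluating two responses to the same instruction.

Instruction: {instruction}
Response A: {response_a}
Response B: {response_b}

First, write an evaluation PLAN: the criteria that matter for this
instruction and how you will check each one.
Then, write your REASONING: execute the plan step by step on both responses.
Finally, output exactly one line: 'Verdict: A' or 'Verdict: B'."""

def judge(llm, instruction: str, response_a: str, response_b: str) -> str:
    prompt = JUDGE_TEMPLATE.format(instruction=instruction,
                                   response_a=response_a,
                                   response_b=response_b)
    cot = llm(prompt)  # returns plan + reasoning + verdict as one string
    # Take the last verdict line, so plan text cannot spoof the answer.
    verdicts = [l for l in cot.splitlines() if l.startswith("Verdict:")]
    return verdicts[-1].removeprefix("Verdict:").strip() if verdicts else "?"
```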
RT @SomnathBrc: How can we perfectly erase concepts from LLMs? Our method, Perfect Erasure Functions (PEF), erases concepts from LLM repre…
0 replies · 35 reposts · 0 likes
RT @tesatory: Ten years ago in 2015 we published a paper called End-to-End Memory Networks (…). Looking back, this pa…
0 replies · 117 reposts · 0 likes
RT @jaseweston: 🚨 Multi-Token Attention 🚨 📝: Attention is critical for LLMs, but its weights are computed by single…
0 replies · 148 reposts · 0 likes
RT @ArchikiPrasad: 🚨 Excited to share: "Learning to Generate Unit Tests for Automated Debugging" 🚨, which introduces ✨UTGen and UTDebug✨ for…
0 replies · 61 reposts · 0 likes
Complex evaluation is part planning and part reasoning. Hence, we trained an LLM to think before producing a judgment. My first work since joining this awesome team 😄
💭🔎 Introducing EvalPlanner – a method to train a Thinking-LLM-as-a-Judge that learns to generate planning & reasoning CoTs for evaluation. Strong performance on RewardBench, RM-Bench, JudgeBench & FollowBenchEval. Paper 📄:
4 replies · 15 reposts · 75 likes
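One common way such thinking judges are bootstrapped without human CoT labels (a hedged sketch under that assumption, not necessarily EvalPlanner's exact recipe): sample several plan-and-reasoning traces per synthetic pair, and keep the ones whose final verdict agrees with the known preferred response. The `sample_cots` sampler and the verdict parsing below are hypothetical.

```python
# Hedged sketch: bootstrapping preference data for a thinking judge from
# synthetic pairs with a known preferred response. `sample_cots(prompt, n)`
# is a hypothetical sampler returning n plan+reasoning traces from the judge.
def build_preference_data(sample_cots, pairs, n_samples: int = 8):
    """pairs: (judge_prompt, gold) tuples where gold is 'A' or 'B',
    known by construction of the synthetic response pair."""
    chosen, rejected = [], []
    for prompt, gold in pairs:
        for cot in sample_cots(prompt, n_samples):
            verdict = cot.rsplit("Verdict:", 1)[-1].strip()[:1]
            (chosen if verdict == gold else rejected).append((prompt, cot))
    # Matching traces can serve as chosen examples (and mismatching ones as
    # rejected) for DPO-style preference training, or the chosen traces
    # alone for rejection-sampling SFT.
    return chosen, rejected
```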
RT @mohitban47: 🎉 Congrats to the awesome students, postdocs, & collaborators for this exciting batch of #ICLR2025 and #NAACL2025 accepted…
0 replies · 31 reposts · 0 likes