
Nandan Thakur
@beirmug
Followers
2K
Following
16K
Media
130
Statuses
1K
PhD @uwaterloo🌲 IR & NLP | I like good evals🔎 Intern @DbrxMosaicAI @GoogleAI | RA @UKPLab | https://t.co/kxQprYr7Xn, https://t.co/KYPd6PIbNL, TREC-RAG and FreshStack! ✨
Edmonton, Alberta
Joined July 2016
Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics, rather than grounded in solving real user problems. 🧵 Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during my internship at @databricks 🧱
11
34
196
RT @xueguang_ma: ScholarCopilot (led by @YuboWang726) is now accepted at COLM 2025! Lots of great work has emerged over the past half year…
0
2
0
Lol, I posted this about a month ago. Probably wasn't the best place at the time. Can't believe Meta has turned the tables so suddenly within a month or two.
@jxmnop Why not Meta? I see you gel well with them and have great projects, plus you would already know the ecosystem.
0
0
5
Glad to have been a part of this anti-"RAG is dead" movement. I focused on modern IR evaluations and how traditional metrics need to change for evaluating search/retrieval in the RAG era! ⚡⚡ If you missed it, check out these amazing annotated slides:
hamel.dev
Nandan Thakur on why traditional IR evals are insufficient for RAG and how new benchmarks like FreshStack provide a better path forward.
In our rage against "RAG is Dead", @bclavie and I assembled annotated talk notes (Simon Willison style). We take you through the many reasons why RAG is just getting started: single dense vector representations are quite naive. Notes in reply.
0
3
34
RT @HamelHusain: Overview of the series. 1. We’ve been measuring wrong. @beirmug showed that traditional IR metrics optimize for finding th…
0
4
0
RT @lateinteraction: Amazing line-up on retrieval, covering FreshStack (#1 @beirmug) and Late Interaction (#3 @antoine_chaffin), among othe…
0
7
0
Unfortunately, I will miss SIGIR this year due to visa issues. However, our team will be heading to #SIGIR2025. Please check out the list below for our multiple works on TREC RAG being presented this year! Support, Nuggets & Retrieval!
Thrilled to be heading to #SIGIR2025 with 𝗥𝗮𝗻𝗸𝗟𝗟𝗠! I’ll also be stepping in for a few other exciting projects from the #UniversityOfWaterloo team. If you’ll be there, let’s chat! 👋 📌 𝗕𝗼𝗼𝗸𝗺𝗮𝗿𝗸 𝗼𝗳 𝗮𝗹𝗹 𝗨𝗪 𝗽𝗮𝗽𝗲𝗿𝘀 available in the images below:
0
0
6
RT @HamelHusain: I created an annotated version of @beirmug's presentation on IR Evals for RAG. Nandan argues that we should consider add…
0
8
0
RT @vishal_learner: This is so freaking cool. IIUC my AgentFastbook project goals seem very similar to FreshStack, which is extremely valid…
0
1
0
RT @atitaarora: Solid! Great talk @beirmug, and thanks @HamelHusain for the 'Tech talk as email' drive! Lovin' it!
0
2
0
If you missed my talk on IR evaluations for RAG, you can find the annotated notes + slides for the presentation below!
hamel.dev
Nandan Thakur on why traditional IR evals are insufficient for RAG and how new benchmarks like FreshStack provide a better path forward.
I created an annotated version of @beirmug's presentation on IR Evals for RAG. Nandan argues that we should consider additional retrieval metrics beyond the classics (MRR, etc.) b/c retrieval goals for RAG can sometimes be very different.
0
4
21
RT @antoine_chaffin: I'll be covering Reason-ModernColBERT in tonight's presentation, so please come if you are interested!
maven.com
Single vector search is the standard for RAG pipelines, but struggles in real-world applications due to poor out-of-domain generalization and long-context handling. Multi-vector models overcome these...
0
14
0
Thank you for attending the talk! I'm a huge believer that the next generation of IR metrics should be diversity-focused and evaluate relevant, diverse, informative, and correct sources. Here is a tl;dr slide from my talk yesterday! 🍻
Great presentation from @beirmug yesterday about RAG evals. FreshStack looks like exactly what the next evolution of IR evals should be. Great stuff.
0
3
25
RT @HamelHusain: Come troll us along with 4k other students about RAG evals. Nandan has been working hard on creating a banger. If yo…
0
2
0
This is a useful blog post! 🤗 Please read it, and I'll talk more about this in my IR evals presentation today on why we should stop using stale benchmarks for modern-day IR or QA evaluation. Talk starts in less than 2 hours! ⌛
maven.com
Traditional IR benchmarks fall short for real-world RAG applications due to stale data, incomplete labels, and unrealistic queries. This talk introduces FreshStack, a new benchmark built from recent...
0
1
8