Nandan Thakur

@beirmug

Followers
2K
Following
16K
Media
130
Statuses
1K

PhD @uwaterloo🌲 IR & NLP | I like good evals🔎 Intern @DbrxMosaicAI @GoogleAI | RA @UKPLab | https://t.co/kxQprYr7Xn, https://t.co/KYPd6PIbNL, TREC-RAG and FreshStack! ✨

Edmonton, Alberta
Joined July 2016
@beirmug
Nandan Thakur
3 months
Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics rather than grounded in solving real user problems. 🧵 Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during my internship at @databricks 🧱
11
34
196
@beirmug
Nandan Thakur
1 day
They ask me what gives you pain.
0
0
5
@beirmug
Nandan Thakur
2 days
RT @din0s_: flight prep 🧳
0
3
0
@beirmug
Nandan Thakur
5 days
RT @xueguang_ma: ScholarCopilot (led by @YuboWang726) is now accepted at COLM 2025!. Lots of great work has emerged over the past half year….
0
2
0
@beirmug
Nandan Thakur
7 days
Lol, I posted this ~a month ago. Probably wasn't the best place at the time. Can't believe Meta has turned the tables so suddenly, within a month or two.
@beirmug
Nandan Thakur
2 months
@jxmnop Why not Meta? I see you gel well with them and have great projects, plus you'd already know the ecosystem.
0
0
5
@beirmug
Nandan Thakur
9 days
Glad to have been a part of this anti-"RAG is dead" movement. I focused on modern IR evaluations and how traditional metrics need to change to evaluate search/retrieval in the RAG era! ⚡⚡ If you missed it, check out these amazing annotated slides:
hamel.dev
Nandan Thakur on why traditional IR evals are insufficient for RAG and how new benchmarks like FreshStack provide a better path forward.
@HamelHusain
Hamel Husain
11 days
In our rage against "RAG is Dead", @bclavie and I assembled annotated talk notes (Simon Willison style). We take you through the many reasons why RAG is just getting started: single dense vector representations are quite naive. Notes in reply.
0
3
34
@beirmug
Nandan Thakur
10 days
RT @sh_reya: I learned a lot about RAG from this mini series. I loved the way it was organized & the breadth of topics covered—contrast to….
0
3
0
@beirmug
Nandan Thakur
10 days
TREC RAG TOPICS ARE OFFICIALLY OUT NOW!!! 🔥🔥🔥 LET THE GAMES BEGIN! 🥶
@Ushivani3
Shivani Upadhyay
13 days
📢📢 RAG 2025 topics are now officially released! 🔍 Test narratives are out now (105 in total). Let the games begin! #TREC2025 #RAG
0
5
12
@beirmug
Nandan Thakur
11 days
RT @HamelHusain: Overview of the series. 1. We’ve been measuring wrong. @beirmug showed that traditional IR metrics optimize for finding th….
0
4
0
@beirmug
Nandan Thakur
11 days
RT @lateinteraction: Amazing line-up on retrieval, covering FreshStack (#1 @beirmug) and Late Interaction (#3 @antoine_chaffin), among othe….
0
7
0
@beirmug
Nandan Thakur
13 days
Unfortunately, I will miss SIGIR this year due to visa issues. However, our team will be heading to #SIGIR2025. Please see the list below for our multiple works on TREC RAG being presented this year: Support, Nuggets & Retrieval!
@Sahel_Sharify
Sahel Sharifymoghaddam
16 days
Thrilled to be heading to #SIGIR2025 with 𝗥𝗮𝗻𝗸𝗟𝗟𝗠! I’ll also be stepping in for a few other exciting projects from the #UniversityOfWaterloo team. If you’ll be there, let’s chat! 👋 📌 𝗕𝗼𝗼𝗸𝗺𝗮𝗿𝗸 𝗼𝗳 𝗮𝗹𝗹 𝗨𝗪 𝗽𝗮𝗽𝗲𝗿𝘀 available in the images below:
0
0
6
@beirmug
Nandan Thakur
13 days
RT @lintool: It’s been 36 hours since Grok 4 launched and we have an early verdict based on 6K+ preferences of @yupp_ai users globally on r….
0
179
0
@beirmug
Nandan Thakur
14 days
RT @HamelHusain: I created an annotation version of @beirmug 's presentation on IR Evals for RAG. Nandan argues that we should consider add….
hamel.dev
Nandan Thakur on why traditional IR evals are insufficient for RAG and how new benchmarks like FreshStack provide a better path forward.
0
8
0
@beirmug
Nandan Thakur
16 days
RT @vishal_learner: This is so freaking cool. IIUC my AgentFastbook project goals seem very similar to FreshStack, which is extremely valid….
0
1
0
@beirmug
Nandan Thakur
16 days
RT @jobergum: Extremely useful annotation! Thanks for sharing @HamelHusain and @beirmug 🍻.
0
2
0
@beirmug
Nandan Thakur
16 days
RT @atitaarora: Solid! Great talk @beirmug and thanks @HamelHusain for the 'Tech talk as email' drive! Lovin' it!
0
2
0
@beirmug
Nandan Thakur
16 days
If you missed my talk on IR evaluations for RAG, you can find the annotated notes + slides for the presentation below!
hamel.dev
Nandan Thakur on why traditional IR evals are insufficient for RAG and how new benchmarks like FreshStack provide a better path forward.
@HamelHusain
Hamel Husain
17 days
I created an annotated version of @beirmug's presentation on IR Evals for RAG. Nandan argues that we should consider additional retrieval metrics beyond the classics (MRR, etc.) because retrieval goals for RAG can sometimes be very different.
0
4
21
@beirmug
Nandan Thakur
21 days
Thank you for attending the talk! I'm a huge believer that the next generation of IR metrics should be diversity-focused and should evaluate relevant, diverse, informative and correct sources. Here is a tl;dr slide from my talk yesterday! 🍻
@AndreiOnel
Andrei Onel
21 days
Great presentation from @beirmug yesterday about RAG evals. FreshStack looks like exactly what the next evolution of IR evals should be. Great stuff.
0
3
25
@beirmug
Nandan Thakur
22 days
RT @HamelHusain: Come troll us along with 4k other students about RAG evals. Nandan has been working hard on creating a banger. If yo….
0
2
0
@beirmug
Nandan Thakur
22 days
This is a useful blog post! 🤗 Please read it; I'll talk more about this in my IR evals presentation today, on why we should stop using stale benchmarks for modern-day IR or QA evaluation. Talk starts in less than 2 hours! ⌛
maven.com
Traditional IR benchmarks fall short for real-world RAG applications due to stale data, incomplete labels, and unrealistic queries. This talk introduces FreshStack, a new benchmark built from recent...
@qi2peng2
Peng Qi
22 days
Seven years ago, I co-led a paper called 𝗛𝗼𝘁𝗽𝗼𝘁𝗤𝗔 that has motivated and facilitated many #AI #Agents research works since. Today, I'm asking that you stop using HotpotQA blindly for agents research in 2025 and beyond. In my new blog post, I revisit the brief history of.
0
1
8