Nandan Thakur
@beirmug
Followers 3K · Following 17K · Media 141 · Statuses 1K
CS PhD student @uwaterloo • previously intern @DbrxMosaicAI @GoogleAI, RA @UKPLab • IR+NLP research (https://t.co/kxQprYr7Xn, https://t.co/YVvVjSyXOS, TREC-RAG and FreshStack)
Waterloo, Ontario
Joined July 2016
Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics, rather than grounded in solving real user problems. 🧵Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during an internship @databricks 🧱
Today we’re releasing Deep Research Tulu (DR Tulu)—the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚
@gowthami_s Having studied at the same UG and Grad institutions as you--if only a trillion years prior--I find this narrative--especially the "no Indian university would've taken a chance <on an @iitmadras graduate>" part--a little too drenched in red-white-and-blue syrup.
Had fun designing the FreshStack #NeurIPS2025 D&B poster! ❤️ FreshStack will be presented in San Diego by @DbrxMosaicAI! ☀️🇺🇸 Thanks to all my co-authors: @lateinteraction @mrdrozdov @sam_havens @mcarbin @lintool!
🚀 Announcing the Indic LLM-Arena 🇮🇳 At AI4Bharat (IIT Madras), our mission has always been clear - build open, inclusive, and world-class AI for Indian languages. To further this goal, today, we’re introducing the Indic LLM-Arena, a crowd-sourced, human-in-the-loop leaderboard
Evaluated ModernBERT variants on the FreshStack leaderboard! (i) GTE (ModernBERT), and (ii) IBM Granite English R2 (including the small variant), which outperforms EmbeddingGemma 300M despite having only 149M params. Poster and other updates coming soon!
I'm unable to attend #EMNLP2025 this time! But folks attending: do look at your data carefully! 🧐 Check out our RLHN poster! We show that pruning & relabeling hard negatives in existing IR training datasets improves OOD generalization! ⭐️ Work done with @crystina_z!
Did you know that fine-tuning retrievers & re-rankers on large but unclean training datasets can harm their performance? 😡 In our new preprint, we re-examine the quality of popular IR training data by pruning datasets and by identifying and relabeling 𝐟𝐚𝐥𝐬𝐞-𝐧𝐞𝐠𝐚𝐭𝐢𝐯𝐞𝐬! 🏷️
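The RLHN idea above (pruning and relabeling hard negatives) can be sketched in a few lines. Everything here is illustrative: `score` is a toy lexical-overlap stand-in for the strong relevance judge (e.g. a cross-encoder or LLM) such a pipeline would actually use, and the margin is a made-up hyperparameter.

```python
def score(query: str, doc: str) -> float:
    # Toy lexical-overlap scorer standing in for a real relevance model.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def relabel_hard_negatives(query, positive, negatives, margin=0.1):
    """Split negatives into kept negatives and suspected false negatives.

    A negative scoring within `margin` of the positive (or above it)
    is treated as a false negative and promoted out of the negative pool.
    """
    pos_score = score(query, positive)
    kept, promoted = [], []
    for neg in negatives:
        if score(query, neg) >= pos_score - margin:
            promoted.append(neg)   # likely mislabeled: relabel as positive
        else:
            kept.append(neg)       # genuine hard negative: keep for training
    return kept, promoted

query = "capital of france"
positive = "paris is the capital of france"
negatives = [
    "the capital of france is paris",    # a false negative in disguise
    "berlin is the capital of germany",  # a genuine negative
]
kept, promoted = relabel_hard_negatives(query, positive, negatives)
```

With a real cross-encoder in place of `score`, the same loop either drops or relabels the suspicious negatives before fine-tuning.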
We're releasing The Smol Training Playbook 📖 Training SmolLM3 on 384 H100s for nearly a month taught us: infrastructure is the unsung hero of LLM training. Most care about architecture and data, yet few understand the hardware layer. This playbook changes that 🧵
The @UKPLab handle will be missed. The amount of effort @tomaarsen has put into adding features to the repository is really commendable! Hugely deserved!
🤗 Sentence Transformers is joining @huggingface! 🤗 This formalizes the existing maintenance structure, as I've personally led the project for the past two years on behalf of Hugging Face. I'm super excited about the transfer! Details in 🧵
Happy Diwali & Kali Puja to everyone! 🎇🪔 Fortunate to be celebrating Diwali back home in Delhi after a long time! 😊
This says a lot about the rigor & quality of research papers these days. Two recommendations: 1. Let's not auto-generate citations with LLMs. Unreliable! 2. Don't be lazy, do the hard work! Check each citation carefully (is it actually published? is a URL present?).
The viral new "Definition of AGI" paper has fake citations which do not exist. And it specifically TELLS you to read them! Proof: different articles present at the specified journal/volume/page number, and their titles exist nowhere on any searchable repository.
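The second recommendation above (check whether a URL is present) is easy to partially automate. A toy sketch with a hypothetical reference list; a real checker would also resolve DOIs and search for the titles:

```python
import re

# Matches a DOI or any http(s) URL anywhere in a reference string.
DOI_OR_URL = re.compile(r"(doi\.org/|https?://)", re.IGNORECASE)

def flag_unverifiable(references):
    """Return references lacking any URL or DOI: candidates for manual checking."""
    return [ref for ref in references if not DOI_OR_URL.search(ref)]

refs = [
    "Smith et al. 2023. Real Paper. https://doi.org/10.0000/xyz",
    "Doe et al. 2024. Suspicious Citation. Journal of Nowhere 12(3).",
]
flagged = flag_unverifiable(refs)
```

Absence of a URL doesn't prove a citation is fake, of course; the flag just tells you where to spend manual-verification effort first.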
Our team at Databricks Research is ramping up internship hiring. If you are a PhD student doing research in RL training, multimodal models, information retrieval, evaluation, or coding and data-science agents, feel free to DM me!
My team is hiring AI research interns for summer 2026 at Databricks! Join us to learn about AI use cases at thousands of companies, and contribute to making it easier for anyone to build specialized AI agents and models for difficult tasks.
accidentally said "retrieval" instead of "RAG" and they kicked me out of sf....
1/5 🎉 Thrilled to share that our paper “QuackIR: Retrieval in DuckDB and Other Relational Database Management Systems” has been accepted to the EMNLP 2025 Industry Track! QuackIR is an IR toolkit built on DuckDB. 📄 Paper: https://t.co/VdG7kAYp5D 💻 Code: github.com/castorini/quackir
My mom showed me a video of a cat interacting with a child on her FB feed, and only at the end did I notice the Sora logo. Spreading misinformation is so easy among people who aren't tech-savvy (even if you add a Sora logo at the end).
Last but not late: jina-reranker-v3 is here! A new 0.6B-parameter listwise reranker that puts the query and all candidate documents in one context window and achieves SOTA on BEIR. We call this new query-document interaction "last but not late": it's "last" because <|doc_emb|> is placed as
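For intuition, here is a rough sketch of what a listwise input like the one described might look like. The layout and the `Query:`/`Doc i:` labels are assumptions for illustration, not jina-reranker-v3's actual format; only the `<|doc_emb|>` token name comes from the tweet above.

```python
def build_listwise_input(query: str, docs: list[str],
                         doc_token: str = "<|doc_emb|>") -> str:
    """Pack the query and all candidate documents into one context.

    Each document is followed by an embedding-slot token placed *after*
    its text ("last"), so the slot can attend to the full document, the
    query, and every other candidate in the same window.
    """
    parts = [f"Query: {query}"]
    for i, doc in enumerate(docs, start=1):
        parts.append(f"Doc {i}: {doc} {doc_token}")
    return "\n".join(parts)

prompt = build_listwise_input(
    "what is BEIR?",
    ["BEIR is a heterogeneous IR benchmark.", "DuckDB is a database."],
)
```

The contrast is with "late interaction" models, which encode query and documents separately and only combine them at scoring time; here the interaction happens inside one forward pass.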
Introducing ModernVBERT: a vision-language encoder that matches the performance of models 10× its size on visual document retrieval tasks! 👁️ Read more in the thread👇 (1/N)
FreshStack is now a part of the RTEB benchmark! 🧱
We're announcing a new update to MTEB: RTEB It's a new multilingual text embedding retrieval benchmark with private (!) datasets, to ensure that we measure true generalization and avoid (accidental) overfitting. Details in our blogpost below 🧵
Throwback Thursday! Weaviate Podcast #124 with Nandan Thakur (@beirmug) and Connor Shorten (@CShorten30)! This podcast covers: • The BEIR Benchmarks • Evolution of RAG Benchmarks • Diversity in Search Results • Reasoning and Query Writing • Search Result Summarization •
This is a good initiative: trying out private splits and hidden test sets in RTEB (the MTEB update) makes for a more private and robust eval setting. I sincerely hope the community adopts this; after all, this is not a dataset you can download in two clicks.