Nandan Thakur

@beirmug

Followers: 3K
Following: 17K
Media: 141
Statuses: 1K

CS PhD student @uwaterloo • previously intern @DbrxMosaicAI @GoogleAI, RA @UKPLab • IR+NLP research (https://t.co/kxQprYr7Xn, https://t.co/YVvVjSyXOS, TREC-RAG and FreshStack)

Waterloo, Ontario
Joined July 2016
@beirmug
Nandan Thakur
7 months
Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics, rather than grounded in solving real user problems. 🧵Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during intern @databricks 🧱
12
35
201
@allen_ai
Ai2
18 hours
Today we’re releasing Deep Research Tulu (DR Tulu)—the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚
8
82
466
@rao2z
Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)
3 days
@gowthami_s Having studied at the same UG and Grad institutions as you--if only a trillion years prior--I find this narrative--especially the "no Indian university would've taken a chance <on an @iitmadras graduate>" part a little too drenched in red-white-and-blue syrup..
6
5
145
@beirmug
Nandan Thakur
8 days
Had fun designing the FreshStack #NeurIPS2025 D&B poster! ❤️ FreshStack will be presented in San Diego by @DbrxMosaicAI! ☀️🇺🇸 Thanks to all my co-authors: @lateinteraction @mrdrozdov @sam_havens @mcarbin @lintool!
@beirmug
Nandan Thakur
7 months
Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics, rather than grounded in solving real user problems. 🧵Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during intern @databricks 🧱
3
5
18
@ai4bharat
AI4Bharat
9 days
🚀 Announcing the Indic LLM-Arena 🇮🇳 At AI4Bharat (IIT Madras), our mission has always been clear - build open, inclusive, and world-class AI for Indian languages. To further this goal, today, we’re introducing the Indic LLM-Arena, a crowd-sourced, human-in-the-loop leaderboard
23
97
557
@beirmug
Nandan Thakur
11 days
Evaluated ModernBERT variants on the FreshStack leaderboard! (i) GTE (ModernBERT) (ii) IBM Granite (and small) English R2. Outperforms Embedding Gemma 300M despite being 149M params. Poster and other updates coming soon!
0
4
7
@beirmug
Nandan Thakur
15 days
I'm unable to attend #EMNLP2025 this time! However, folks attending do look at your data carefully! 🧐 Check out our RLHN poster! We show that pruning & relabeling hard negatives in existing IR training datasets improves OOD generalization!⭐️ Work done with @crystina_z!
@beirmug
Nandan Thakur
6 months
Did you know that fine-tuning retrievers & re-rankers on large but unclean training datasets can harm their performance? 😡 In our new preprint, we re-examine popular IR training data quality by pruning datasets and identifying and relabeling 𝐟𝐚𝐥𝐬𝐞-𝐧𝐞𝐠𝐚𝐭𝐢𝐯𝐞𝐬! 🏷️
0
1
9
@Nouamanetazi
Nouamane Tazi
20 days
We're releasing The Smol Training Playbook 📖 Training SmolLM3 on 384 H100s for nearly a month taught us: infrastructure is the unsung hero of LLM training. Most care about architecture and data, yet few understand the hardware layer. This playbook changes that 🧵
9
20
136
@beirmug
Nandan Thakur
28 days
The @UKPLab handle will be missed. The amount of effort @tomaarsen has put into adding features to the repository is really commendable! Hugely deserved!
@tomaarsen
tomaarsen
28 days
🤗 Sentence Transformers is joining @huggingface! 🤗 This formalizes the existing maintenance structure, as I've personally led the project for the past two years on behalf of Hugging Face. I'm super excited about the transfer! Details in 🧵
0
0
6
@beirmug
Nandan Thakur
1 month
Happy Diwali & Kali Puja to everyone. 🎇🪔 Fortunate to be celebrating Diwali back home in Delhi after a long time! 😊
2
0
10
@beirmug
Nandan Thakur
1 month
This says a lot about the rigor & quality of research papers these days. Two recommendations here: 1. Let's not auto-generate citations with LLMs. Unreliable! 2. Don't be lazy, do the hard work!! Carefully check each citation (check whether it is published and whether a URL is present)....
@m2saxon
Michael Saxon
1 month
The viral new "Definition of AGI" paper has fake citations which do not exist. And it specifically TELLS you to read them! Proof: different articles present at the specified journal/volume/page number, and their titles exist nowhere on any searchable repository.
0
0
3
@JacobianNeuro
Jacob Portes
1 month
Our team at Databricks Research is ramping up internship applications. If you are a PhD student doing research in RL training, multimodal models, information retrieval, evaluation, and coding and data science agents, feel free to DM me!
@matei_zaharia
Matei Zaharia
1 month
My team is hiring AI research interns for summer 2026 at Databricks! Join us to learn about AI use cases at thousands of companies, and contribute to making it easier for anyone to build specialized AI agents and models for difficult tasks.
6
17
172
@beirmug
Nandan Thakur
1 month
accidentally said "retrieval" instead of "RAG" and they kicked me out of sf....
2
0
15
@lilyjge
Lily Ge
1 month
1/5 🎉 Thrilled to share that our paper “QuackIR: Retrieval in DuckDB and Other Relational Database Management Systems” has been accepted to EMNLP 2025 Industry Track! 📄Paper: https://t.co/VdG7kAYp5D 💻 Code:
github.com
QuackIR is an IR toolkit built on DuckDB. Contribute to castorini/quackir development by creating an account on GitHub.
1
4
10
@beirmug
Nandan Thakur
1 month
My mom showed me a video of a cat interacting with a child on her FB feed, and at the end I noticed the SORA logo. Spreading misinformation is so easy among people who are not tech-savvy (even if you add a SORA logo at the end).
0
0
3
@beirmug
Nandan Thakur
1 month
this is a real-world benchmark
@jay_azhang
Jay A
1 month
Our new benchmark has the top 6 AI models trading real capital. Grok4 is winning so far. It was short and then flipped to long, timing the bottom perfectly. It's up >500% in 1 day
1
0
5
@JinaAI_
Jina AI
2 months
Last but not late: jina-reranker-v3 is here! A new 0.6B-parameter listwise reranker that puts query and all candidate documents in one context window and SOTA on BEIR. We call this new query-document interaction "last but not late" - It's "last" because <|doc_emb|> is placed as
2
17
155
@pteiletche
paul
2 months
Introducing ModernVBERT: a vision-language encoder that matches the performance of models 10× its size on visual document retrieval tasks! 👁️ Read more in the thread👇 (1/N)
7
34
210
@beirmug
Nandan Thakur
2 months
FreshStack is now a part of the RTEB benchmark! 🧱
@tomaarsen
tomaarsen
2 months
We're announcing a new update to MTEB: RTEB. It's a new multilingual text embedding retrieval benchmark with private (!) datasets, to ensure that we measure true generalization and avoid (accidental) overfitting. Details in our blogpost below 🧵
1
6
25
@weaviatepodcast
Weaviate Podcast
2 months
Throwback Thursday! Weaviate Podcast #124 with Nandan Thakur (@beirmug) and Connor Shorten (@CShorten30)! This podcast covers: • The BEIR Benchmarks • Evolution of RAG Benchmarks • Diversity in Search Results • Reasoning and Query Writing • Search Result Summarization •
1
5
6
@beirmug
Nandan Thakur
2 months
This is a good initiative: trying out private splits and hidden test sets in RTEB (MTEB update). A more private and robust eval setting. I sincerely hope the community adopts this; after all, this is not an easy two-click downloadable dataset.
@tomaarsen
tomaarsen
2 months
We're announcing a new update to MTEB: RTEB. It's a new multilingual text embedding retrieval benchmark with private (!) datasets, to ensure that we measure true generalization and avoid (accidental) overfitting. Details in our blogpost below 🧵
2
0
8