Nandan Thakur @beirmug X Profile

Nandan Thakur

@beirmug

Followers

3K

Following

17K

Media

141

Statuses

1K

CS PhD student @uwaterloo • previously intern @DbrxMosaicAI @GoogleAI, RA @UKPLab • IR+NLP research (https://t.co/kxQprYr7Xn, https://t.co/YVvVjSyXOS, TREC-RAG and FreshStack)

https://t.co/DRkk50V4Xw

Waterloo, Ontario

Joined July 2016


Nandan Thakur

@beirmug

7 months

Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics, rather than grounded in solving real user problems. 🧵Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during my internship at @databricks 🧱

12

35

201

Ai2

@allen_ai

18 hours

Today we’re releasing Deep Research Tulu (DR Tulu)—the first fully open, end-to-end recipe for long-form deep research, plus an 8B agent you can use right away. Train agents that plan, search, synthesize, & cite across sources, making expert research more accessible. 🧭📚

8

82

466

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)

@rao2z

3 days

@gowthami_s Having studied at the same UG and Grad institutions as you--if only a trillion years prior--I find this narrative--especially the "no Indian university would've taken a chance <on an @iitmadras graduate>" part a little too drenched in red-white-and-blue syrup..

6

5

145

Nandan Thakur

@beirmug

8 days

Had fun designing the FreshStack #NeurIPS2025 D&B poster! ❤️ FreshStack will be presented in San Diego by @DbrxMosaicAI! ☀️🇺🇸 Thanks to all my co-authors: @lateinteraction @mrdrozdov @sam_havens @mcarbin @lintool!

Nandan Thakur

@beirmug

7 months

Existing IR/RAG benchmarks are unrealistic: they’re often derived from easily retrievable topics, rather than grounded in solving real user problems. 🧵Introducing 𝐅𝐫𝐞𝐬𝐡𝐒𝐭𝐚𝐜𝐤, a challenging RAG benchmark on niche, recent topics. Work done during my internship at @databricks 🧱

3

5

18

AI4Bharat

@ai4bharat

9 days

🚀 Announcing the Indic LLM-Arena 🇮🇳 At AI4Bharat (IIT Madras), our mission has always been clear - build open, inclusive, and world-class AI for Indian languages. To further this goal, today, we’re introducing the Indic LLM-Arena, a crowd-sourced, human-in-the-loop leaderboard

23

97

557

Nandan Thakur

@beirmug

11 days

Evaluated ModernBERT variants on the FreshStack leaderboard! (i) GTE (ModernBERT) (ii) IBM Granite (and small) English R2. Outperforms Embedding Gemma 300M despite being 149M params. Poster and other updates coming soon!

0

4

7

Nandan Thakur

@beirmug

15 days

I'm unable to attend #EMNLP2025 this time! However, folks attending, do look at your data carefully! 🧐 Check out our RLHN poster! We show that pruning & relabeling hard negatives in existing IR training datasets improves OOD generalization!⭐️ Work done with @crystina_z!

Nandan Thakur

@beirmug

6 months

Did you know that fine-tuning retrievers & re-rankers on large but unclean training datasets can harm their performance? 😡 In our new preprint, we re-examine popular IR training data quality by pruning datasets and identifying and relabeling 𝐟𝐚𝐥𝐬𝐞-𝐧𝐞𝐠𝐚𝐭𝐢𝐯𝐞𝐬! 🏷️

0

1

9

Nouamane Tazi

@Nouamanetazi

20 days

We're releasing The Smol Training Playbook 📖 Training SmolLM3 on 384 H100s for nearly a month taught us: infrastructure is the unsung hero of LLM training. Most care about architecture and data, yet few understand the hardware layer. This playbook changes that 🧵

9

20

136

Nandan Thakur

@beirmug

28 days

The @UKPLab handle will be missed. The amount of effort @tomaarsen has put into adding features to the repository is really commendable! Hugely deserved!

tomaarsen

@tomaarsen

28 days

🤗 Sentence Transformers is joining @huggingface! 🤗 This formalizes the existing maintenance structure, as I've personally led the project for the past two years on behalf of Hugging Face. I'm super excited about the transfer! Details in 🧵

0

6

Nandan Thakur

@beirmug

1 month

Happy Diwali & Kali Puja to everyone. 🎇🪔 Fortunate to be celebrating Diwali back home in Delhi after a long time! 😊

2

0

10

Nandan Thakur

@beirmug

1 month

This says a lot about the rigor & quality of research papers these days. Two recommendations here: 1. Let's not auto-generate citations with LLMs. Unreliable! 2. Don't be lazy, do the hard work!! Check each citation carefully (check whether it is published and whether a URL is present)....

Michael Saxon

@m2saxon

1 month

The viral new "Definition of AGI" paper has fake citations which do not exist. And it specifically TELLS you to read them! Proof: different articles present at the specified journal/volume/page number, and their titles exist nowhere on any searchable repository.

0

3

Jacob Portes

@JacobianNeuro

1 month

Our team at Databricks Research is ramping up internship applications. If you are a PhD student doing research in RL training, multimodal models, information retrieval, evaluation, and coding and data science agents, feel free to DM me!

Matei Zaharia

@matei_zaharia

1 month

My team is hiring AI research interns for summer 2026 at Databricks! Join us to learn about AI use cases at thousands of companies, and contribute to making it easier for anyone to build specialized AI agents and models for difficult tasks.

6

17

172

Nandan Thakur

@beirmug

1 month

accidentally said "retrieval" instead of "RAG" and they kicked me out of sf....

2

0

15

Lily Ge

@lilyjge

1 month

1/5 🎉 Thrilled to share that our paper “QuackIR: Retrieval in DuckDB and Other Relational Database Management Systems” has been accepted to EMNLP 2025 Industry Track! 📄Paper: https://t.co/VdG7kAYp5D 💻 Code:

github.com

QuackIR is an IR toolkit built on DuckDB. Contribute to castorini/quackir development by creating an account on GitHub.

1

4

10

Nandan Thakur

@beirmug

1 month

My mom showed me a video of a cat interacting with a child on her fb feed, and at the end I noticed the SORA logo. Spreading misinformation is so easy among people who are not tech-savvy (even if you add a Sora logo at the end).

0

3

Nandan Thakur

@beirmug

1 month

this is a real-world benchmark

Jay A

@jay_azhang

1 month

Our new benchmark has the top 6 AI models trading real capital. Grok4 is winning so far. It was short and then flipped to long, timing the bottom perfectly. It's up >500% in 1 day

1

0

5

Jina AI

@JinaAI_

2 months

Last but not late: jina-reranker-v3 is here! A new 0.6B-parameter listwise reranker that puts query and all candidate documents in one context window and SOTA on BEIR. We call this new query-document interaction "last but not late" - It's "last" because <|doc_emb|> is placed as

2

17

155

paul

@pteiletche

2 months

Introducing ModernVBERT: a vision-language encoder that matches the performance of models 10× its size on visual document retrieval tasks! 👁️ Read more in the thread👇 (1/N)

7

34

210

Nandan Thakur

@beirmug

2 months

FreshStack is now a part of the RTEB benchmark! 🧱

tomaarsen

@tomaarsen

2 months

We're announcing a new update to MTEB: RTEB. It's a new multilingual text embedding retrieval benchmark with private (!) datasets, to ensure that we measure true generalization and avoid (accidental) overfitting. Details in our blogpost below 🧵

1

6

25

Weaviate Podcast

@weaviatepodcast

2 months

Throwback Thursday! Weaviate Podcast #124 with Nandan Thakur (@beirmug) and Connor Shorten (@CShorten30)! This podcast covers: • The BEIR Benchmarks • Evolution of RAG Benchmarks • Diversity in Search Results • Reasoning and Query Writing • Search Result Summarization •

1

5

6

Nandan Thakur

@beirmug

2 months

This is a good initiative: trying out private splits and hidden test sets in RTEB (MTEB update). A more private and robust eval setting. I sincerely hope the community adopts this; after all, this is not an easy 2-click downloadable dataset.

tomaarsen

@tomaarsen

2 months

We're announcing a new update to MTEB: RTEB. It's a new multilingual text embedding retrieval benchmark with private (!) datasets, to ensure that we measure true generalization and avoid (accidental) overfitting. Details in our blogpost below 🧵

2

0

8