Manveer Singh Tamber
@ManveerTamber
34 Followers · 47 Following · 3 Media · 21 Statuses
Our paper with @vectara, “Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards”, is now published in the EMNLP 2025 Industry Track! Check out our work on enabling more reliable LLM faithfulness benchmarking in RAG!
As LLMs evolve, our benchmarks must too. We'll continue updating FaithJudge (currently using an o3-mini-high judge) and the leaderboard for more complete evaluation. Full paper:
arxiv.org
Retrieval-augmented generation (RAG) aims to reduce hallucinations by grounding responses in external context, yet large language models (LLMs) still frequently introduce unsupported information...
FaithJudge uses hallucination annotation examples to guide the automated evaluation of LLM responses for the same tasks.
- Outperforms prior methods
- Matches human judgment more closely
- Works across summarization, QA, and data-to-text
We introduced FaithBench to evaluate hallucination detectors on challenging summaries from 10 modern LLMs. FaithBench highlights just how hard hallucination detection still is.
aclanthology.org
Forrest Sheng Bao, Miaoran Li, Renyi Qu, Ge Luo, Erana Wan, Yujia Tang, Weisi Fan, Manveer Singh Tamber, Suleman Kazi, Vivek Sourabh, Mike Qi, Ruixuan Tu, Chenyu Xu, Matthew Gonzales, Ofer Mendelev...
Existing hallucination detection methods (fine-tuned models or zero-shot LLM judges) are promising, but often struggle. FaithJudge improves detection by prompting LLM judges with human-labelled hallucination examples from FaithBench and RAGTruth.
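To make the mechanism concrete, here is a minimal Python sketch of few-shot hallucination judging in the spirit of FaithJudge. The `call_llm` placeholder, the example record, and the prompt wording are all assumptions for illustration, not the paper's actual prompt or API.

```python
# Sketch: few-shot hallucination judging. `call_llm` is a placeholder for
# any chat-completion client; the example record and prompt wording are
# illustrative, not FaithJudge's actual prompt.

ANNOTATED_EXAMPLES = [
    {
        "source": "The Eiffel Tower, completed in 1889, is about 330 m tall.",
        "response": "The Eiffel Tower was finished in 1899.",
        "label": "Unfaithful: the completion year contradicts the source.",
    },
    # ... more human-labelled examples, e.g. drawn from FaithBench/RAGTruth
]

def build_judge_prompt(source: str, response: str) -> str:
    """Prefix the query with annotated examples to guide the judge."""
    parts = ["Judge whether each response is faithful to its source.\n"]
    for ex in ANNOTATED_EXAMPLES:
        parts.append(f"Source: {ex['source']}\nResponse: {ex['response']}\n"
                     f"Judgment: {ex['label']}\n")
    parts.append(f"Source: {source}\nResponse: {response}\nJudgment:")
    return "\n".join(parts)

def judge_faithfulness(source: str, response: str, call_llm) -> str:
    """Return the judge model's verdict for one response."""
    return call_llm(build_judge_prompt(source, response))
```

The annotated examples anchor the judge's notion of faithfulness to human-labelled error cases, rather than relying on the judge's zero-shot intuition alone.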
LLM hallucinations remain stubbornly persistent, even with RAG. LLMs still add, distort, or contradict contexts. The well-established Hallucination Leaderboard from @vectara evaluates hallucination rates for 130+ LLMs on summarization.
github.com
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents - vectara/hallucination-leaderboard
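As a rough illustration of what a leaderboard-style hallucination rate means, here is a hedged sketch: score each (source, summary) pair for factual consistency and report the fraction flagged. The `score_consistency` callable and the 0.5 threshold are stand-ins, not the leaderboard's actual scoring model.

```python
def hallucination_rate(pairs, score_consistency, threshold=0.5):
    """Fraction of (source, summary) pairs judged factually inconsistent.

    `score_consistency` maps (source, summary) to a consistency score in
    [0, 1]; both the callable and the threshold are illustrative stand-ins.
    """
    flagged = sum(1 for source, summary in pairs
                  if score_consistency(source, summary) < threshold)
    return flagged / len(pairs)
```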
Introducing FaithJudge: improving how we evaluate LLM faithfulness in RAG tasks, including summarization, QA, and data-to-text, and powering a more accurate LLM Hallucination Leaderboard.
github.com
vectara/FaithJudge
Super excited to share FaithJudge - a new hallucination benchmark and leaderboard. Shoutout to @lintool @ManveerTamber and many others who contributed. Paper: https://t.co/WTEa9bQMXd GitHub:
github.com
vectara/FaithJudge
Our work raises questions about the commercial viability of embedding models and the security of black-box retrieval models. Work done with @jsprxian and @lintool
https://t.co/WqWuKE0G8I
aclanthology.org
Manveer Singh Tamber, Jasper Xian, Jimmy Lin. Findings of the Association for Computational Linguistics: NAACL 2025. 2025.
Turns out: commercial embedding models are very vulnerable to being distilled. We also show how to train even better models by simply concatenating embeddings from multiple teachers and distilling those, attaining strong results.
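A minimal PyTorch sketch of that multi-teacher recipe, under assumed dimensions and a simple MSE objective (illustrative choices, not necessarily the paper's exact setup):

```python
import torch
import torch.nn as nn

# Sketch: distill several teacher embedding models into one student by
# regressing student outputs onto the concatenated teacher embeddings.
# The linear head, MSE loss, and callable signatures are assumptions.

class StudentHead(nn.Module):
    """Maps the student encoder's output into the concatenated teacher space."""
    def __init__(self, student_dim: int, concat_teacher_dim: int):
        super().__init__()
        self.proj = nn.Linear(student_dim, concat_teacher_dim)

    def forward(self, student_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(student_emb)

def distill_step(student_encoder, head, optimizer, texts, teacher_apis):
    """One training step. `student_encoder` maps a batch of texts to a
    (batch, student_dim) tensor; each entry of `teacher_apis` stands in for
    a commercial embedding endpoint returning (batch, teacher_dim)."""
    with torch.no_grad():
        # Query every teacher and concatenate along the feature axis.
        teacher_emb = torch.cat([api(texts) for api in teacher_apis], dim=-1)
    pred = head(student_encoder(texts))
    loss = nn.functional.mse_loss(pred, teacher_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Concatenation gives the student a richer regression target than any single teacher's embedding space, which is the intuition behind the "even better models" claim above.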
Commercial embedding models are locked behind APIs, but how protected are they from theft? In our #NAACL2025 Findings paper, we show they can be stolen at a modest cost, with thief models that rival their victims and generalize well to out-of-domain retrieval tasks.
LiT5 (led by @ManveerTamber) is heading to #ECIR2025! And the cherry on top? We've added LiT5-v2, which can rerank 100 segments at once. Paper soon, but it's already live on RankLLM (https://t.co/5tpi09PtmY)!
github.com
RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. - castorini/rank_llm
Pablo Picasso said "good artists borrow, great artists steal". So much innovation happening so quickly in the embedding space... it's easier just to steal other people's models. @ManveerTamber and @jsprxian show you how: https://t.co/9WirycBhmb
Our models are now accessible on Hugging Face: https://t.co/UjyHbq3IX6. You can replicate our findings with our code: https://t.co/ZMH226KL6k.
github.com
castorini/LiT5
Tested on diverse BEIR collections, LiT5-Distill & LiT5-Score prove their mettle, showcasing strong adaptability and generalization.
MS MARCO benchmarks reveal LiT5-Distill's competitive edge, at times outshining state-of-the-art models, all while being more compact and built on outdated T5 weights.
Introducing LiT5-Distill & LiT5-Score: models ranging from 220M to 3B parameters for listwise reranking.
- LiT5-Distill distills RankGPT3.5 & RankGPT4's power into smaller, yet effective models.
- LiT5-Score uses cross-attention scores for reranking.
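For readers new to listwise reranking with prompt decoders, a hedged sketch of the generate-a-permutation pattern that models like LiT5-Distill follow; the prompt template and parsing here are illustrative, not LiT5's exact format.

```python
import re

def build_listwise_prompt(query: str, passages: list[str]) -> str:
    """Number the candidates and ask the model for a ranked permutation."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (f"Query: {query}\n{numbered}\n"
            "Rank the passages by relevance to the query, "
            "e.g. [2] > [1] > [3]:")

def apply_ranking(output: str, passages: list[str]) -> list[str]:
    """Reorder passages by a generated permutation like '[2] > [1] > [3]'."""
    order = []
    for m in re.findall(r"\[(\d+)\]", output):
        i = int(m) - 1
        if 0 <= i < len(passages) and i not in order:
            order.append(i)
    # Keep any passages the model failed to mention, in original order.
    order += [i for i in range(len(passages)) if i not in order]
    return [passages[i] for i in order]
```

A reranker like this is typically applied to the top candidates from a first-stage retriever, optionally in sliding windows when the list exceeds the model's input budget.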
Thrilled to unveil our work in efficient zero-shot listwise reranking! LiT5 harnesses T5 models to challenge state-of-the-art standards with significantly smaller models. Discover more:
arxiv.org
Recent work in zero-shot listwise reranking using LLMs has achieved state-of-the-art results. However, these methods are not without drawbacks. The proposed methods rely on large LLMs with...
Here we revisit the wonders of T5, Fusion-in-Decoder (FiD), and building smaller yet effective rerankers, focusing on the listwise paradigm of prompt decoders! You can find the code and models here: https://t.co/JAoBQIJHLa. Led by the amazing @ManveerTamber!
github.com
castorini/LiT5
Prompt-decoder LLMs for listwise reranking too large for you? Introducing our new LiT5 family of listwise reranking models: nearly as good but *much* smaller. Yup, T5's still got tricks to offer!
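Finally, a rough sketch of the Fusion-in-Decoder pattern itself, using a vanilla Hugging Face T5 for illustration: each (query, passage) pair is encoded independently, the encoder states are concatenated, and a single decoder attends over all of them, which is what lets a modest model consider many segments jointly. An off-the-shelf t5-small will not produce useful rankings without LiT5-style fine-tuning; this only shows the wiring.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

# Sketch of the Fusion-in-Decoder (FiD) wiring with a vanilla T5.
# Illustrative only: LiT5's actual implementation and prompts differ.

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def fid_generate(query: str, passages: list[str]) -> str:
    states, masks = [], []
    with torch.no_grad():
        # Encode each (query, passage) pair independently.
        for p in passages:
            enc = tokenizer(f"Query: {query} Passage: {p}",
                            return_tensors="pt", truncation=True,
                            max_length=256)
            out = model.encoder(input_ids=enc.input_ids,
                                attention_mask=enc.attention_mask)
            states.append(out.last_hidden_state)
            masks.append(enc.attention_mask)
        # Fuse: one long sequence of encoder states for the decoder.
        fused = BaseModelOutput(last_hidden_state=torch.cat(states, dim=1))
        mask = torch.cat(masks, dim=1)
        ids = model.generate(encoder_outputs=fused, attention_mask=mask,
                             max_new_tokens=32)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```

Because the encoder cost grows linearly with the number of passages while attention within each passage stays local, this layout scales to many more candidates than stuffing everything into one flat prompt.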