Manveer Singh Tamber
@ManveerTamber
34 Followers · 47 Following · 3 Media · 21 Statuses
Our paper with @vectara, “Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards”, is now published in the EMNLP 2025 Industry Track! Check out our work on enabling more reliable LLM faithfulness benchmarking in RAG!
As LLMs evolve, our benchmarks must too. We'll continue updating FaithJudge (currently using an o3-mini-high judge) and the leaderboard for more complete evaluation. Full paper:
arxiv.org
Retrieval-augmented generation (RAG) aims to reduce hallucinations by grounding responses in external context, yet large language models (LLMs) still frequently introduce unsupported information...
FaithJudge uses hallucination annotation examples to guide the automated evaluation of LLM responses for the same tasks.
- Outperforms prior methods
- Matches human judgment more closely
- Works across summarization, QA, and data-to-text
We introduced FaithBench to evaluate hallucination detectors on challenging summaries from 10 modern LLMs. FaithBench highlights just how hard hallucination detection still is.
aclanthology.org
Forrest Sheng Bao, Miaoran Li, Renyi Qu, Ge Luo, Erana Wan, Yujia Tang, Weisi Fan, Manveer Singh Tamber, Suleman Kazi, Vivek Sourabh, Mike Qi, Ruixuan Tu, Chenyu Xu, Matthew Gonzales, Ofer Mendelev...
Existing hallucination detection methods (fine-tuned models or zero-shot LLM judges) are promising, but often struggle. FaithJudge improves detection by prompting LLM judges with human-labelled hallucination examples from FaithBench and RAGTruth.
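To make the mechanism concrete, here is a minimal Python sketch of few-shot hallucination judging in the spirit of FaithJudge. The `call_llm` placeholder, the example record, and the prompt wording are all assumptions for illustration, not the paper's actual prompt or API.

```python
# Sketch: few-shot hallucination judging. `call_llm` is a placeholder for
# any chat-completion client; the example record and prompt wording are
# illustrative, not FaithJudge's actual prompt.

ANNOTATED_EXAMPLES = [
    {
        "source": "The Eiffel Tower, completed in 1889, is about 330 m tall.",
        "response": "The Eiffel Tower was finished in 1899.",
        "label": "Unfaithful: the completion year contradicts the source.",
    },
    # ... more human-labelled examples, e.g. drawn from FaithBench/RAGTruth
]

def build_judge_prompt(source: str, response: str) -> str:
    """Prefix the query with annotated examples to guide the judge."""
    parts = ["Judge whether each response is faithful to its source.\n"]
    for ex in ANNOTATED_EXAMPLES:
        parts.append(f"Source: {ex['source']}\nResponse: {ex['response']}\n"
                     f"Judgment: {ex['label']}\n")
    parts.append(f"Source: {source}\nResponse: {response}\nJudgment:")
    return "\n".join(parts)

def judge_faithfulness(source: str, response: str, call_llm) -> str:
    """Return the judge model's verdict for one response."""
    return call_llm(build_judge_prompt(source, response))
```

The annotated examples anchor the judge's notion of faithfulness to human-labelled error cases, rather than relying on the judge's zero-shot intuition alone.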
LLM hallucinations remain stubbornly persistent, even with RAG. LLMs still add, distort, or contradict contexts. The well-established Hallucination Leaderboard from @vectara evaluates hallucination rates for 130+ LLMs on summarization.
github.com
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents - vectara/hallucination-leaderboard
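As a rough illustration of what a leaderboard-style hallucination rate means, here is a hedged sketch: score each (source, summary) pair for factual consistency and report the fraction flagged. The `score_consistency` callable and the 0.5 threshold are stand-ins, not the leaderboard's actual scoring model.

```python
def hallucination_rate(pairs, score_consistency, threshold=0.5):
    """Fraction of (source, summary) pairs judged factually inconsistent.

    `score_consistency` maps (source, summary) to a consistency score in
    [0, 1]; both the callable and the threshold are illustrative stand-ins.
    """
    flagged = sum(1 for source, summary in pairs
                  if score_consistency(source, summary) < threshold)
    return flagged / len(pairs)
```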
Introducing FaithJudge: improving how we evaluate LLM faithfulness in RAG tasks, including summarization, QA, and data-to-text, and powering a more accurate LLM Hallucination Leaderboard.
github.com
vectara/FaithJudge
Super excited to share FaithJudge - a new hallucination benchmark and leaderboard. Shoutout to @lintool @ManveerTamber and many others who contributed. Paper: https://t.co/WTEa9bQMXd GitHub:
github.com
vectara/FaithJudge
Our work raises questions about the commercial viability of embedding models and the security of black-box retrieval models. Work done with @jsprxian and @lintool
https://t.co/WqWuKE0G8I
aclanthology.org
Manveer Singh Tamber, Jasper Xian, Jimmy Lin. Findings of the Association for Computational Linguistics: NAACL 2025. 2025.
Turns out: commercial embedding models are very vulnerable to being distilled. We also show how to train even better models by simply concatenating embeddings from multiple teachers and distilling those, attaining strong results.
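A minimal PyTorch sketch of that multi-teacher recipe, under assumed dimensions and a simple MSE objective (illustrative choices, not necessarily the paper's exact setup):

```python
import torch
import torch.nn as nn

# Sketch: distill several teacher embedding models into one student by
# regressing student outputs onto the concatenated teacher embeddings.
# The linear head, MSE loss, and callable signatures are assumptions.

class StudentHead(nn.Module):
    """Maps the student encoder's output into the concatenated teacher space."""
    def __init__(self, student_dim: int, concat_teacher_dim: int):
        super().__init__()
        self.proj = nn.Linear(student_dim, concat_teacher_dim)

    def forward(self, student_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(student_emb)

def distill_step(student_encoder, head, optimizer, texts, teacher_apis):
    """One training step. `student_encoder` maps a batch of texts to a
    (batch, student_dim) tensor; each entry of `teacher_apis` stands in for
    a commercial embedding endpoint returning (batch, teacher_dim)."""
    with torch.no_grad():
        # Query every teacher and concatenate along the feature axis.
        teacher_emb = torch.cat([api(texts) for api in teacher_apis], dim=-1)
    pred = head(student_encoder(texts))
    loss = nn.functional.mse_loss(pred, teacher_emb)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Concatenation gives the student a richer regression target than any single teacher's embedding space, which is the intuition behind the "even better models" claim above.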
Commercial embedding models are locked behind APIs, but how protected are they from theft? In our #NAACL2025 Findings paper, we show they can be stolen at a modest cost, with thief models that rival their victims and generalize well to out-of-domain retrieval tasks.
LiT5 (led by @ManveerTamber) is heading to #ECIR2025! And the cherry on top? We've added LiT5-v2, which can rerank 100 segments at once. Paper soon, but it's already live on RankLLM (https://t.co/5tpi09PtmY)!
github.com
RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. - castorini/rank_llm
Pablo Picasso said "good artists borrow, great artists steal". So much innovation happening so quickly in the embedding space... it's easier just to steal other people's models. @ManveerTamber and @jsprxian show you how: https://t.co/9WirycBhmb
Our models are now accessible on Hugging Face: https://t.co/UjyHbq3IX6. You can replicate our findings with our code: https://t.co/ZMH226KL6k.
github.com
castorini/LiT5
Tested on diverse BEIR collections, LiT5-Distill & LiT5-Score prove their mettle, showcasing strong adaptability and generalization.
MS MARCO benchmarks reveal LiT5-Distill's competitive edge, at times outshining state-of-the-art models, all while being more compact and built on outdated T5 weights.
Introducing LiT5-Distill & LiT5-Score: models ranging from 220M to 3B parameters for listwise reranking.
- LiT5-Distill distills RankGPT3.5 & RankGPT4's power into smaller, yet effective models.
- LiT5-Score uses cross-attention scores for reranking.
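For readers new to listwise reranking with prompt decoders, a hedged sketch of the generate-a-permutation pattern that models like LiT5-Distill follow; the prompt template and parsing here are illustrative, not LiT5's exact format.

```python
import re

def build_listwise_prompt(query: str, passages: list[str]) -> str:
    """Number the candidates and ask the model for a ranked permutation."""
    numbered = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (f"Query: {query}\n{numbered}\n"
            "Rank the passages by relevance to the query, "
            "e.g. [2] > [1] > [3]:")

def apply_ranking(output: str, passages: list[str]) -> list[str]:
    """Reorder passages by a generated permutation like '[2] > [1] > [3]'."""
    order = []
    for m in re.findall(r"\[(\d+)\]", output):
        i = int(m) - 1
        if 0 <= i < len(passages) and i not in order:
            order.append(i)
    # Keep any passages the model failed to mention, in original order.
    order += [i for i in range(len(passages)) if i not in order]
    return [passages[i] for i in order]
```

A reranker like this is typically applied to the top candidates from a first-stage retriever, optionally in sliding windows when the list exceeds the model's input budget.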
Thrilled to unveil our work in efficient zero-shot listwise reranking! LiT5 harnesses T5 models to challenge state-of-the-art standards with significantly smaller models. Discover more:
arxiv.org
Recent work in zero-shot listwise reranking using LLMs has achieved state-of-the-art results. However, these methods are not without drawbacks. The proposed methods rely on large LLMs with...
Here we revisit the wonders of T5, Fusion-in-Decoder (FiD), and building smaller yet effective rerankers, focusing on the listwise paradigm of prompt decoders! You can find the code and models here: https://t.co/JAoBQIJHLa. Led by the amazing @ManveerTamber!
github.com
castorini/LiT5
Prompt-decoder LLMs for listwise reranking too large for you? Introducing our new LiT5 family of listwise reranking models: nearly as good but *much* smaller. Yup, T5's still got tricks to offer!
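Finally, a rough sketch of the Fusion-in-Decoder pattern itself, using a vanilla Hugging Face T5 for illustration: each (query, passage) pair is encoded independently, the encoder states are concatenated, and a single decoder attends over all of them, which is what lets a modest model consider many segments jointly. An off-the-shelf t5-small will not produce useful rankings without LiT5-style fine-tuning; this only shows the wiring.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

# Sketch of the Fusion-in-Decoder (FiD) wiring with a vanilla T5.
# Illustrative only: LiT5's actual implementation and prompts differ.

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def fid_generate(query: str, passages: list[str]) -> str:
    states, masks = [], []
    with torch.no_grad():
        # Encode each (query, passage) pair independently.
        for p in passages:
            enc = tokenizer(f"Query: {query} Passage: {p}",
                            return_tensors="pt", truncation=True,
                            max_length=256)
            out = model.encoder(input_ids=enc.input_ids,
                                attention_mask=enc.attention_mask)
            states.append(out.last_hidden_state)
            masks.append(enc.attention_mask)
        # Fuse: one long sequence of encoder states for the decoder.
        fused = BaseModelOutput(last_hidden_state=torch.cat(states, dim=1))
        mask = torch.cat(masks, dim=1)
        ids = model.generate(encoder_outputs=fused, attention_mask=mask,
                             max_new_tokens=32)
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```

Because the encoder cost grows linearly with the number of passages while attention within each passage stays local, this layout scales to many more candidates than stuffing everything into one flat prompt.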