Manveer Singh Tamber Profile
Manveer Singh Tamber

@ManveerTamber

Followers
34
Following
47
Media
3
Statuses
21

CS PhD Student @UWaterloo

Joined September 2023
@ManveerTamber
Manveer Singh Tamber
7 days
Our paper with @vectara, "Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards", is now published in the EMNLP 2025 Industry Track! Check out our work on enabling more reliable LLM faithfulness benchmarking in RAG!
@ManveerTamber
Manveer Singh Tamber
6 months
Introducing 🔍 FaithJudge: improving how we evaluate LLM faithfulness in RAG tasks, including summarization, QA, and data-to-text, and powering a more accurate LLM Hallucination Leaderboard. 🔗
1
3
6
@ManveerTamber
Manveer Singh Tamber
6 months
As LLMs evolve, our benchmarks must too. We'll continue updating FaithJudge (currently using an o3-mini-high judge) and the leaderboard for more complete evaluation. 📄 Full paper:
arxiv.org
Retrieval-augmented generation (RAG) aims to reduce hallucinations by grounding responses in external context, yet large language models (LLMs) still frequently introduce unsupported information...
0
1
1
@ManveerTamber
Manveer Singh Tamber
6 months
FaithJudge uses hallucination annotation examples to guide the automated evaluation of LLM responses for the same tasks. ✅ Outperforms prior methods 🤝 Matches human judgment more closely 📊 Works across summarization, QA, and data-to-text
1
0
0
@ManveerTamber
Manveer Singh Tamber
6 months
We introduced FaithBench to evaluate hallucination detectors on challenging summaries from 10 modern LLMs. FaithBench highlights just how hard hallucination detection still is. 🔗
aclanthology.org
Forrest Sheng Bao, Miaoran Li, Renyi Qu, Ge Luo, Erana Wan, Yujia Tang, Weisi Fan, Manveer Singh Tamber, Suleman Kazi, Vivek Sourabh, Mike Qi, Ruixuan Tu, Chenyu Xu, Matthew Gonzales, Ofer Mendelev...
1
0
0
@ManveerTamber
Manveer Singh Tamber
6 months
Existing hallucination detection methods (fine-tuned models or zero-shot LLM judges) are promising, but often struggle. FaithJudge improves detection by prompting LLM judges with human-labelled hallucination examples from FaithBench and RagTruth.
1
0
0
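The few-shot judging idea in the tweet above can be sketched as follows. This is a minimal illustration of prompting an LLM judge with previously annotated hallucination examples; the function name, example fields, and prompt wording are hypothetical, not FaithJudge's actual code or prompt.

```python
# Sketch: assemble a few-shot faithfulness-judging prompt from labelled examples.
# Structure and wording are illustrative assumptions, not the paper's prompt.

def build_judge_prompt(annotated_examples, context, response):
    """Show the judge labelled (context, response, label) examples first,
    then ask it to label a new (context, response) pair."""
    parts = ["Judge whether each response is faithful to its source context."]
    for ex in annotated_examples:
        parts.append(
            f"Context: {ex['context']}\n"
            f"Response: {ex['response']}\n"
            f"Label: {ex['label']}"
        )
    parts.append(f"Context: {context}\nResponse: {response}\nLabel:")
    return "\n\n".join(parts)

examples = [
    {"context": "The meeting is on Tuesday.",
     "response": "The meeting is on Friday.", "label": "hallucinated"},
    {"context": "Sales rose 5% in Q2.",
     "response": "Sales rose 5% in Q2.", "label": "faithful"},
]
prompt = build_judge_prompt(examples, "The cafe opens at 8am.",
                            "The cafe opens at 9am.")
```

The completed prompt would then be sent to the judge model, whose next token fills in the final "Label:" slot.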
@ManveerTamber
Manveer Singh Tamber
6 months
LLM hallucinations remain stubbornly persistent, even with RAG. LLMs still ➕ add, 🔄 distort, or ❌ contradict contexts. The well-established Hallucination Leaderboard from @vectara evaluates hallucination rates for 130+ LLMs on summarization. 🔗
github.com
Leaderboard Comparing LLM Performance at Producing Hallucinations when Summarizing Short Documents - vectara/hallucination-leaderboard
1
4
5
@ManveerTamber
Manveer Singh Tamber
6 months
Introducing 🔍 FaithJudge: improving how we evaluate LLM faithfulness in RAG tasks, including summarization, QA, and data-to-text, and powering a more accurate LLM Hallucination Leaderboard. 🔗
github.com
Contribute to vectara/FaithJudge development by creating an account on GitHub.
1
5
10
@ofermend
Ofer Mendelevitch
6 months
Super excited to share FaithJudge - a new hallucination benchmark and leaderboard. Shoutout to @lintool @ManveerTamber and many others who contributed. Paper: https://t.co/WTEa9bQMXd Github:
github.com
Contribute to vectara/FaithJudge development by creating an account on GitHub.
0
2
3
@ManveerTamber
Manveer Singh Tamber
6 months
Our work raises questions about the commercial viability of embedding models and the security of black-box retrieval models. Work done with @jsprxian and @lintool https://t.co/WqWuKE0G8I
aclanthology.org
Manveer Singh Tamber, Jasper Xian, Jimmy Lin. Findings of the Association for Computational Linguistics: NAACL 2025. 2025.
0
0
0
@ManveerTamber
Manveer Singh Tamber
6 months
💥 Turns out: commercial embedding models are very vulnerable to being distilled. We also show how to train even better models by simply concatenating embeddings from multiple teachers and distilling those, attaining strong results.
1
0
0
@ManveerTamber
Manveer Singh Tamber
6 months
Commercial embedding models are locked 🔒 behind APIs, but how protected are they from theft 🥷? In our #NAACL2025 Findings paper, we show they can be stolen at a modest cost, with thief models that rival their victims and generalize well to out-of-domain retrieval tasks.
1
1
6
@rpradeep42
Ronak Pradeep
11 months
LiT5 (led by @ManveerTamber) is heading to #ECIR2025! And the 🍒 on top? We've added LiT5-v2 which can rerank 100 segments at once 🤯 Paper soon, but it's already live on RankLLM (https://t.co/5tpi09PtmY)!
github.com
RankLLM is a Python toolkit for reproducible information retrieval research using rerankers, with a focus on listwise reranking. - castorini/rank_llm
@ManveerTamber
Manveer Singh Tamber
2 years
🚀 Thrilled to unveil our work in efficient zero-shot listwise reranking! LiT5 harnesses T5 models to challenge state-of-the-art standards, with significantly smaller models. Discover more:
0
1
9
@lintool
Jimmy Lin
1 year
Pablo Picasso said "good artists borrow, great artists steal". So much innovation happening so quickly in the embedding space... it's easier just to steal other people's models. @ManveerTamber and @jsprxian show you how: https://t.co/9WirycBhmb
0
3
3
@ManveerTamber
Manveer Singh Tamber
2 years
🔗 Our models are now accessible on 🤗 Hugging Face: https://t.co/UjyHbq3IX6. You can replicate our findings with our code: https://t.co/ZMH226KL6k.
github.com
Contribute to castorini/LiT5 development by creating an account on GitHub.
0
0
1
@ManveerTamber
Manveer Singh Tamber
2 years
๐ŸŒ Tested on diverse BEIR collections, LiT5-Distill & LiT5-Score prove their mettle, showcasing exceptional adaptability and generalization capabilities.
1
0
0
@ManveerTamber
Manveer Singh Tamber
2 years
📊 MS MARCO benchmarks reveal LiT5-Distill's competitive edge – at times outshining state-of-the-art models – all while being more compact and built on outdated T5 weights.
1
0
1
@ManveerTamber
Manveer Singh Tamber
2 years
💡 Introducing LiT5-Distill & LiT5-Score: models ranging from 220M to 3B parameters for listwise reranking. 🌟 LiT5-Distill distills RankGPT3.5 & RankGPT4's power into smaller, yet effective models. 🔍 LiT5-Score uses cross-attention scores for reranking.
1
0
1
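The listwise setup described above can be illustrated with the permutation-parsing step: the reranker emits an ordering over numbered passages, and the caller reorders the candidates accordingly. The "[2] > [3] > [1]" serialization below is an assumption modelled on RankGPT-style outputs, not necessarily LiT5's verbatim format.

```python
import re

# Sketch: apply a listwise reranker's emitted ordering to a candidate list.
# Output format "[2] > [3] > [1]" is an illustrative assumption.

def apply_permutation(passages, model_output):
    """Reorder passages by the 1-indexed ids in model_output, skipping
    out-of-range or repeated ids; omitted ids keep their original order."""
    seen = []
    for tok in re.findall(r"\[(\d+)\]", model_output):
        i = int(tok) - 1
        if 0 <= i < len(passages) and i not in seen:
            seen.append(i)
    seen += [i for i in range(len(passages)) if i not in seen]
    return [passages[i] for i in seen]

docs = ["apple", "banana", "cherry"]
print(apply_permutation(docs, "[2] > [3] > [1]"))  # ['banana', 'cherry', 'apple']
```

Tolerating repeated or missing ids matters in practice, since generative rerankers occasionally emit malformed permutations.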
@ManveerTamber
Manveer Singh Tamber
2 years
🚀 Thrilled to unveil our work in efficient zero-shot listwise reranking! LiT5 harnesses T5 models to challenge state-of-the-art standards, with significantly smaller models. Discover more:
arxiv.org
Recent work in zero-shot listwise reranking using LLMs has achieved state-of-the-art results. However, these methods are not without drawbacks. The proposed methods rely on large LLMs with...
1
4
19
@rpradeep42
Ronak Pradeep
2 years
Here we revisit the wonders of T5, Fusion-in-Decoder (FiD), and building smaller and effective rerankers, focussing on the listwise paradigm of prompt decoders! You can find the code and models here - https://t.co/JAoBQIJHLa. Led by the amazing @ManveerTamber!
github.com
Contribute to castorini/LiT5 development by creating an account on GitHub.
@lintool
Jimmy Lin
2 years
Prompt-decoder LLMs for listwise reranking too large for you? Introducing our new LiT5 family of listwise reranking models: nearly as good but *much* smaller. Yup, T5's still got tricks to offer!
0
2
6