João Monteiro

@joaomonteirof

Followers
220
Following
84K
Media
2
Statuses
266

Research Scientist @autodesk. Opinions are my own.

London, England
Joined June 2009
@joaomonteirof
João Monteiro
9 days
RT @tsirigoc: We tackle a core challenge in open-vocabulary audio event detection: miscalibration across unseen events. Thanks to all the….
0
5
0
@joaomonteirof
João Monteiro
25 days
RT @PerouzT: 🚀 We just released the final test split of #RepLiQA —our dataset for evaluating QA on truly unseen content!. 📚 Dataset: https:….
0
4
0
@joaomonteirof
João Monteiro
1 month
RT @EdwardJian2: 🚀 Excited to release GraphOmni at full scale, the most comprehensive benchmark for LLMs on graph reasoning tasks. 📄 Paper….
0
15
0
@joaomonteirof
João Monteiro
1 month
RT @real_weipang: Thanks @_akhaliq for sharing our work!. 🚀 Thrilled to introduce 🎓Paper2Poster — Automatically transform your full Paper i….
0
24
0
@joaomonteirof
João Monteiro
2 months
RT @iclr_conf: That's a wrap for #ICLR2025! See you all next year in Brazil! Please all welcome @BharathHarihar3 as the new Senior Progra….
0
60
0
@joaomonteirof
João Monteiro
2 months
RT @EdwardJian2: 🚀 Excited to introduce GraphOmni, a comprehensive and extendable benchmark for evaluating Large Language Models (LLMs) on….
0
7
0
@joaomonteirof
João Monteiro
3 months
RT @PerouzT: 📣📣📣 We just dropped Test Split 3️⃣ of RepLiQA — our Q&A dataset built to really test LLMs on unseen, made-up content. 🚀Great….
0
4
0
@joaomonteirof
João Monteiro
4 months
RT @aarashfeizi: 🧵 2/7.✅Surprising (and concerning) result: Most VLMs lack symmetry! 🤯. In theory, sim(A, B) = sim(B, A)—but in practice? M….
0
1
0
@joaomonteirof
João Monteiro
4 months
Work led by @aarashfeizi. Paper:
0
0
1
@joaomonteirof
João Monteiro
4 months
We then introduced PairBench: test data and metrics that help assess how different models behave when used to compare data instances (e.g., when doing automatic evaluation). We evaluated many VLMs and found no Pareto-optimal LLK: evaluators must be chosen based on task requirements.
1
0
1
@joaomonteirof
João Monteiro
4 months
New pre-print out! We defined large language kernels by prompting LLMs to output similarity scores between data instances & tested properties like smoothness, symmetry & controllability. Interesting but concerning finding: most models aren't symmetric and sim(a,b)≠sim(b,a) 🤯.
@aarashfeizi
Aarash Feizi
4 months
🚨 Excited to introduce PairBench! 🚨. 💡 TL;DR: VLM-judges can fail at data comparison! . ✅ PairBench helps you pick the right one by testing alignment, symmetry, smoothness & controllability—ensuring reliable auto-evaluation. 📄Paper: 🧵 Thread: 👇
1
0
3
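The symmetry property mentioned in this thread, sim(a,b) = sim(b,a), can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the PairBench code or metric: `symmetry_gap` is a hypothetical helper, and the two toy scorers stand in for an LLM prompted to output similarity scores.

```python
from itertools import combinations
from typing import Callable, Sequence


def symmetry_gap(sim: Callable[[str, str], float], items: Sequence[str]) -> float:
    """Mean of |sim(a, b) - sim(b, a)| over all unordered pairs (hypothetical metric)."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(abs(sim(a, b) - sim(b, a)) for a, b in pairs) / len(pairs)


def toy_asymmetric_sim(a: str, b: str) -> float:
    # Stand-in for a model-produced score: fraction of a's words that appear in b.
    # The result depends on argument order, so it is not symmetric in general.
    words_a, words_b = set(a.split()), set(b.split())
    return len(words_a & words_b) / len(words_a) if words_a else 0.0


def toy_symmetric_sim(a: str, b: str) -> float:
    # Jaccard overlap of word sets: the same value for either argument order.
    words_a, words_b = set(a.split()), set(b.split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 1.0


items = ["a b", "a b c d"]
gap_asym = symmetry_gap(toy_asymmetric_sim, items)  # nonzero: order matters
gap_sym = symmetry_gap(toy_symmetric_sim, items)    # zero: order does not matter
```

A gap of zero means the scorer behaves like a symmetric kernel on these inputs; a scorer driven by a real model would replace the toy functions.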
@joaomonteirof
João Monteiro
4 months
RT @aarashfeizi: 🚨 Excited to introduce PairBench! 🚨. 💡 TL;DR: VLM-judges can fail at data comparison! . ✅ PairBench helps you pick the ri….
0
19
0
@joaomonteirof
João Monteiro
4 months
RT @EdwardJian2: 🚨Paper Alert! Our latest work shows that simple edge perturbations—dropping/adding edges—can match or even outperform comp….
0
2
0
@joaomonteirof
João Monteiro
5 months
RT @PerouzT: 🚀 We have released our paper on ReTreever! 🌳🔍. ReTreever organizes and represents documents in a binary tree across various gr….
0
20
0
@joaomonteirof
João Monteiro
5 months
RT @PerouzT: 🚀 We just released the 3rd test split of our #RepLiQA dataset! 🎉 A unique QA dataset with reference documents that have never….
0
14
0
@joaomonteirof
João Monteiro
6 months
RT @TIME: The true story behind 'I’m Still Here,' the Oscar contender pushing Brazil to confront its dark past
0
3K
0
@joaomonteirof
João Monteiro
6 months
RT @goldenglobes: Congratulations to Fernanda Torres on winning Best Female Actor – Motion Picture – Drama at the #GoldenGlobes! https://t.….
0
120K
0
@joaomonteirof
João Monteiro
6 months
RT @RottenTomatoes: Congratulations to #ImStillHere's Fernanda Torres for winning Best Actress - Motion Picture Drama at the #GoldenGlobes:….
0
1K
0
@joaomonteirof
João Monteiro
7 months
RT @mrgzadeh: 🎉 Excited for #NeurIPS2024? 🌟. Join us next Saturday (Dec. 14th) in Vancouver for ENLSP-IV: The Fourth Workshop on Efficient….
0
4
0
@joaomonteirof
João Monteiro
7 months
Presenting RepLiQA at #NeurIPS2024! 🎯 RepLiQA enables testing LLMs on unseen fictional contexts so there's no memory confounding, & their ability to say "I don't know" when appropriate. 🚨 Fresh data split dropping during the conference! 👀🎉 Paper:
@joaomonteirof
João Monteiro
1 year
We released the first split of RepLiQA. The data consists of context-question-answer triplets, and the contexts are documents about made-up things/people/places. As such, RepLiQA can be used for reliably testing the ability of LLMs to find information in the provided context.
0
4
13