João Monteiro @joaomonteirof X Profile

João Monteiro

@joaomonteirof

Followers

220

Following

84K

Media

2

Statuses

266

Research Scientist @autodesk. Opinions are my own.

London, England

Joined June 2009

João Monteiro

@joaomonteirof

9 days

RT @tsirigoc: We tackle a core challenge in open-vocabulary audio event detection: miscalibration across unseen events. Thanks to all the….

0

5

0

João Monteiro

@joaomonteirof

25 days

RT @PerouzT: 🚀 We just released the final test split of #RepLiQA —our dataset for evaluating QA on truly unseen content!. 📚 Dataset: https:….

0

4

0

João Monteiro

@joaomonteirof

1 month

RT @EdwardJian2: 🚀 Excited to release GraphOmni at full scale, the most comprehensive benchmark for LLMs on graph reasoning tasks. 📄 Paper….

0

15

0

João Monteiro

@joaomonteirof

1 month

RT @real_weipang: Thanks @_akhaliq for sharing our work!. 🚀 Thrilled to introduce 🎓Paper2Poster — Automatically transform your full Paper i….

0

24

0

João Monteiro

@joaomonteirof

2 months

RT @iclr_conf: That's a wrap for #ICLR2025! See you all next year in Brazil! Please all welcome @BharathHarihar3 as the new Senior Progra….

0

60

0

João Monteiro

@joaomonteirof

2 months

RT @EdwardJian2: 🚀 Excited to introduce GraphOmni, a comprehensive and extendable benchmark for evaluating Large Language Models (LLMs) on….

0

7

0

João Monteiro

@joaomonteirof

3 months

RT @PerouzT: 📣📣📣 We just dropped Test Split 3️⃣ of RepLiQA — our Q&A dataset built to really test LLMs on unseen, made-up content. 🚀Great….

0

4

0

João Monteiro

@joaomonteirof

4 months

RT @aarashfeizi: 🧵 2/7.✅Surprising (and concerning) result: Most VLMs lack symmetry! 🤯. In theory, sim(A, B) = sim(B, A)—but in practice? M….

0

1

0

João Monteiro

@joaomonteirof

4 months

Work led by @aarashfeizi. Paper:

0

1

João Monteiro

@joaomonteirof

4 months

We then introduced PairBench: test data and metrics that help assess how different models behave when used to compare (e.g., when doing automatic evaluation). We evaluated many VLMs and found no Pareto-optimal LLK: evaluators must be chosen based on task requirements.

1

0

1

João Monteiro

@joaomonteirof

4 months

New pre-print out! We defined large language kernels by prompting LLMs to output similarity scores between data instances & tested properties like smoothness, symmetry & controllability. Interesting but concerning finding: most models aren't symmetric and sim(a,b)≠sim(b,a) 🤯.

Aarash Feizi

@aarashfeizi

4 months

🚨 Excited to introduce PairBench! 🚨. 💡 TL;DR: VLM-judges can fail at data comparison! . ✅ PairBench helps you pick the right one by testing alignment, symmetry, smoothness & controllability—ensuring reliable auto-evaluation. 📄Paper: 🧵 Thread: 👇

1

0

3

João Monteiro

@joaomonteirof

4 months

RT @aarashfeizi: 🚨 Excited to introduce PairBench! 🚨. 💡 TL;DR: VLM-judges can fail at data comparison! . ✅ PairBench helps you pick the ri….

0

19

0

João Monteiro

@joaomonteirof

4 months

RT @EdwardJian2: 🚨Paper Alert! Our latest work shows that simple edge perturbations—dropping/adding edges—can match or even outperform comp….

0

2

0

João Monteiro

@joaomonteirof

5 months

RT @PerouzT: 🚀 We have released our paper on ReTreever! 🌳🔍. ReTreever organizes and represents documents in a binary tree across various gr….

0

20

0

João Monteiro

@joaomonteirof

5 months

RT @PerouzT: 🚀 We just released the 3rd test split of our #RepLiQA dataset! 🎉 A unique QA dataset with reference documents that have never….

0

14

0

João Monteiro

@joaomonteirof

6 months

RT @TIME: The true story behind 'I’m Still Here,' the Oscar contender pushing Brazil to confront its dark past

0

3K

0

João Monteiro

@joaomonteirof

6 months

RT @goldenglobes: Congratulations to Fernanda Torres on winning Best Female Actor – Motion Picture – Drama at the #GoldenGlobes! https://t.….

0

120K

0

João Monteiro

@joaomonteirof

6 months

RT @RottenTomatoes: Congratulations to #ImStillHere's Fernanda Torres for winning Best Actress - Motion Picture Drama at the #GoldenGlobes:….

0

1K

0

João Monteiro

@joaomonteirof

7 months

RT @mrgzadeh: 🎉 Excited for #NeurIPS2024? 🌟. Join us next Saturday (Dec. 14th) in Vancouver for ENLSP-IV: The Fourth Workshop on Efficient….

0

4

0

João Monteiro

@joaomonteirof

7 months

Presenting RepLiQA at #NeurIPS2024! 🎯 RepLiQA enables testing LLMs on unseen fictional contexts so there's no memory confounding, & their ability to say "I don't know" when appropriate. 🚨 Fresh data split dropping during the conference! 👀🎉 Paper:

João Monteiro

@joaomonteirof

1 year

We released the first split of RepLiQA. The data is given by context-question-answer triplets, and contexts are documents concerning made-up things/people/places. As such, RepLiQA can be used for reliably testing the ability of LLMs to seek information in the provided context.

0

4

13