Alberto Testoni

@alberto_testoni

Followers
241
Following
285
Media
16
Statuses
70

PostDoc @amsterdamumc / NLP4Health. Prev. @UvA_Amsterdam/@illc_amsterdam. PhD @UniTrento_DISI - @AmazonScience. MSc @cimec_unitrento, BSc @Unibo.

Amsterdam
Joined January 2012
@alberto_testoni
Alberto Testoni
29 days
RT @Cohere_Labs: Today, our team will share “From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions” at @aclmeeting!…
0
4
0
@alberto_testoni
Alberto Testoni
3 months
RT @ale_suglia: Excited to present PLAYPEN, an environment for learning through dialogue game self-play. Are you interested in LLM post-tra…
0
9
0
@alberto_testoni
Alberto Testoni
6 months
RT @CohereForAI: Can LLMs collaborate effectively over long-term interactions, like a human teammate, especially in coding tasks? 🤔 We int…
0
7
0
@alberto_testoni
Alberto Testoni
6 months
RT @mziizm: Excited to share insights from our new paper on evaluating LLMs in multi-session coding interactions! 📚📚📚 We introduce MEMORYC…
0
9
0
@alberto_testoni
Alberto Testoni
8 months
4/4 Our results reveal significant limitations and problems of overconfidence of state-of-the-art large V&L models. For more analyses on the role of the saliency features that guide the model selection and on CoT prompting, check out our paper! 🏓
arxiv.org
Ambiguity resolution is key to effective communication. While humans effortlessly address ambiguity through conversational grounding strategies, the extent to which current language models can...
0
0
2
@alberto_testoni
Alberto Testoni
8 months
3/4 We find significant limitations of all models in responding to these questions. But what can go wrong when ambiguity is not recognized? In RAcQUEt-Bias, we analyze a critical yet underexplored problem: failing to address ambiguity can lead to stereotypical responses.
1
0
1
@alberto_testoni
Alberto Testoni
8 months
2/4 We examine referential ambiguity in image-based question answering by introducing a manually curated dataset, RAcQUEt. We categorize a range of human responses into distinct classes to gauge the way they respond to ambiguity and use these for evaluating model outputs.
1
0
0
@alberto_testoni
Alberto Testoni
8 months
1/4 Excited to share our latest paper “🏓 RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs”. Joint work with @barbara_plank and @raquel_dmg. #NLProc 🧵
1
1
5
@alberto_testoni
Alberto Testoni
9 months
RT @cimec_unitrento: 🔍 Papers being presented: 1️⃣ Learning to Ask Informative Questions: Enhancing LLMs with Preference Optimization and…
0
2
0
@alberto_testoni
Alberto Testoni
10 months
RT @dmazzaccara: Flying to Miami! I will present “Learning to Ask Informative Questions: Enhancing LLMs with Preference Optimization and Ex…
0
5
0
@alberto_testoni
Alberto Testoni
11 months
RT @barbara_plank: PhD opportunities in Munich 🥳 - consider applying to MCML and reach out if you are interested in @MaiNLPlab research the…
0
13
0
@alberto_testoni
Alberto Testoni
1 year
2) "Don't Buy it! Reassessing the Ad Understanding Abilities of Contrastive Multimodal Models" work led by @anna_bavaresco_ with @raquel_dmg (poster 12/8 at 14:00 + oral 13/8 11:45)
arxiv.org
Image-based advertisements are complex multimodal stimuli that often contain unusual visual elements and figurative language. Previous research on automatic ad understanding has reported...
0
0
7
@alberto_testoni
Alberto Testoni
1 year
I am attending #ACL2024 in Bangkok with 2 papers on multimodal #NLProc 🇹🇭 🧵 1) "Naming, Describing, and Quantifying Visual Objects in Humans and LLMs" with @sandropezzelle and J. Sprott (poster 12/8 at 14:00 - with a fun game for attendees)
arxiv.org
While human speakers use a variety of different expressions when describing the same object in an image, giving rise to a distribution of plausible labels driven by pragmatic constraints, the...
2
2
19
@alberto_testoni
Alberto Testoni
1 year
5/5 ⚠️ We conclude that LLMs are not yet ready to systematically replace human judges in NLP, and caution against using LLMs for this purpose. JUDGE-BENCH is intended as a living benchmark, and you are welcome to contribute:
github.com
Contribute to dmg-illc/JUDGE-BENCH development by creating an account on GitHub.
0
2
12
@alberto_testoni
Alberto Testoni
1 year
4/5 📊 The gap between open and closed models is narrowing, indicating promising prospects for reproducibility. When evaluating different linguistic dimensions, GPT-4o and Gemini-1.5 perform best in acceptability and verbosity, while Mixtral leads in coherence and consistency.
1
1
9
@alberto_testoni
Alberto Testoni
1 year
3/5 📊 We find that LLMs exhibit a large variance across datasets in their correlation to human judgments. While some LLMs correlate well with human judgments on some datasets, each tested LLM performs poorly on some others and exhibits significant variance across datasets.
1
0
8
@alberto_testoni
Alberto Testoni
1 year
2/5 🔍 Our evaluation goes beyond existing work by including a wide variety of datasets that differ in the type of task, the property being judged, the type of judgments, and the expertise of human annotators. We evaluate 11 open-weight and proprietary LLMs of different sizes.
1
0
7
@alberto_testoni
Alberto Testoni
1 year
1/5 📣 Excited to share “LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks”! 🚀 We introduce JUDGE-BENCH, a benchmark to investigate to what extent LLM-generated judgements align with human evaluations. #NLProc
4
24
97
@alberto_testoni
Alberto Testoni
1 year
RT @raquel_dmg: I'm looking for a last-minute emergency reviewer for a COLM submission related to generation with LLMs. Reviews need to be…
0
4
0
@alberto_testoni
Alberto Testoni
1 year
RT @ELLISforEurope: 22 researchers from 12 European institutions discussed future directions in open #LLMs and multimodal language technolo…
0
7
0