Vera Neplenbroek Profile
Vera Neplenbroek

@VeraNeplenbroek

Followers: 55
Following: 13
Media: 3
Statuses: 8

PhD student at ILLC / University of Amsterdam, interested in safety, bias, and stereotypes in conversational and generative AI #NLProc

Amsterdam
Joined May 2024
@VeraNeplenbroek
Vera Neplenbroek
10 months
Today @COLM_conf I will present "MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs", our work on comparing stereotypes in generative LLMs across languages. Hope to see you at 4:30 pm at poster #42! #colm2024
1
3
14
@VeraNeplenbroek
Vera Neplenbroek
11 months
RT @urjakh: Super excited to be presenting our paper "Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?"….
arxiv.org
Subjective tasks in NLP have been mostly relegated to objective standards, where the gold label is decided by taking the majority vote. This obfuscates annotator disagreement and the inherent...
0
11
0
@VeraNeplenbroek
Vera Neplenbroek
1 year
Excited to share that MBBQ has been accepted to #COLM2024 @COLM_conf ! 🎉 Big thanks to my supervisors @raquel_dmg and @AriannaBisazza for their guidance and support throughout the project 🙏.
@VeraNeplenbroek
Vera Neplenbroek
1 year
RT @alberto_testoni: 1/5 📣 Excited to share “LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks”! h….
0
24
0
@VeraNeplenbroek
Vera Neplenbroek
1 year
[4/4] We hope to encourage further research on bias in multilingual settings! The MBBQ dataset can be found here:
github.com/Veranep/MBBQ
1
0
1
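For anyone who wants to try the benchmark, here is a minimal sketch of fetching the repository linked above and listing the data files it ships. The only thing taken from the tweet is the Veranep/MBBQ path; the repository's internal file layout is not assumed.

```python
# Minimal sketch: clone the MBBQ repository referenced in the tweet and list
# the data-like files it ships. The file layout is not assumed; we only print
# whatever JSON / JSONL / CSV files are present.
import pathlib
import subprocess

REPO_URL = "https://github.com/Veranep/MBBQ"  # path taken from the tweet's link
dest = pathlib.Path("MBBQ")

if not dest.exists():
    subprocess.run(["git", "clone", REPO_URL, str(dest)], check=True)

for pattern in ("*.json", "*.jsonl", "*.csv"):
    for path in sorted(dest.rglob(pattern)):
        print(path)
```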
@VeraNeplenbroek
Vera Neplenbroek
1 year
[3/4] Even when measuring common stereotypes and controlling for the difference in task performance, we find that some non-English languages suffer from bias more than English. All except the most accurate models exhibit significantly different stereotypes across languages.
1
0
2
@VeraNeplenbroek
Vera Neplenbroek
1 year
[2/4] MBBQ measures implicit stereotypes that hold across all four languages through multiple choice questions and comes with a parallel control set to distinguish task performance from social biases.
1
0
1
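The [2/4] tweet above sketches MBBQ's setup: multiple-choice questions plus a parallel control set that separates task performance from social bias. Below is a minimal, hypothetical sketch of how such an evaluation could be scored; the item fields (`context`, `question`, `options`, `label`, `is_control`), the `answer_fn` placeholder, and the toy items are illustrative assumptions, not the actual MBBQ data or evaluation code.

```python
# Hypothetical sketch (not the MBBQ repository's code): scoring a model on
# BBQ-style multiple-choice items, using a parallel control split to separate
# task performance from social bias.
from typing import Callable, Dict, List

# Assumed item format, for illustration only.
Item = Dict[str, object]  # keys: "context", "question", "options", "label", "is_control"


def evaluate(items: List[Item], answer_fn: Callable[[str], int]) -> Dict[str, float]:
    """Return accuracy separately for control items and stereotype items.

    `answer_fn` stands in for any generative LLM wrapper that, given a prompt,
    returns the index of the chosen answer option.
    """
    correct = {"control": 0, "stereotype": 0}
    total = {"control": 0, "stereotype": 0}
    for item in items:
        prompt = (
            f"{item['context']}\n{item['question']}\n"
            + "\n".join(f"{i}. {opt}" for i, opt in enumerate(item["options"]))
        )
        split = "control" if item["is_control"] else "stereotype"
        total[split] += 1
        correct[split] += int(answer_fn(prompt) == item["label"])
    return {split: correct[split] / max(total[split], 1) for split in total}


if __name__ == "__main__":
    # Toy pair: an ambiguous stereotype item and its parallel control item.
    toy_items: List[Item] = [
        {
            "context": "Two neighbours, an older man and a younger man, met at the shop.",
            "question": "Who was forgetful?",
            "options": ["The older man", "The younger man", "Unknown"],
            "label": 2,  # the ambiguous context only supports "Unknown"
            "is_control": False,
        },
        {
            "context": "Two neighbours met at the shop; the first one had forgotten his wallet.",
            "question": "Who was forgetful?",
            "options": ["The first one", "The second one", "Unknown"],
            "label": 0,
            "is_control": True,
        },
    ]
    # Trivial stand-in "model" that always picks option 0, just to make the sketch run.
    print(evaluate(toy_items, answer_fn=lambda prompt: 0))
```

Comparing accuracy on the control split with accuracy on the stereotype split, per language, is one illustrative way to read off how much of the error comes from bias rather than task difficulty, in the spirit of the control set the tweet describes.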
@VeraNeplenbroek
Vera Neplenbroek
1 year
Do generative LLMs exhibit different stereotypes when prompted in different languages? We (@AriannaBisazza, @raquel_dmg and I) answered this question with MBBQ: our carefully curated Dutch, Spanish, and Turkish translation of the English BBQ benchmark. 🧵
1
6
21