Vera Neplenbroek Profile
Vera Neplenbroek

@VeraNeplenbroek

Followers: 55
Following: 13
Media: 3
Statuses: 8

PhD student at ILLC / University of Amsterdam, interested in safety, bias, and stereotypes in conversational and generative AI #NLProc

Amsterdam
Joined May 2024
@VeraNeplenbroek
Vera Neplenbroek
10 months
Today @COLM_conf I will present "MBBQ: A Dataset for Cross-Lingual Comparison of Stereotypes in Generative LLMs", our work on comparing stereotypes in generative LLMs across languages. Hope to see you at 4:30 pm at poster #42! #colm2024
1
3
14
@VeraNeplenbroek
Vera Neplenbroek
11 months
RT @urjakh: Super excited to be presenting our paper "Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?"….
arxiv.org
Subjective tasks in NLP have been mostly relegated to objective standards, where the gold label is decided by taking the majority vote. This obfuscates annotator disagreement and the inherent...
0
11
0
@VeraNeplenbroek
Vera Neplenbroek
1 year
Excited to share that MBBQ has been accepted to #COLM2024 @COLM_conf ! 🎉 Big thanks to my supervisors @raquel_dmg and @AriannaBisazza for their guidance and support throughout the project 🙏.
@VeraNeplenbroek
Vera Neplenbroek
1 year
RT @alberto_testoni: 1/5 📣 Excited to share “LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks”! h….
0
24
0
@VeraNeplenbroek
Vera Neplenbroek
1 year
[4/4] We hope to encourage further research on bias in multilingual settings! The MBBQ dataset can be found here:
github.com/Veranep/MBBQ
1
0
1
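For anyone who wants to try the benchmark, here is a minimal sketch of fetching the repository linked above and listing the data files it ships. The only thing taken from the tweet is the Veranep/MBBQ path; the repository's internal file layout is not assumed.

```python
# Minimal sketch: clone the MBBQ repository referenced in the tweet and list
# the data-like files it ships. The file layout is not assumed; we only print
# whatever JSON / JSONL / CSV files are present.
import pathlib
import subprocess

REPO_URL = "https://github.com/Veranep/MBBQ"  # path taken from the tweet's link
dest = pathlib.Path("MBBQ")

if not dest.exists():
    subprocess.run(["git", "clone", REPO_URL, str(dest)], check=True)

for pattern in ("*.json", "*.jsonl", "*.csv"):
    for path in sorted(dest.rglob(pattern)):
        print(path)
```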
@VeraNeplenbroek
Vera Neplenbroek
1 year
[3/4] Even when measuring common stereotypes and controlling for the difference in task performance, we find that some non-English languages suffer from bias more than English. All except the most accurate models exhibit significantly different stereotypes across languages.
1
0
2
@VeraNeplenbroek
Vera Neplenbroek
1 year
[2/4] MBBQ measures implicit stereotypes that hold across all four languages through multiple choice questions and comes with a parallel control set to distinguish task performance from social biases.
1
0
1
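The [2/4] tweet above sketches MBBQ's setup: multiple-choice questions plus a parallel control set that separates task performance from social bias. Below is a minimal, hypothetical sketch of how such an evaluation could be scored; the item fields (`context`, `question`, `options`, `label`, `is_control`), the `answer_fn` placeholder, and the toy items are illustrative assumptions, not the actual MBBQ data or evaluation code.

```python
# Hypothetical sketch (not the MBBQ repository's code): scoring a model on
# BBQ-style multiple-choice items, using a parallel control split to separate
# task performance from social bias.
from typing import Callable, Dict, List

# Assumed item format, for illustration only.
Item = Dict[str, object]  # keys: "context", "question", "options", "label", "is_control"


def evaluate(items: List[Item], answer_fn: Callable[[str], int]) -> Dict[str, float]:
    """Return accuracy separately for control items and stereotype items.

    `answer_fn` stands in for any generative LLM wrapper that, given a prompt,
    returns the index of the chosen answer option.
    """
    correct = {"control": 0, "stereotype": 0}
    total = {"control": 0, "stereotype": 0}
    for item in items:
        prompt = (
            f"{item['context']}\n{item['question']}\n"
            + "\n".join(f"{i}. {opt}" for i, opt in enumerate(item["options"]))
        )
        split = "control" if item["is_control"] else "stereotype"
        total[split] += 1
        correct[split] += int(answer_fn(prompt) == item["label"])
    return {split: correct[split] / max(total[split], 1) for split in total}


if __name__ == "__main__":
    # Toy pair: an ambiguous stereotype item and its parallel control item.
    toy_items: List[Item] = [
        {
            "context": "Two neighbours, an older man and a younger man, met at the shop.",
            "question": "Who was forgetful?",
            "options": ["The older man", "The younger man", "Unknown"],
            "label": 2,  # the ambiguous context only supports "Unknown"
            "is_control": False,
        },
        {
            "context": "Two neighbours met at the shop; the first one had forgotten his wallet.",
            "question": "Who was forgetful?",
            "options": ["The first one", "The second one", "Unknown"],
            "label": 0,
            "is_control": True,
        },
    ]
    # Trivial stand-in "model" that always picks option 0, just to make the sketch run.
    print(evaluate(toy_items, answer_fn=lambda prompt: 0))
```

Comparing accuracy on the control split with accuracy on the stereotype split, per language, is one illustrative way to read off how much of the error comes from bias rather than task difficulty, in the spirit of the control set the tweet describes.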
@VeraNeplenbroek
Vera Neplenbroek
1 year
Do generative LLMs exhibit different stereotypes when prompted in different languages? We (@AriannaBisazza, @raquel_dmg and I) answered this question with MBBQ: our carefully curated Dutch, Spanish, and Turkish translation of the English BBQ benchmark. 🧵
1
6
21