@AetherSuRa
Abhinav Rao
2 years
New paper on LLMs+culture! 🎊🎉 Thrilled to share our work on NormAd, a dataset evaluating whether LLMs can adapt to the diversity of cultural norms worldwide! (Spoiler: they can't!) arXiv: https://t.co/vZUSsHC34u w/ @akhila_yerukola @vishwayvs @_doctor_kat @MaartenSap [1/n]

Replies

@AetherSuRa
Abhinav Rao
2 years
For LLMs to be inclusive across evolving cultures, they needn't know every social norm, but they should at least be able to adapt their responses when given enough cultural context. However, prior cultural probing work typically evaluates only an LLM's internal knowledge of norms. [2/n]
@AetherSuRa
Abhinav Rao
2 years
We introduce NormAd, a benchmark containing 2,633 stories depicting cultural situations from 75 countries. We provide multiple levels of cultural context to analyze how well an LLM adapts to social norms across cultures! [3/n]
@AetherSuRa
Abhinav Rao
2 years
We synthetically generate stories and their cultural contexts by seeding GPT-4-Turbo with cultural backgrounds from the Cultural Atlas. We ensure data quality through manual and automated filtering and validation checks in our pipeline. [4/n]
@AetherSuRa
Abhinav Rao
2 years
Even with explicit social norms, the top-performing models, GPT-4 and Mistral-7B-Instruct, achieve only 87.6% and 81.8% accuracy respectively, lagging behind the 95.6% achieved by humans. [5/n]
@AetherSuRa
Abhinav Rao
2 years
When the context was generalized to higher-level values or just the country name, accuracy plummeted to 60% or below, even for the best models. Larger models improved under different training paradigms but exhibited a greater skew toward English-speaking/Western cultures. [6/n]
@AetherSuRa
Abhinav Rao
2 years
Moreover, LLMs displayed inherent biases, performing better on stories that conform to social norms than on those that violate them. LLMs also struggle on highly nuanced axes such as gifting. [7/n]
@AetherSuRa
Abhinav Rao
2 years
We would expect an LLM to be able to reason over a social situation when given an explicit norm, but no! They still fail to adapt their answers. Our findings highlight a limitation in LLMs' reasoning capabilities across cultural applications. [8/n] Dataset and code coming soon!
@synthical_ai
Synthical
2 years
@AetherSuRa @akhila_yerukola @vishwayvs @_doctor_kat @MaartenSap Dark mode for this paper for night readers 🌙
@rbanda86
ben 🐈‍⬛
2 years
@AetherSuRa @akhila_yerukola @vishwayvs @_doctor_kat @MaartenSap What's the point? Doesn't GPT-3 already do this?