New paper on LLMs + culture! Thrilled to share our work on NormAd, a dataset evaluating whether LLMs can adapt to the diversity of cultural norms worldwide! (Spoiler: they can't!) ArXiv: https://t.co/vZUSsHC34u w/ @akhila_yerukola @vishwayvs @_doctor_kat @MaartenSap [1/n]
For LLMs to be inclusive across evolving cultures, they needn't know every social norm, but they should at least be able to adapt their responses when given enough cultural context. However, prior cultural probing work typically evaluates only an LLM's internal knowledge of norms. [2/n]
We introduce NormAd, a benchmark containing 2,633 stories depicting cultural situations from 75 countries. We provide cultural context at multiple levels of specificity to analyze how well an LLM adapts to social norms across cultures! [3/n]
We synthetically generate the stories and their cultural contexts by seeding GPT-4-Turbo with cultural backgrounds from the Cultural Atlas. We ensure data quality through manual and automated filtering and validation checks throughout our pipeline. [4/n]
Even when given the explicit social norm, the top-performing models, GPT-4 and Mistral-7b-Instruct, reach only 87.6% and 81.8% accuracy, respectively, lagging behind the 95.6% achieved by humans. [5/n]
When the context is generalized to higher-level values or just the country name, performance drops to 60% or below, even for the best models. Larger models did improve under different training paradigms, but they also exhibited a greater skew toward English-speaking and Western cultures. [6/n]
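To make the context levels from [5/n] and [6/n] concrete, here is a minimal Python sketch of scoring a model's answers separately for each level of cultural context. The `Example` fields, the level names ("norm", "value", "country"), and the label strings are hypothetical illustrations, not NormAd's released data format or evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record: field names and label strings are illustrative only,
# not the schema of the released NormAd dataset.
@dataclass
class Example:
    story: str
    context_level: str  # assumed levels: "norm", "value", or "country"
    gold: str           # assumed gold answer, e.g. "yes" or "no"
    prediction: str     # the model's answer, already parsed to a label


def accuracy_by_level(examples):
    """Return {context_level: accuracy in percent}, computed per level."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex.context_level] += 1
        # Case- and whitespace-insensitive exact match against the gold label.
        if ex.prediction.strip().lower() == ex.gold.strip().lower():
            correct[ex.context_level] += 1
    return {level: 100.0 * correct[level] / total[level] for level in total}


if __name__ == "__main__":
    demo = [
        Example("story 1", "norm", "yes", "yes"),
        Example("story 2", "norm", "no", "no"),
        Example("story 3", "country", "yes", "no"),
        Example("story 4", "country", "no", "no"),
    ]
    print(accuracy_by_level(demo))  # {'norm': 100.0, 'country': 50.0}
```

Reporting accuracy per level, rather than one pooled number, is what exposes the kind of drop described above when the context shifts from explicit norms to country names alone.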
Moreover, LLMs display inherent biases, performing better on stories that conform to social norms than on those that violate them. LLMs also struggle with highly nuanced aspects of culture, such as gift-giving. [7/n]
We would expect an LLM to be able to reason about a social situation when given an explicit norm, but no! They still fail to adapt their answers. Our findings highlight a limitation in LLMs' reasoning capabilities in cultural applications. [8/n] Dataset and code coming soon!
@AetherSuRa @akhila_yerukola @vishwayvs @_doctor_kat @MaartenSap Dark mode for this paper for night readers
@AetherSuRa @akhila_yerukola @vishwayvs @_doctor_kat @MaartenSap What's the point? Doesn't GPT-3 already do this?