Abhinav Rao

@AetherSuRa

2 years

New paper on LLMs+culture! 🎊🎉 Thrilled to share our work on NormAd, a dataset evaluating whether LLMs can adapt to the diversity of cultural norms worldwide! (Spoiler: they can't!) ArXiv: https://t.co/vZUSsHC34u w/ @akhila_yerukola @vishwayvs @_doctor_kat @MaartenSap [1/n]

Replies

Abhinav Rao

@AetherSuRa

2 years

For LLMs to be inclusive across evolving cultures, they needn’t know all social norms, but should at least be able to adapt their responses given enough cultural context. However, prior cultural probing work typically only evaluates an LLM’s internal knowledge of norms. [2/n]

Abhinav Rao

@AetherSuRa

2 years

We introduce NormAd, a benchmark containing 2,633 stories depicting cultural situations from 75 countries. We provide multiple cultural contexts to analyze how amenable an LLM is to social norms across cultures! [3/n]

Abhinav Rao

@AetherSuRa

2 years

We synthetically generate stories and their cultural contexts by seeding GPT-4-Turbo with cultural backgrounds from the Cultural Atlas. We ensure data quality by performing manual and automated filtering and validation checks in our pipeline. [4/n]

Abhinav Rao

@AetherSuRa

2 years

Even with explicit social norms, the top-performing models, GPT-4 and Mistral-7b-Instruct, achieve 87.6% and 81.8% accuracy, lagging behind the 95.6% achieved by humans. [5/n]

Abhinav Rao

@AetherSuRa

2 years

When generalized to higher-level value contexts and just country names, performance plummeted to 60% and below for the best models. Larger models improved with different training paradigms, but exhibited greater skew towards English/Western cultures. [6/n]

Abhinav Rao

@AetherSuRa

2 years

Moreover, LLMs displayed inherent biases, performing better on stories that conform to social norms than on those that violate them. LLMs also struggle across highly nuanced axes such as gifting. [7/n]

Abhinav Rao

@AetherSuRa

2 years

We would expect an LLM to be able to reason over a social situation when given an explicit norm, but no! They still fail to adapt their answers. Our findings highlight a limitation in LLMs’ reasoning capabilities across cultural applications. [8/n] Dataset and code coming soon!
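
The comparisons described in the thread (accuracy at each level of cultural context, and accuracy on norm-conforming versus norm-violating stories) can be sketched roughly as follows. This is a minimal illustration, not the authors' evaluation code: the field names (`context`, `gold`, `pred`), the label strings, and the toy records are all hypothetical and do not reflect the NormAd schema or any reported result.

```python
from collections import defaultdict

# Hypothetical toy records (made up for illustration, NOT NormAd data).
# "context" is the level of cultural context given to the model,
# "gold" is the reference label, "pred" is the model's answer.
records = [
    {"context": "explicit_norm", "gold": "conforms", "pred": "conforms"},
    {"context": "explicit_norm", "gold": "violates", "pred": "violates"},
    {"context": "value",         "gold": "conforms", "pred": "conforms"},
    {"context": "value",         "gold": "violates", "pred": "conforms"},
    {"context": "country",       "gold": "conforms", "pred": "conforms"},
    {"context": "country",       "gold": "violates", "pred": "conforms"},
]

def accuracy_by(rows, key):
    """Group rows by `key` and return {group: fraction where pred == gold}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        hits[row[key]] += int(row["pred"] == row["gold"])
    return {group: hits[group] / totals[group] for group in totals}

# Accuracy per context level (explicit norm, higher-level value, country only).
per_context = accuracy_by(records, "context")

# Accuracy per gold label, to expose a skew towards norm-conforming stories.
per_label = accuracy_by(records, "gold")

for name, table in (("context", per_context), ("gold label", per_label)):
    for group, acc in table.items():
        print(f"{name:>10} | {group:<14} | {acc:.2f}")
```

A gap between the `conforms` and `violates` rows in the per-label table is the kind of skew the thread refers to when it says models do better on stories that follow norms.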

Synthical

@synthical_ai

2 years

@AetherSuRa @akhila_yerukola @vishwayvs @_doctor_kat @MaartenSap Dark mode for this paper for night readers 🌙

ben 🐈‍⬛

@rbanda86

2 years

@AetherSuRa @akhila_yerukola @vishwayvs @_doctor_kat @MaartenSap What’s the point? Doesn’t GPT3 already do this?