Paul Röttger

@paul_rottger

2K Followers · 2K Following · 66 Media · 333 Statuses

Postdoc @MilaNLProc, researching LLM safety and societal impacts.

Joined July 2020
Paul Röttger @paul_rottger · 5 months
Are LLMs biased when they write about political issues? We just released IssueBench – the largest, most realistic benchmark of its kind – to answer this question more robustly than ever before. Long 🧵 with spicy results 👇
Paul Röttger @paul_rottger · 3 months
RT @MilaNLProc: The @MilaNLProc lab is going to present 6 papers at #NAACL2025!
Paul Röttger @paul_rottger · 4 months
RT @KobiHackenburg: 📈Out today in @PNASNews!📈 In a large pre-registered experiment (n=25,982), we find evidence that scaling the size of L…
Paul Röttger @paul_rottger · 5 months
We are very excited for people to use and expand IssueBench. All links are below. Please get in touch if you have any questions 🤗
Paper:
Data:
Code:
Paul Röttger @paul_rottger · 5 months
It was great to build IssueBench with amazing co-authors @vjhofmann @MJacobsHarukawa @KobiHackenburg @valentina__py @faeze_brh and Dirk Hovy. Thanks also to the @MilaNLProc RAs, and Intel Labs and @allen_ai for compute.
Paul Röttger @paul_rottger · 5 months
IssueBench is fully modular and easily expandable to other templates and issues. We also hope that the IssueBench formula can enable more robust and realistic bias evaluations for other LLM use cases such as information seeking.
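For instance, extending the benchmark could be as simple as adding a new issue or template before regenerating prompts. Below is a purely hypothetical sketch; the issue and template values are invented, and the actual IssueBench interface may differ.

```python
# Hypothetical sketch of extending a template-based benchmark like
# IssueBench; values are invented and the real interface may differ.
issues = ["carbon emissions", "military drones"]
templates = ["Write an essay about [ISSUE]."]

issues.append("rent control")                     # add a new issue
templates.append("Write a haiku about [ISSUE].")  # add a new template

# Regenerate the full prompt set: every template crossed with every issue.
prompts = [t.replace("[ISSUE]", i) for t in templates for i in issues]
```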
Paul Röttger @paul_rottger · 5 months
Generally, we hope that IssueBench can bring a new quality of evidence to ongoing discussions about LLM (political) biases and how to address them. With hundreds of millions of people now using LLMs in their everyday lives, getting this right is very urgent.
Paul Röttger @paul_rottger · 5 months
While the partisan bias is striking, we believe that it warrants research, not outrage. For example, models may express support for same-sex marriage not because Democrats do so, but because models were trained to be “fair and kind”.
Paul Röttger @paul_rottger · 5 months
Lastly, we use IssueBench to test for partisan political bias by comparing LLM biases to US voter stances on a subset of 20 issues. On these issues, models are much (!) more aligned with Democrat than Republican voters.
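A toy version of such a comparison might look like the sketch below. The stance scores are invented for illustration, and the paper's actual alignment measure may differ.

```python
# Toy comparison with invented stance scores in [-1, 1]; not the
# paper's actual data or alignment measure.
model = {"issue_a": 0.8, "issue_b": -0.4}  # hypothetical model stances
dem = {"issue_a": 0.7, "issue_b": -0.5}    # hypothetical Democratic voter stances
rep = {"issue_a": -0.6, "issue_b": 0.3}    # hypothetical Republican voter stances

def mean_abs_gap(a: dict, b: dict) -> float:
    """Average absolute stance difference over shared issues (smaller = closer)."""
    return sum(abs(a[k] - b[k]) for k in a) / len(a)

# The thread reports a much smaller gap to Democratic than Republican voters.
print(mean_abs_gap(model, dem), mean_abs_gap(model, rep))
```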
Paul Röttger @paul_rottger · 5 months
Notably, when there was a difference in bias between models, it was mostly due to Qwen. The two issues with the most divergence both relate to Chinese politics, and Qwen (developed in China) is more positive / less negative about these issues.
Paul Röttger @paul_rottger · 5 months
We were very surprised by just how similar LLMs were in their biases. Even across different model families (Llama, Qwen, OLMo, GPT-4), models showed very similar stance patterns across issues.
Paul Röttger @paul_rottger · 5 months
Overall, we found that the stronger a model's default stance on an issue, the harder it is to steer the model away from this stance. So if a model defaults to a positive stance on an issue, users will struggle more to make it express the opposite view.
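One simple way to quantify this relationship is to correlate per-issue default-stance strength with steering success. The sketch below uses invented numbers and is not the paper's analysis.

```python
# Sketch with invented numbers (not the paper's analysis): correlating
# per-issue default-stance strength with steering success rate.
from statistics import correlation  # Pearson's r, Python 3.10+

default_strength = [0.9, 0.7, 0.5, 0.3]  # hypothetical stance dominance per issue
steer_success = [0.2, 0.4, 0.6, 0.8]     # hypothetical steering success per issue

# A negative r would match the pattern described above: stronger default
# stance, harder to steer away from it.
print(correlation(default_strength, steer_success))
```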
Paul Röttger @paul_rottger · 5 months
But before that, we look at steerability:
Models are generally steerable, but will often *hedge* their responses. For example, models will argue that electric cars are bad if you ask them to, but not without also mentioning their benefits (4).
Paul Röttger @paul_rottger · 5 months
For example, models are most consistently positive about social justice and environmental issues. Many of these issues are politically contested (e.g. in the US), but for models they are very clear-cut. We follow up on this further below.
Paul Röttger @paul_rottger · 5 months
Finally: results!
First, models express a very consistent stance on ≥70% of the issues in IssueBench. This is surprising since nearly all issues we test lack societal consensus. Yet models are often consistently positive (1, 2) or negative (4, 5).
Paul Röttger @paul_rottger · 5 months
For classifying the stance of each LLM response (so we can measure stance tendency), we introduce a response taxonomy that goes beyond just “positive” and “negative”. We also optimise a zero-shot prompt to automate this classification with high accuracy.
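As a rough sketch of what zero-shot stance classification can look like: the label set and judge prompt below are illustrative assumptions, not the paper's actual taxonomy or optimised prompt.

```python
# Hedged sketch of zero-shot stance classification with an LLM judge.
# Labels and prompt wording are illustrative, not the paper's taxonomy.
STANCE_LABELS = ["supportive", "opposed", "balanced", "descriptive", "refusal"]

def build_judge_prompt(issue: str, response: str) -> str:
    """Compose a zero-shot prompt asking a judge model to pick one label."""
    return (
        f"Classify the stance of the following text on the issue '{issue}'.\n"
        f"Answer with exactly one of: {', '.join(STANCE_LABELS)}.\n\n"
        f"Text:\n{response}"
    )
```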
Paul Röttger @paul_rottger · 5 months
For each issue, we create thousands of test prompts using templates based on real user requests for LLM writing assistance. These templates vary a lot in terms of writing formats and styles, including fun ones like "chaotic rap about [ISSUE]" in the long tail.
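To illustrate the template format: the "[ISSUE]" placeholder and the "chaotic rap" example are from the thread, while the other templates below are invented stand-ins rather than actual IssueBench templates.

```python
# Illustrative templates only; apart from the "chaotic rap" example from
# the thread, these are invented, not the actual IssueBench templates.
templates = [
    "Write an essay about [ISSUE].",
    "Draft a short blog post arguing about [ISSUE].",
    "Write a chaotic rap about [ISSUE].",  # long-tail example from the thread
]

def instantiate(templates: list[str], issue: str) -> list[str]:
    """Fill the [ISSUE] placeholder to get concrete test prompts."""
    return [t.replace("[ISSUE]", issue) for t in templates]

prompts = instantiate(templates, "carbon emissions")
```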
Paul Röttger @paul_rottger · 5 months
We cover 212 political issues from real user chats with LLMs. These issues are extremely varied, spanning tech (e.g. military drones), social justice (gender equality), the environment (carbon emissions) and many more policy areas.
Paul Röttger @paul_rottger · 5 months
Before we get to results though, let me briefly explain our setup:
We test for *issue bias* in LLMs by prompting models to write about an issue in many different ways and then classifying the stance of each response. Bias in this setting is when one stance dominates.
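To make the setup concrete, here is a minimal sketch of this kind of pipeline. It is not the IssueBench code: `generate` and `classify_stance` are hypothetical stand-ins for the model call and the stance classifier.

```python
# Minimal sketch of the issue-bias setup described above (not the actual
# IssueBench code). `generate` and `classify_stance` are hypothetical
# stand-ins for an LLM call and a stance classifier.
from collections import Counter

def measure_issue_bias(issue, templates, generate, classify_stance):
    """Prompt a model about `issue` via many templates, classify each
    response's stance, and report how strongly one stance dominates."""
    counts = Counter()
    for template in templates:
        prompt = template.replace("[ISSUE]", issue)
        counts[classify_stance(generate(prompt))] += 1
    stance, n = counts.most_common(1)[0]
    return {"issue": issue, "dominant_stance": stance,
            "share": n / sum(counts.values())}
```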
Paul Röttger @paul_rottger · 6 months
All of MSTS is permissively licensed and available now. Check out the MSTS preprint for more details, or go to GitHub / Hugging Face to access the dataset 👇.