is now a paper!
We reviewed 102 open datasets for evaluating and improving LLM safety. We also reviewed how these datasets are used in major model release publications and in popular benchmarks.
Headline results and arXiv link below 👇
If you’re working on LLM safety, check out !
is a catalogue of open datasets for evaluating and improving LLM safety. I started building this over the holidays, and I know there are still datasets missing, so I need your help 🧵
I was part of OpenAI’s red team for GPT-4, testing its ability to generate harmful content.
Working with the model in various iterations over the course of six months convinced me that model safety is the most difficult, and most exciting challenge in NLP right now.
🧵
We’re releasing GPT-4 — a large multimodal model (image & text in, text out) which is a significant advance in both capability and alignment.
Still limited in many ways, but passes many qualification benchmarks like the bar exam & AP Calculus:
After spending just 20 minutes with the
@MistralAI
model, I am shocked by how unsafe it is. It is very rare these days to see a new model so readily reply to even the most malicious instructions. I am super excited about open-source LLMs, but this can't be it!
Examples below 🧵
Mistral 7B is out. It outperforms Llama 2 13B on every benchmark we tried. It is also superior to LLaMA 1 34B in code, math, and reasoning, and is released under the Apache 2.0 licence.
NEW PREPRINT!
LLMs should be helpful AND harmless. This is a difficult balance to get right...
Some models refuse even safe requests if they superficially resemble unsafe ones. We built XSTest to systematically test for such "exaggerated safety".
🧵
🥳 New paper accepted at
#NAACL2022
(Main) 🥳
NLP tasks like hate speech detection are subjective: annotators disagree about what the correct data labels are. We propose two contrasting paradigms to enable better data annotation.
⬇️ Highlights below ⬇️
Excited to share that I successfully defended my PhD thesis
@oiioxford
last month 🥳
Huge thanks to my assessors
@computermacgyve
and
@MaartenSap
, and also my supervisors, collaborators and friends. I'll post my full acknowledgments here 👇 and more on what's next, next week 🤗
Multilingual HateCheck is now on
@huggingface
🤗
New tests for hate speech detection models in 10 languages with just 2 lines of code! See our
#NAACL2022
WOAH paper for details (), and get testing with the link below 👇
🥳 New paper at
#EMNLP2022
(Main) 🥳
Too much hate speech research focuses just on English content! To help fix this, we tried to expand hate detection models into under-resourced languages 🌍 without needing lots of new data 💸
⬇️ Highlights below ⬇️
CONTENT WARNING
You will have no trouble getting the model to give you advice on how to commit suicide, assault someone, or eradicate minorities. I will not post examples verbatim, but I can DM them, or you can try for yourself here:
🥳 New publication accepted at
#ACL2021NLP
! 🥳
We built HateCheck, a suite of functional tests for hate speech detection models, and used it to show critical weaknesses in current academic and commercial models.
⬇️ Highlights below ⬇️
Safety is hard because models today are general purpose tools. And for nearly every prompt that is safe and useful, there is an unsafe version.
You want the model to write good job ads, but not for some Nazi group. Blog posts? Not for terrorists. Chemistry? Not for explosives…
📣 Hiring for NLP Jobs 📣
Last year,
@bertievidgen
and I founded
@rewire_online
, a start-up building socially responsible AI for online safety. We have since grown into a team of 10+ people, winning major projects with GAFAM and an IUK Grant.
Right now, we are hiring 🧵👇
🥳 New paper at
#WOAH
#NAACL2022
🥳
Too much hate speech research focuses just on English content, so we release 🌍 Multilingual HateCheck 🌍 for testing hate speech detection models in 10 more languages!
⬇️ Highlights below ⬇️
Super hyped to start my postdoc with
@dirk_hovy
@MilaNLProc
this week 🎉
I'll be working on social values in large language models and AI safety (while living & eating well in Milan 🙏). I am excited to continue existing collaborations and meet new people – please reach out 🤗
Super happy to start my research visit to
@dirk_hovy
’s group
@MilaNLProc
today 🥳 I’ll be working on some exciting hate speech / NLP projects over the summer (and have many aperitivi along the way 🍷). Come say ciao if you’re in the area 🇮🇹
How does human feedback steer LLM behaviours?🧐 Whose voices dominate? 🗣️What challenges remain and how can we do better as a community in the future?🔮
All these questions and more answered in our new survey paper, accepted at
#EMNLP23
!
a small 🧵
LLM safety will be a big topic at
#EMNLP2023
! I put together this timetable with all the papers I am excited to check out. Sharing here in case others find it useful too :) Also sharing the link further below 🧵
Probing for unsafe use cases itself is not easy. Finding and evaluating the right prompts requires expert knowledge. Oversight across domains will become more and more of a challenge – check out great work from
@sleepinyourhat
,
@EthanJPerez
and others on this issue.
There are plenty of explicitly "uncensored" models that will give similar responses, but
@MistralAI
is framing this as a mainstream Llama alternative. It is wild to me that they do not evaluate or even mention safety in their release announcement...
These are just some of the issues that struck me the most while red-teaming GPT-4. I don’t want to jump on the hype train. The model is far from perfect. But I will say that I was impressed with the care and attention that everyone I interacted with
@OpenAI
put into this effort.
Also, it’s not always clear where to draw the lines on safety. What is or is not safe depends on who you ask.
This is where model safety overlaps with more general alignment research. Who are we aligning our models with, and how? I am really excited for more work on this!
BIG NEWS: Rewire has been acquired by ActiveFence 🥳
Two years ago,
@bertievidgen
and I started
@rewire_online
to build socially responsible AI for online safety. Today, we’re excited to share that we have been acquired by
@ActiveFence
!
🧵
🥳 New publication accepted at
#EMNLP2021
Findings 🥳
We used three years of Reddit data to adapt BERT to time and domain, illustrating when temporal adaptation isn't worth doing – and when it might be!
⬇️ Highlights below ⬇️
@VictorGall6791
@MistralAI
Thanks, Victor. I don't think that is true! There may be a tension between helpfulness and harmlessness in some settings, but there is also plenty of great work from
@AnthropicAI
and others on making models more helpful and less harmful at the same time. Calibration is possible!
Very excited to share that we won the
@StanfordHAI
AI Audit Challenge 🥳 Our HateCheck project (with
@hannahrosekirk
and
@bertievidgen
) was awarded "Best Holistic Evaluation and Benchmarking"!
Brief summary below 🧵
Last August, HAI and
@StanfordCyber
launched the
#AIAuditChallenge
that calls for solutions to improve our ability to evaluate AI systems. Join us on June 27 as we highlight the most innovative approaches, as well as lessons learned from the challenge:
🥳
#NAACL2022
Presentations 🥳
Super excited to present two articles at NAACL this week!
Interested in subjectivity and data annotation for tasks like hate speech detection? Then come to Session 8C this Wednesday at 0915 🙌
🏆
@rewire_online
won the DB Mindbox Challenge 🏆
I was in Berlin last week to pitch for Rewire at DB, Germany's national railway operator. Super excited to share that we won, and will now help DB handle their toxic feedback! 🙌
More details below 👇
We just released v2 of XSTest, our test suite for "exaggerated safety" in LLMs 🚀
Llama2 and other models often refuse safe prompts just because they superficially resemble unsafe prompts. With XSTest, you can test for this issue in a systematic way.
🧵
There is so much good work on LLM safety, so many relatively easy steps to take to avoid these extreme cases of unsafe behaviour. I really hope future releases will make more use of that!
HateCheck is now on
@huggingface
🤗
Testing your hate speech detection models has never been easier – it takes just two lines of code to load the dataset! See our ACL 2021 paper for details (), and get testing with the link below 👇
Excited to talk about data annotation for subjective NLP tasks like hate speech detection
@MilaNLProc
today! Thank you
@CurriedAmanda
and
@dirk_hovy
for inviting me 🤗 It's still early-stage work but I can hopefully share a preprint soon!
Excited to (virtually) be at
#EMNLP2022
this week!
If you're interested in online harms, social factors in language modelling or model safety, let's have a chat 🤗
Here's where you can find me 👇
#WOAH
will be at
#ACL2023
🥳
The call for papers is now live! This is a great venue for online safety research, and I am excited to be co-organising it this year 🤗 Submit your work and join us in Toronto!
📢Exciting news!
The call for papers is now open for the 7th Workshop on Online Abuse and Harms (
#WOAH
) at ACL 2023. Join the effort to address these critical issues.
See CFP here:
Submission Deadline: May 2, 2023
#ACL2023
#NLProc
We are excited to share that WOAH 2024, our 8th edition, will take place at
#NAACL2024
in Mexico City!
Our special theme this year will be "Online Harms in the Age of LLMs", covering emerging risks as well as LLM-based countermeasures.
CfP and more details soon 🚀
I’m at
#ACL2023NLP
this week to present co-authored work and co-organise
@WOAHWorkshop
.
But I am just as excited to meet new people and talk research in between sessions!
If you’re interested in social values in language models and/or model safety, come say hi 🤗
Super interesting panel on building NLP datasets! Great to hear
@_julianmichael_
@anmarasovic
@complingy
discuss prescriptive/descriptive annotation – very happy people are finding this useful!
Check out the video from 17:32 or read 👇 for more details
Thinking about collecting an
#NLProc
dataset / submitting a dataset paper? ➡️💡Check out our suggestions for "Building NLP Datasets" at
Thank you so much for our wonderful mentors:
@complingy
@anmarasovic
@_julianmichael_
❤️!
Great new datasets and methods for tackling emoji-based hate coming out at
#NAACL2022
! 🥳 Very glad I got to make a small contribution to this amazing initiative by
@hannahrosekirk
🤗
Check out the details and paper below 👇
🚨 New paper and datasets! 🚨
After sitting on my hands for many months 😬 I'm delighted that our
#Hatemoji
paper is going to
@naaclmeeting
! 😍🤩😎🆒
In a nutshell 🥜it uses human-and-model-in-the-loop learning 🤖🤝🙆 to tackle emoji-based hate
A 🧵 on all our new resources 1/
Relatedly, I am also very interested in subjectivity, human values and preferences, and how they are incorporated in LLMs. There’s a ton of papers on that as well – including one led by
@hannahrosekirk
which I co-authored and will be presenting the poster for on Saturday at 11am!
If you're at
#NeurIPS2023
go find
@hannahrosekirk
! You'll get to talk about this very fun poster
@solarneurips
AND get a preview of one of the most exciting dataset projects I've ever been a part of 🤫
Hi 🌎! I've arrived at
@NeurIPSConf
🫡 Reach out if you wanna talk all things human feedback + sociotechnical alignment. I’m presenting this cute poster, but we’re also building an awesome new human feedback dataset (release in Jan 👀) that I can’t wait to tell everyone about 🕺
🏅 EDOS @ SemEval Results 🏅
More than 500 people signed up for our SemEval task on the Explainable Detection of Online Sexism. The task paper, with dataset details, results and analysis, is now on arXiv! 👇
Working on hate speech detection in non-English languages? Want to test your models? Then come to the Workshop on Online Abuse and Harms this Thursday 🙏
And if you want to have a chat about any of these topics, just flag me down at the conference any time🤗
Super insightful review of 2021 research highlights by
@seb_ruder
🙌 Definitely an early personal highlight of 2022 to see my work on temporal adaptation with Janet Pierrehumbert mentioned among so many other great articles 🤗
Check out our article here:
Two new postdoc openings in Janet Pierrehumbert's Oxford NLP lab group! Come join us 🤗 or spread the word 📢
1) Text data mining, experimental semantics:
2) Graph machine learning, social network analysis:
Really enjoyed giving a talk
@cambridgenlp
today!
I spoke about some ideas for "exploring and controlling values in large language models through role-playing" -- much inspired by
@jacobandreas's recent work on language models as agent models 🎭
Check out this new
#EMNLP2023
survey on feedback learning in large language models, put together by
@hannahrosekirk
. I am biased, but I think it's a great resource!
More details in Hannah's thread 👇 Feedback is welcome! 🥁
Looking for something to do over the holidays? Join our
#SemEval2023
task on explainable sexism detection!
350+ people have already signed up for the task, run by
@rewire_online
with support from
@MetaAI
. The test phase starts on Jan 10th! 🚀
Yesterday,
@royalsociety
released a report on the online information environment, and it's a great read for those interested in
#onlinesafety
🙌 The report is informed, in part, by a lit review I wrote with
@balazsvedres
back in 2020.
Check out both here:
PS: As I said in my first tweet, I have not done a very systematic evaluation of the model, but I am very confident that doing so would confirm my first impression. Happy to be proven wrong!
Are you working on hate speech, abuse or other online harms? Then submit to WOAH at
#ACL2023
!
We have extended the paper deadline to May 9th 🗓 and also announced two best paper awards 🏆
For more details visit our website and please spread the word 🤗
📢The direct paper submission deadline for WOAH
@aclmeeting
has been officially extended to May 9th! 📢
That is around 20 days from now. We look forward to receiving your work!
We will update our website shortly.
#ACL2023
#WOAH2023
#NLProc
@larsjuhljensen
Thanks, Lars. I see your point, but I think there is a lot of value in trying to make the most advanced and most widely used models as safe as they can be. Not a perfect analogy, but Twitter needs to be safe even though unsafe alternatives like Gab exist.
@CaryPalmerr
@MistralAI
Thanks, Palmer. I am talking about their instruct / chat-optimised model, which is also the one I linked to on
If you like, you can look at the prompts we tried here:
@natolambert
@MistralAI
Thanks, Nathan. My main issue is that safety was not even evaluated ahead of release, or these evals were not shared. FWIW, I also think there should be minimum safety standards for when big orgs release chat models.
XSTest contains 200 hand-crafted test prompts across ten prompt types.
All prompts are perfectly innocuous questions. A well-calibrated model should not refuse to answer them!
Examples below 👇
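A minimal sketch of how such a test suite can be scored, purely as an illustration: the string-match refusal detector and the responses below are my own placeholders, not XSTest's actual evaluation method.

```python
# Minimal sketch of an XSTest-style evaluation loop. The string-match
# refusal detector below is a naive placeholder of my own, not the
# method from the paper, where responses are judged more carefully.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")

def is_refusal(response: str) -> bool:
    return response.lower().startswith(REFUSAL_MARKERS)

# Hypothetical (safe prompt, model response) pairs:
responses = [
    "I'm sorry, but I can't help with killing anything.",
    "To kill time at the airport, try reading or a podcast.",
    "I cannot assist with that request.",
    "Smashing a piñata works best with a sturdy stick.",
]

refusal_rate = sum(map(is_refusal, responses)) / len(responses)
print(f"Refusal rate on safe prompts: {refusal_rate:.0%}")  # 50%
```

In practice, partial refusals and hedged responses make automatic detection unreliable, so refusals are better judged by hand or with a trained classifier.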
🚨 🚨 New episode, new season! We spoke to fellow
@oiioxford
colleagues
@paul_rottger
and
@hannahrosekirk
about their research on hate speech detection and AI - join us for a fascinating deep dive into their work. Available on Spotify!
@deliprao
Thanks, Delip. I tested the instruction model. Regardless, my main point is that safety is a relevant consideration for many LLM applications. Therefore, if Mistral made a choice to release an unmoderated “power tool”, they should be open about that from the get-go.
Had a great time speaking about
#hatespeech
detection models and their weaknesses at
@CogX_Festival
with
@bertievidgen
last week. Recording is now online, please check it out 🤗
Looking forward to talking about my research at the
@turinginst
doctoral showcase tomorrow at 1:30pm GMT!
Lots of other exciting presentations as well -- agenda & links to register here:
Hate speech research needs to serve everyone, not just English speakers! I'm very happy that we
@rewire_online
get to be a part of this exciting and ambitious project 💪
🌍 Rewire x Lacuna Fund 🌍
We're super excited to be part of the new AfriHate project, working with great groups like
@MasakhaneNLP
to expand hate speech detection into 18 African languages!
Read on below for more details 👇
@Sohail_NITIE
@MistralAI
Thanks, Sohail. It looks like you are using codellama, not the mistral model (see dropdown at the bottom). Here are the unsafe prompts we tried yesterday, in case you are interested
🎉
@rewire_online
is hosting a SemEval task 🎉
The goal of the task is to detect A) sexist content, as well as B) different types of sexism and C) fine-grained sexism vectors. Super excited to run this with support from
@MetaAI
!
Read more details and join the task below 👇
Can AI tell us why something is sexist online?🧐
Our new
@SemEvalWorkshop
#NLProc
task invites you to create systems that identify sexist content and explain why with fine-grained predictions🔎 Check out our competition, organised by
@rewire_online
& sponsored by
@MetaAI
. Link⬇️
We test Meta's recently-released Llama2 with XSTest, and find a lot of exaggerated safety behaviours.
The model fully refuses 38% of our test prompts and partially refuses another 22% -- you cannot ask Llama2 how to kill the lights or smash a piñata!
Great talking to
@CraigLangran
along with
@bertievidgen
for this
@BBCNews
article!
Bottom line: Content moderation systems continue to make harmful mistakes. Platforms need to be more transparent, so that decisions can be explained, weaknesses identified and addressed!
⚠️NEW FROM ME⚠️
I’ve taken a deep dive into the murky world of social media content moderation for
@BBCNews
How do platforms like
@Twitter
determine when a post is abusive or hateful?
#onlineabuse
AI still struggles with one of its most basic applications: censoring harmful language. But distinguishing toxic and innocuous sentences isn't as straightforward for a machine.
The Online Safety team
@turinginst
is hiring two postdocs! Really great team to work with – please spread the word 📢
1) NLP / data-centric AI for online safety
2) Social science / policy for online harms
I am @
#NAACL2022
for my first **ever** in-person conference 🤩 I'll be talking about hate + emoji =
#Hatemoji
(Sesh 8, 13/07) Come say hi if you want to swap favourite emojis 🦄🫠🚀❤️‍🔥🆒👨‍🎤🍜 or discuss how these little pictures pose challenges for language modelling 🤟
Super excited that our team
@rewire_online
got spotlighted in the new
@DCMS
report on the UK Safety Tech sector 🙌🚀
Go to our website to learn more and to get free access to the Rewire API for toxic content detection!
Fantastic to see
@rewire_online
included in this new report from
@DCMS
, on the growth of
#OnlineSafetyTech
—a recognition of the importance of tackling online harms.
@DamianCollins
: "Making the online world safer is not only the right thing to do, it’s good for business" 🙌
Going forward, I will post more about our work at Rewire: our best-in-class AI for hate speech detection, and free API access. Please get in touch if you have any questions and follow
@rewire_online
to stay up-to-date!
Big things coming soon 🎉
Today, at 4pm GST, I will be a panelist at the BoF session on hate speech detection in African languages 🌍 You can still register below, even if you're not at EMNLP!
Join our BoF session on Hate speech detection for African Languages tomorrow (Tue 7 Dec, 4 pm GST / 12 pm GMT)
@Shmuhammadd
will present the AfriHate project + we'll have a panel discussion w/ Adem Chanie Ali
@AishatuGwadabe
@paul_rottger
@seyyaw
+ an open discussion w/ all attendees
Also check out , built by our team
@rewire_online
, for all other HateCheck-related resources:
- the original English HateCheck ()
- HatemojiCheck (led by
@hannahrosekirk
)
🛠️ Experimental setup 🛠️
We fine-tuned over 3,000 models (!) on different combinations of English and target-language data.
Then, we evaluated all these models on held-out test sets and multilingual HateCheck, using OLS regression (!) to quantify benefits of using more data 📈
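The regression idea can be sketched as follows, with synthetic numbers standing in for the real fine-tuning results (a single-predictor illustration; the paper's setup with English and target-language data is richer):

```python
import random

# Hypothetical illustration (synthetic numbers, not the paper's results):
# relate the amount of target-language training data each model saw
# to its test F1, and read the slope as the marginal benefit per example.
random.seed(0)
n_models = 200
data_amounts = [random.uniform(0, 2000) for _ in range(n_models)]
f1_scores = [0.55 + 8e-5 * x + random.gauss(0, 0.02) for x in data_amounts]

# Ordinary least squares for a single predictor:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
mean_x = sum(data_amounts) / n_models
mean_y = sum(f1_scores) / n_models
cov_xy = sum((x - mean_x) * (y - mean_y)
             for x, y in zip(data_amounts, f1_scores)) / n_models
var_x = sum((x - mean_x) ** 2 for x in data_amounts) / n_models
slope = cov_xy / var_x
intercept = mean_y - slope * mean_x

# slope ≈ estimated F1 gain per additional training example
print(f"F1 gain per 1,000 examples: {slope * 1000:.3f}")
```

With many predictors (English data, target-language data, model size), the same idea generalises to multivariate OLS, where each coefficient estimates the marginal benefit of one resource holding the others fixed.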
@deliprao
Hi
@deliprao
! We just released v2 of a dataset to test for exactly this kind of "exaggerated safety". Hope you find it interesting, would love to hear your thoughts!
⚠️ We argue that dataset creators should consider annotator subjectivity in the annotation process and either explicitly encourage it or discourage it, depending on the intended use of their dataset ⚠️
As a framework, we propose two contrasting data annotation paradigms:
There is a lot more detail to this, so do give the paper a read if you have time!
As always, you can also reproduce all our experiments using our code and data on GitHub:
@DamianRomero_CL
Very cool thread! We just released our NAACL paper about subjectivity in data annotation that is quite relevant to this -- would be curious to hear your thoughts 🤗
Our
#NAACL2022
paper on subjectivity in data annotation is finally live in the ACL Anthology 🙌 Thank you to those who flagged that the wrong PDF was linked 🙏
Check out the paper below 👇
Exaggerated safety is likely a problem of lexical overfitting.
To understand our prompts, LLMs need to contextualise potentially unsafe words ("kill time"). This is very easy for humans! But LLMs often focus only on unsafe meanings, which is why they refuse even safe prompts.
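A toy illustration of lexical overfitting (my own sketch, not any real model's safety mechanism): a filter keyed only on unsafe-looking words cannot use context, so it refuses harmless prompts too.

```python
# Toy sketch of lexical overfitting (my illustration, not any real
# model's safety mechanism): a filter that refuses based on unsafe-looking
# words alone cannot use context, so it refuses safe prompts too.
UNSAFE_WORDS = {"kill", "smash", "shoot"}

def lexical_filter(prompt: str) -> str:
    words = prompt.lower().replace("?", "").split()
    if UNSAFE_WORDS & set(words):
        return "REFUSE"
    return "COMPLY"

# The first two prompts both trigger a refusal, but only one is unsafe.
print(lexical_filter("How can I kill time at the airport?"))   # exaggerated safety
print(lexical_filter("How can I kill my neighbour?"))          # correct refusal
print(lexical_filter("What's the weather like today?"))        # correct compliance
```

A well-calibrated model has to do what this filter cannot: disambiguate the unsafe word from its context before deciding to refuse.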
Please let me know if I missed anything! And definitely say hi if you want to chat about any of these topics :)
Here is the link to the sheet 👇 If you want to filter by topic, click Data -> Filter Views.
Very cool new article at the intersection of generative AI and theology by
@HannahALucas
!
It's not my usual field, but I had a lot of fun discussing earlier drafts, and I learned a lot from the final article! Check it out below 👇
Pleased to say my new article is now online and open access with
@Religions_MDPI
🥳
In this paper I look at
#AI
Text-to-Image models, comparing negative weight prompts to negative language in mystical texts ☁️ 1/
For example, all models struggled with reclaimed slurs and counter speech. Misclassifying such content as hateful risks penalising the very communities most commonly targeted by online hate in the first place. It also undermines positive efforts to fight back against online hate.
Multilingual HateCheck (MHC) is an expansion of the original English HateCheck (ACL 2021) to Arabic, Dutch, French, German, Hindi, Italian, Mandarin, Polish, Portuguese and Spanish 🌏
This is more languages than any other hate speech dataset!
Very excited to develop these ideas into proper papers soon! Shoutout to
@hannahrosekirk
for great discussions on this, and thank you to
@nedjmaou
and
@michael_sejr
for inviting me 🤗
@metropolinomix
Thanks, M! I would say this is not safe behaviour, and I think people at OpenAI would agree. That is why they put that illustrative example into their System Card section on the potential for risky emergent behaviour. Let me know if you were referring to something else!
@Abebab
There aren’t many so far, but Lacuna Fund is supporting a new project to create hate speech datasets in 18 African languages! (Disclaimer: I’m involved in this as well)
Today, Lacuna Fund announces awards to 10 teams creating machine learning datasets for low-resourced African languages.
Learn more about the selected projects here:
Français:
Español:
@pratik_ratadiya
@hannahrosekirk
Functional tests are great! If you liked HatemojiCheck, you may also like HateCheck () and Multilingual HateCheck (), which I am presenting at
#NAACL2022
/ WOAH tomorrow 🙌
On Tuesday morning, I’ll be hanging out at the poster session with
@morlikow
, who is presenting our work on sociodemographics in modelling human label variation.
@synbiocs
Thanks, Swapnil. What you are describing is not far off the RL feedback processes that are being used for these types of models. Check out this paper from Bai et al.
@AnthropicAI
for example: