Caspar Oesterheld Profile
Caspar Oesterheld
@C_Oesterheld
Followers: 199 · Following: 172 · Media: 15 · Statuses: 54

PhD student @FOCAL_lab @CarnegieMellon with @conitzer.

Pittsburgh
Joined September 2022
@C_Oesterheld
Caspar Oesterheld
7 months
Shout-out to my amazing collaborators: Emery Cooper, Miles Kodama, @NguyenSquared, @EthanJPerez!
0
0
8
@C_Oesterheld
Caspar Oesterheld
7 months
Our dataset opens the door to studying what shapes models’ decision theories. It also lets us test whether changing which theory models endorse affects their real-life decisions. To learn more, read the full paper. 10/10
1
0
13
@C_Oesterheld
Caspar Oesterheld
7 months
How *well* LLMs follow *which* decision theory affects their ability to cooperate. This could mean the difference between peace and conflict in AI-assisted political bargaining or enable AIs to collude when one is meant to monitor the other, undermining human control. 9/10
1
0
16
@C_Oesterheld
Caspar Oesterheld
7 months
We found that model attitudes are consistent between theoretical and pragmatic questions: Models that recommend EDT-aligned actions also tend to give more EDT-aligned answers on abstract questions. 8/10
1
0
15
@C_Oesterheld
Caspar Oesterheld
7 months
This is puzzling – there’s no human expert consensus on which decision theory is better. 7/10
1
0
13
@C_Oesterheld
Caspar Oesterheld
7 months
Models varied in which decision theory they prefer. Surprisingly, better performance on our capabilities benchmark was correlated with preferring evidential over causal decision theory (with chain of thought). 6/10
2
4
28
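A minimal sketch of how a correlation like this could be computed; the numbers are placeholders rather than the paper's results, and the choice of Spearman's rho is an assumption, not necessarily the statistic the authors used:

```python
# Illustrative only: correlate per-model capability accuracy with the
# per-model rate of EDT-aligned choices on the preference questions.
# All numbers below are placeholders, not results from the paper.
from scipy.stats import spearmanr

capability_acc = [0.45, 0.52, 0.61, 0.68, 0.75]  # one entry per model
edt_preference = [0.40, 0.48, 0.55, 0.63, 0.70]  # fraction of EDT-aligned picks

rho, p_value = spearmanr(capability_acc, edt_preference)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```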
@C_Oesterheld
Caspar Oesterheld
7 months
Cutting-edge models perform better but are far from perfect. OpenAI’s o1 leads with ~75% accuracy. We expect human experts to score nearly 100%. 5/10
2
0
9
@C_Oesterheld
Caspar Oesterheld
7 months
Weaker models, including some versions of GPT-3.5, got <50% right on our benchmark – barely better than random guessing. 4/10
1
0
11
@C_Oesterheld
Caspar Oesterheld
7 months
Our team, which includes academic decision theory researchers, spent hundreds of hours hand-generating 400+ multiple-choice questions to test how well LLMs reason about two key decision theories: causal and evidential. We also made 100+ questions to test which theory LLMs prefer. 3/10
1
0
12
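The thread doesn't show the dataset's format; as a rough sketch of how the two question types described above could be represented (all field names are hypothetical, and the real format may differ):

```python
from dataclasses import dataclass

@dataclass
class CapabilityItem:
    """One reasoning question with a ground-truth answer under a stated
    theory (hypothetical schema)."""
    prompt: str         # scenario plus question, e.g. "According to CDT, ..."
    choices: list[str]  # multiple-choice options
    answer: int         # index of the correct choice

@dataclass
class PreferenceItem:
    """One question with no ground truth: the option a model picks
    reveals which decision theory it leans toward."""
    prompt: str
    choices: list[str]
    cdt_choice: int     # index of the CDT-aligned option
    edt_choice: int     # index of the EDT-aligned option

def accuracy(items: list[CapabilityItem], answers: list[int]) -> float:
    """Fraction of capability questions answered correctly."""
    return sum(a == it.answer for it, a in zip(items, answers)) / len(items)
```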
@C_Oesterheld
Caspar Oesterheld
7 months
Decision theory tackles questions of rational choice, especially in interactions with copies or simulations of yourself. Rare for humans but potentially very important for language models! 2/10
1
0
9
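To make the CDT/EDT contrast concrete, a small worked example with made-up payoffs (not from the paper): in a one-shot prisoner's dilemma against an exact copy, CDT treats the copy's move as causally independent of yours and defects, while EDT conditions on the copy mirroring whatever you choose and cooperates.

```python
# Hypothetical twin prisoner's dilemma payoffs: (mine, copy's) -> my payoff.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 4, ("D", "D"): 1,
}

def cdt_value(my_action, p_copy_cooperates):
    """CDT: my choice has no causal effect on the copy's, so evaluate
    against a fixed belief about what the copy does."""
    return (p_copy_cooperates * PAYOFF[(my_action, "C")]
            + (1 - p_copy_cooperates) * PAYOFF[(my_action, "D")])

def edt_value(my_action):
    """EDT: conditional on my action, an exact copy (almost surely)
    takes the same action, so evaluate against a mirroring copy."""
    return PAYOFF[(my_action, my_action)]

# CDT defects no matter what it believes about the copy...
for p in (0.0, 0.5, 1.0):
    assert cdt_value("D", p) > cdt_value("C", p)
# ...while EDT cooperates: mutual cooperation (3) beats mutual defection (1).
assert edt_value("C") > edt_value("D")
```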
@C_Oesterheld
Caspar Oesterheld
7 months
How do LLMs reason about playing games against copies of themselves? 🪞We made the first LLM decision theory benchmark to find out. 🧵1/10
2
19
101
@C_Oesterheld
Caspar Oesterheld
1 year
Some new models came out recently (Claude 3, Mistral Large) and I happen to have a work-in-progress, unpublished (=> absent from training data) multiple-choice problem set. Tentative results below. Take with a big grain of salt! More details on the benchmark soon.
1
0
15
@C_Oesterheld
Caspar Oesterheld
1 year
RT @AIImpacts: We just ran probably the biggest survey of AI researchers ever! 2778 participants from six top AI venues answered questions…
0
128
0
@C_Oesterheld
Caspar Oesterheld
2 years
RT @conitzer: We are recruiting postdocs at the Foundations of Cooperative AI Lab (@FOCAL_lab) at @CarnegieMellon (…)
0
24
0
@C_Oesterheld
Caspar Oesterheld
2 years
If you don't have time to fill out the entire application form by tonight, it may still make sense to apply, especially if you have a research sample or other legible credentials.
0
0
2
@C_Oesterheld
Caspar Oesterheld
2 years
Better late than never: I'm proud to serve as a mentor for SERI MATS this winter. If you're interested in working with me on multi-agent safety, please apply to my stream! The deadline is tonight (Pacific time)!
@MATSprogram
ML Alignment & Theory Scholars
2 years
Are you:
- an accomplished researcher/engineer
- determined to advance AI x-safety
- in need of world-class mentorship + community?
Apply by Nov 17 for our Winter Program!
1
1
16
@C_Oesterheld
Caspar Oesterheld
2 years
RT @dfrsrchtwts: Had a fun conversation with @C_Oesterheld about some of his recent papers on the game theory of cooperative AI - check it…
0
4
0
@C_Oesterheld
Caspar Oesterheld
2 years
Many thanks to my amazing coauthors @j_treutlein (joint first), Emery Cooper, and @undo_hubris. More info in the paper (including discussion of related work on performative prediction).
0
0
5
@C_Oesterheld
Caspar Oesterheld
2 years
Relevance to AI x-safety: Oracle AIs only answer questions and could be safer than agents. We show that oracles that output performatively optimal predictions act like agents (even if e.g. there is a unique fixed point). Oracles trained via repeated gradient ascent may be safer.
1
1
6
@C_Oesterheld
Caspar Oesterheld
2 years
We also consider training AI models using scoring rules. What incentives do training setups provide? Unlike RL on the objective S(p,f(p)), repeated gradient ascent converges to fixed points, since it does not try to optimize the outcome distribution f(p).
1
0
6
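A toy illustration of that contrast (a sketch, not the paper's setup): with the Brier score and a made-up performative response f(p), repeated gradient ascent settles at the calibrated fixed point p = f(p), while maximizing the realized score S(p, f(p)) directly picks an extreme prediction that trades calibration for a more predictable outcome.

```python
# Toy setting (not the paper's): binary outcome whose probability f(p)
# depends on the announced prediction p; Brier score S(p, y) = -(p - y)^2.
import numpy as np

def f(p):
    """Made-up performative response of the world to the prediction."""
    return 0.2 + 0.6 * p

def expected_score(p, q):
    """E_{y ~ Bernoulli(q)}[-(p - y)^2] = -(p - q)^2 - q(1 - q)."""
    return -(p - q) ** 2 - q * (1 - q)

# (1) Repeated gradient ascent: update p against the current outcome
# distribution f(p_t), never differentiating through f.
p = 0.9
for _ in range(200):
    q = f(p)                   # outcome distribution held fixed this step
    p += 0.1 * (-2 * (p - q))  # d/dp of -(p - q)^2 with q fixed
print(f"gradient ascent: p = {p:.3f}, f(p) = {f(p):.3f}")  # p = f(p) = 0.5

# (2) Maximizing the realized score S(p, f(p)) over p, as RL on that
# objective would: the optimizer also exploits its influence on f.
grid = np.linspace(0.0, 1.0, 10001)
p_star = grid[np.argmax(expected_score(grid, f(grid)))]
print(f"performative optimum: p = {p_star:.3f}, f(p) = {f(p_star):.3f}")
# Lands on a boundary (p = 0 or p = 1, tied in this example): miscalibrated
# (p != f(p)) but the outcome is made as deterministic as f allows.
```

In this toy example the properness of the scoring rule makes the per-step gradient vanish exactly at p = f(p), which is why the iterative scheme looks safer: it has no incentive to steer the outcome distribution, only to match it.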