Caspar Oesterheld Profile
Caspar Oesterheld
@C_Oesterheld
Followers: 199 · Following: 172 · Media: 15 · Statuses: 54

PhD student @FOCAL_lab @CarnegieMellon with @conitzer.

Pittsburgh
Joined September 2022
@C_Oesterheld
Caspar Oesterheld
7 months
Shout-out to my amazing collaborators: Emery Cooper, Miles Kodama, @NguyenSquared, @EthanJPerez!
0
0
8
@C_Oesterheld
Caspar Oesterheld
7 months
Our dataset opens the door to studying what shapes models’ decision theories. It also lets us test whether changing which theory models endorse affects their real-life decisions. To learn more, read the full paper. 10/10
1
0
13
@C_Oesterheld
Caspar Oesterheld
7 months
How *well* LLMs follow *which* decision theory affects their ability to cooperate. This could mean the difference between peace and conflict in AI-assisted political bargaining or enable AIs to collude when one is meant to monitor the other, undermining human control. 9/10
1
0
16
@C_Oesterheld
Caspar Oesterheld
7 months
We found that model attitudes are consistent between theoretical and pragmatic questions: Models that recommend EDT-aligned actions also tend to give more EDT-aligned answers on abstract questions. 8/10
1
0
15
@C_Oesterheld
Caspar Oesterheld
7 months
This is puzzling – there’s no human expert consensus on which decision theory is better. 7/10
1
0
13
@C_Oesterheld
Caspar Oesterheld
7 months
Models varied in which decision theory they prefer. Surprisingly, better performance on our capabilities benchmark was correlated with preferring evidential over causal decision theory (with chain of thought). 6/10
2
4
28
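A minimal sketch of how a correlation like this could be computed; the numbers are placeholders rather than the paper's results, and the choice of Spearman's rho is an assumption, not necessarily the statistic the authors used:

```python
# Illustrative only: correlate per-model capability accuracy with the
# per-model rate of EDT-aligned choices on the preference questions.
# All numbers below are placeholders, not results from the paper.
from scipy.stats import spearmanr

capability_acc = [0.45, 0.52, 0.61, 0.68, 0.75]  # one entry per model
edt_preference = [0.40, 0.48, 0.55, 0.63, 0.70]  # fraction of EDT-aligned picks

rho, p_value = spearmanr(capability_acc, edt_preference)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```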
@C_Oesterheld
Caspar Oesterheld
7 months
Cutting-edge models perform better but are far from perfect. OpenAI’s o1 leads with ~75% accuracy. We expect human experts to score nearly 100%. 5/10
2
0
9
@C_Oesterheld
Caspar Oesterheld
7 months
Weaker models, including some versions of GPT-3.5, got <50% right on our benchmark – barely better than random guessing. 4/10
1
0
11
@C_Oesterheld
Caspar Oesterheld
7 months
Our team, which includes academic decision theory researchers, spent hundreds of hours hand-generating 400+ multiple-choice questions to test how well LLMs reason about two key decision theories: causal and evidential. We also made 100+ questions to test which theory LLMs prefer. 3/10
1
0
12
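The thread doesn't show the dataset's format; as a rough sketch of how the two question types described above could be represented (all field names are hypothetical, and the real format may differ):

```python
from dataclasses import dataclass

@dataclass
class CapabilityItem:
    """One reasoning question with a ground-truth answer under a stated
    theory (hypothetical schema)."""
    prompt: str         # scenario plus question, e.g. "According to CDT, ..."
    choices: list[str]  # multiple-choice options
    answer: int         # index of the correct choice

@dataclass
class PreferenceItem:
    """One question with no ground truth: the option a model picks
    reveals which decision theory it leans toward."""
    prompt: str
    choices: list[str]
    cdt_choice: int     # index of the CDT-aligned option
    edt_choice: int     # index of the EDT-aligned option

def accuracy(items: list[CapabilityItem], answers: list[int]) -> float:
    """Fraction of capability questions answered correctly."""
    return sum(a == it.answer for it, a in zip(items, answers)) / len(items)
```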
@C_Oesterheld
Caspar Oesterheld
7 months
Decision theory tackles questions of rational choice, especially in interactions with copies or simulations of yourself. Rare for humans but potentially very important for language models! 2/10
1
0
9
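To make the CDT/EDT contrast concrete, a small worked example with made-up payoffs (not from the paper): in a one-shot prisoner's dilemma against an exact copy, CDT treats the copy's move as causally independent of yours and defects, while EDT conditions on the copy mirroring whatever you choose and cooperates.

```python
# Hypothetical twin prisoner's dilemma payoffs: (mine, copy's) -> my payoff.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 4, ("D", "D"): 1,
}

def cdt_value(my_action, p_copy_cooperates):
    """CDT: my choice has no causal effect on the copy's, so evaluate
    against a fixed belief about what the copy does."""
    return (p_copy_cooperates * PAYOFF[(my_action, "C")]
            + (1 - p_copy_cooperates) * PAYOFF[(my_action, "D")])

def edt_value(my_action):
    """EDT: conditional on my action, an exact copy (almost surely)
    takes the same action, so evaluate against a mirroring copy."""
    return PAYOFF[(my_action, my_action)]

# CDT defects no matter what it believes about the copy...
for p in (0.0, 0.5, 1.0):
    assert cdt_value("D", p) > cdt_value("C", p)
# ...while EDT cooperates: mutual cooperation (3) beats mutual defection (1).
assert edt_value("C") > edt_value("D")
```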
@C_Oesterheld
Caspar Oesterheld
7 months
How do LLMs reason about playing games against copies of themselves? 🪞We made the first LLM decision theory benchmark to find out. 🧵1/10
2
19
101
@C_Oesterheld
Caspar Oesterheld
1 year
Some new models came out recently (Claude 3, Mistral Large) and I happen to have a work-in-progress, unpublished (=> absent from training data) multiple-choice problem set. Tentative results below. Take with a big grain of salt! More details on the benchmark soon.
1
0
15
@C_Oesterheld
Caspar Oesterheld
1 year
RT @AIImpacts: We just ran probably the biggest survey of AI researchers ever! 2778 participants from six top AI venues answered questions…
0
128
0
@C_Oesterheld
Caspar Oesterheld
2 years
RT @conitzer: We are recruiting postdocs at the Foundations of Cooperative AI Lab (@FOCAL_lab) at @CarnegieMellon (…)
0
24
0
@C_Oesterheld
Caspar Oesterheld
2 years
If you don't have time to fill out the entire application form by tonight, it may still make sense to apply, especially if you have a research sample or other legible credentials.
0
0
2
@C_Oesterheld
Caspar Oesterheld
2 years
Better late than never: I'm proud to serve as a mentor for SERI MATS this winter. If you're interested in working with me on multi-agent safety, please apply to my stream! The deadline is tonight (Pacific time)!
@MATSprogram
ML Alignment & Theory Scholars
2 years
Are you:
- an accomplished researcher/engineer
- determined to advance AI x-safety
- in need of world-class mentorship + community?
Apply by Nov 17 for our Winter Program!
1
1
16
@C_Oesterheld
Caspar Oesterheld
2 years
RT @dfrsrchtwts: Had a fun conversation with @C_Oesterheld about some of his recent papers on the game theory of cooperative AI - check it…
0
4
0
@C_Oesterheld
Caspar Oesterheld
2 years
Many thanks to my amazing coauthors @j_treutlein (joint first), Emery Cooper, and @undo_hubris. More info in the paper (including discussion of related work on performative prediction).
0
0
5
@C_Oesterheld
Caspar Oesterheld
2 years
Relevance to AI x-safety: Oracle AIs only answer questions and could be safer than agents. We show that oracles that output performatively optimal predictions act like agents (even if e.g. there is a unique fixed point). Oracles trained via repeated gradient ascent may be safer.
1
1
6
@C_Oesterheld
Caspar Oesterheld
2 years
We also consider training AI models using scoring rules. What incentives do training setups provide? Unlike RL on the objective S(p,f(p)), repeated gradient ascent converges to fixed points, since it does not try to optimize the outcome distribution f(p).
1
0
6
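A toy illustration of that contrast (a sketch, not the paper's setup): with the Brier score and a made-up performative response f(p), repeated gradient ascent settles at the calibrated fixed point p = f(p), while maximizing the realized score S(p, f(p)) directly picks an extreme prediction that trades calibration for a more predictable outcome.

```python
# Toy setting (not the paper's): binary outcome whose probability f(p)
# depends on the announced prediction p; Brier score S(p, y) = -(p - y)^2.
import numpy as np

def f(p):
    """Made-up performative response of the world to the prediction."""
    return 0.2 + 0.6 * p

def expected_score(p, q):
    """E_{y ~ Bernoulli(q)}[-(p - y)^2] = -(p - q)^2 - q(1 - q)."""
    return -(p - q) ** 2 - q * (1 - q)

# (1) Repeated gradient ascent: update p against the current outcome
# distribution f(p_t), never differentiating through f.
p = 0.9
for _ in range(200):
    q = f(p)                   # outcome distribution held fixed this step
    p += 0.1 * (-2 * (p - q))  # d/dp of -(p - q)^2 with q fixed
print(f"gradient ascent: p = {p:.3f}, f(p) = {f(p):.3f}")  # p = f(p) = 0.5

# (2) Maximizing the realized score S(p, f(p)) over p, as RL on that
# objective would: the optimizer also exploits its influence on f.
grid = np.linspace(0.0, 1.0, 10001)
p_star = grid[np.argmax(expected_score(grid, f(grid)))]
print(f"performative optimum: p = {p_star:.3f}, f(p) = {f(p_star):.3f}")
# Lands on a boundary (p = 0 or p = 1, tied in this example): miscalibrated
# (p != f(p)) but the outcome is made as deterministic as f allows.
```

In this toy example the properness of the scoring rule makes the per-step gradient vanish exactly at p = f(p), which is why the iterative scheme looks safer: it has no incentive to steer the outcome distribution, only to match it.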