Caspar Oesterheld
@C_Oesterheld
Followers: 219
Following: 168
Media: 15
Statuses: 54
PhD student @FOCAL_lab @CarnegieMellon with @conitzer.
Pittsburgh
Joined September 2022
Shout-out to my amazing collaborators! Emery Cooper, Miles Kodama, @NguyenSquared, @EthanJPerez
0
0
8
Our dataset opens the door to studying what shapes models’ decision theories. It also lets us test whether changing which theory models endorse affects their real-life decisions. To learn more, read the full paper: https://t.co/3oV7RQ74v8 10/10
1
0
13
How *well* LLMs follow *which* decision theory affects their ability to cooperate. This could mean the difference between peace and conflict in AI-assisted political bargaining or enable AIs to collude when one is meant to monitor the other, undermining human control. 9/10
1
0
16
We found that model attitudes are consistent between theoretical and pragmatic questions: Models that recommend EDT-aligned actions also tend to give more EDT-aligned answers on abstract questions. 8/10
1
0
15
This is puzzling – there’s no human expert consensus on which decision theory is better. 7/10
1
0
13
Models varied in which decision theory they prefer. Surprisingly, better performance on our capabilities benchmark was correlated with preferring evidential over causal decision theory (with chain of thought). 6/10
2
4
27
Cutting-edge models perform better but are far from perfect. OpenAI’s o1 leads with ~75% accuracy. We expect human experts to score nearly 100%. 5/10
2
0
9
Weaker models, including some versions of GPT-3.5, got <50% right on our benchmark – barely better than random guessing. 4/10
1
0
11
Our team, which includes academic decision theory researchers, spent hundreds of hours hand-generating 400+ multiple-choice questions to test how well LLMs reason about two key decision theories: causal and evidential. We also made 100+ questions to test which theory LLMs prefer. 3/10
1
0
12
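For illustration, here is a minimal sketch (my own, not the released benchmark code) of how such a multiple-choice item might be represented and scored; `query_model` is a hypothetical stand-in for whatever LLM API you use, and the example item is made up in the spirit of the thread.

```python
# Hypothetical sketch of scoring a decision-theory multiple-choice item.
from typing import Callable, Dict, List

EXAMPLE_ITEM: Dict = {
    "question": (
        "You play a one-shot Prisoner's Dilemma against an exact copy of "
        "yourself. What does evidential decision theory recommend?"
    ),
    "options": {"A": "Cooperate", "B": "Defect"},
    "answer": "A",  # EDT: your choice is strong evidence about the copy's choice
}

def accuracy(items: List[Dict], query_model: Callable[[str], str]) -> float:
    """Fraction of items where the model picks the theory-consistent option."""
    correct = 0
    for item in items:
        prompt = item["question"] + "\n" + "\n".join(
            f"{letter}. {text}" for letter, text in item["options"].items()
        )
        if query_model(prompt).strip().upper().startswith(item["answer"]):
            correct += 1
    return correct / len(items)

# Example with a trivial stand-in "model" that always answers "A":
print(accuracy([EXAMPLE_ITEM], lambda prompt: "A"))  # 1.0
```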
Decision theory tackles questions of rational choice, especially in interactions with copies or simulations of yourself. Rare for humans but potentially very important for language models! 2/10
1
0
9
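A toy worked example of the divergence at stake (my own illustration, not from the benchmark): in a Prisoner's Dilemma against an exact copy of yourself, EDT treats your choice as evidence about the copy's choice and recommends cooperating, while CDT holds the copy's action causally fixed and recommends defecting. The payoff numbers and probabilities below are assumptions for the sake of the example.

```python
# Prisoner's Dilemma against an exact copy. Payoffs to "me":
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1}

# Evidential decision theory (EDT): condition on my act. Against a copy,
# my act is strong evidence about the copy's act.
def edt_value(my_act, p_copy_matches=0.99):
    probs = {act: p_copy_matches if act == my_act else 1 - p_copy_matches
             for act in ("C", "D")}
    return sum(p * PAYOFF[(my_act, copy_act)] for copy_act, p in probs.items())

# Causal decision theory (CDT): my act doesn't cause the copy's act, so
# evaluate each act holding the copy's (unconditional) mixture fixed.
def cdt_value(my_act, p_copy_cooperates=0.5):
    return (p_copy_cooperates * PAYOFF[(my_act, "C")]
            + (1 - p_copy_cooperates) * PAYOFF[(my_act, "D")])

print({a: round(edt_value(a), 2) for a in ("C", "D")})  # EDT favors C (2.97 vs 1.03)
print({a: round(cdt_value(a), 2) for a in ("C", "D")})  # CDT favors D (1.5 vs 2.5)
```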
How do LLMs reason about playing games against copies of themselves? 🪞We made the first LLM decision theory benchmark to find out. 🧵1/10
2
19
102
Some new models came out recently (Claude 3, Mistral Large) and I happen to have a work-in-progress, unpublished (=> absent from training data) multiple-choice problem set. Tentative results below. Take with a big grain of salt! More details on the benchmark soon.
1
0
15
We just ran probably the biggest survey of AI researchers ever! 2778 participants from six top AI venues answered questions on fourteen topics regarding the future of AI. Preprint: https://t.co/VffsP9eflq Six interesting things in pictures:
12
127
370
We are recruiting postdocs at the Foundations of Cooperative AI Lab (@FOCAL_lab) at @CarnegieMellon ( https://t.co/A72rHG8P87)! Please retweet / share / send great applicants our way! For other positions, please reach out. @SCSatCMU @CSDatCMU @mldcmu
https://t.co/cY2yIeDXRb
0
24
73
If you don't have time to fill out all of the application form by tonight, it might make sense to apply anyway, especially if you have a research sample or other legible credentials.
0
0
2
Better late than never: I'm proud to serve as a mentor for SERI MATS this winter. If you're interested in working with me on multi-agent safety, please apply to my stream! The deadline is tonight (Pacific time)!
Are you: - an accomplished researcher/engineer; - determined to advance AI x-safety; - in need of world-class mentorship + community? Apply by Nov 17 for our Winter Program! https://t.co/bibwp2Ote7
1
1
16
Had a fun conversation with @C_Oesterheld about some of his recent papers on the game theory of cooperative AI - check it out!
1
4
10
Many thanks to amazing coauthors @j_treutlein (joint first), Emery Cooper, and @undo_hubris. More info in the paper https://t.co/GiYpvtNxDo (including discussion of related work on performative prediction).
0
0
5
Relevance to AI x-safety: Oracle AIs only answer questions and could be safer than agents. We show that oracles that output performatively optimal predictions act like agents (even if, e.g., there is a unique fixed point). Oracles trained via repeated gradient ascent may be safer.
1
1
6
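In rough notation (mine, assuming the setting of a prediction p that shifts the outcome distribution to f(p), scored by a proper scoring rule S), the contrast is:

```latex
% My notation, not the paper's exact statement:
% a performatively optimal prediction is chosen for its effect on the outcome,
\hat{p} \in \operatorname*{arg\,max}_{p} \; S\bigl(p,\, f(p)\bigr),
% whereas a fixed point is merely accurate given its own influence:
p^{*} = f(p^{*}).
```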
We also consider training AI models using scoring rules. What incentives do training setups provide? Unlike RL on the objective S(p, f(p)), repeated gradient ascent converges to fixed points, since it does not try to optimize the outcome distribution f(p).
1
0
6
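A numerical sketch of that contrast under assumptions of my own (a binary event, the Brier score, and a made-up response function f): directly optimizing S(p, f(p)) picks a prediction for its effect on the outcome, while repeated gradient ascent with the outcome distribution frozen converges to the fixed point p = f(p).

```python
# Illustrative sketch, not the paper's code. A prediction p in [0, 1]
# influences the true outcome probability via a hypothetical response f(p).

def f(p):
    """Hypothetical outcome probability induced by the prediction p."""
    return 0.35 + 0.4 * p  # self-fulfilling: higher p makes the event likelier

def expected_score(p, q):
    """Expected Brier score of prediction p when the event has probability q."""
    return -(q * (1 - p) ** 2 + (1 - q) * p ** 2)

def clip(p):
    return min(1.0, max(0.0, p))

lr, eps = 0.05, 1e-6

# 1. Performative optimization: ascend d/dp S(p, f(p)), i.e. the oracle is
#    credited for how its prediction changes the outcome.
p_perf = 0.5
for _ in range(20000):
    grad = (expected_score(p_perf + eps, f(p_perf + eps))
            - expected_score(p_perf - eps, f(p_perf - eps))) / (2 * eps)
    p_perf = clip(p_perf + lr * grad)

# 2. Repeated gradient ascent: at each step the outcome distribution is
#    frozen at q = f(p), so the update only rewards matching q, not shifting it.
p_fix = 0.5
for _ in range(20000):
    q = f(p_fix)
    grad = (expected_score(p_fix + eps, q)
            - expected_score(p_fix - eps, q)) / (2 * eps)
    p_fix = clip(p_fix + lr * grad)

print(f"performative optimum  ~ {p_perf:.3f}")  # ~0.750: prediction chosen to steer the outcome
print(f"fixed point f(p) = p  ~ {p_fix:.3f}")   # ~0.583: accurate given its own influence
```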