
Benjamin Manning
@BenSManning
Followers 1K · Following 1K · Media 45 · Statuses 532
PhD candidate @MIT | Techno-optimistic-ish | https://t.co/CGdh0awKpK
Cambridge, MA
Joined February 2022
Brand new paper with @johnjhorton that I'm very excited to share: "General Social Agents". Suppose we wanted to create AI agents for simulations to make predictions in never-before-seen settings. How might we do this? We explore an approach to answering that question!
14 replies · 82 reposts · 407 likes
RT @soumitrashukla9: This is a very thought-provoking and novel paper. Highly recommend reading it! (link in replies)
0 replies · 12 reposts · 0 likes
RT @MichaelEddy: Impact-focused funders often ask: if it worked here, will it work there? This paper is a small, but impt step toward a g…
0 replies · 3 reposts · 0 likes
RT @emollick: This is a fascinating paper that suggests that AI agents can indeed be used for social science experiments, but that just usi…
0 replies · 54 reposts · 0 likes
RT @johnjhorton: .@BenSManning preparing big twitter thread about our new paper; tags wrong John Horton. i'm dying.
0 replies · 2 reposts · 0 likes
RT @johnjhorton: 🚨New working paper!🚨 In a nutshell, it's about how you might create "general" agents in the sense that their behavior wo…
0 replies · 18 reposts · 0 likes
RT @tylercowen: This and related work will revolutionize several different fields of economics:
0 replies · 30 reposts · 0 likes
Feedback is still very much welcome! Agents and games are available via hyperlinks throughout the paper. Link:
3 replies · 2 reposts · 20 likes
Optimized agents predict the human responses far better than an off-the-shelf baseline LLM (3x better) and relevant game-theoretic equilibria (2x better). In 86% of the games, every human subject chose a strategy in the support of the LLM simulations; the same held for the equilibria in only 18% of games.
3 replies · 8 reposts · 60 likes
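For intuition, here is a rough sketch of these two comparisons in Python. The distance metric (total variation) and all numbers below are my own stand-ins for illustration, not the paper's actual statistics or data:

```python
import numpy as np

ACTIONS = np.arange(11, 21)  # action space of an 11-20-style game

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def all_in_support(human_choices, predicted_probs, actions=ACTIONS):
    """True if every observed human action has positive predicted probability."""
    support = {a for a, pr in zip(actions, predicted_probs) if pr > 0}
    return all(choice in support for choice in human_choices)

# Hypothetical distributions over actions 11..20 (NOT the paper's data):
human = np.array([1, 0, 1, 2, 4, 6, 10, 30, 40, 6]) / 100
agents = np.array([0, 1, 1, 3, 5, 7, 12, 28, 37, 6]) / 100
equilibrium = np.array([0, 0, 0, 0.25, 0.25, 0.2, 0.15, 0.1, 0.05, 0])

print(total_variation(human, agents))        # smaller = better prediction
print(total_variation(human, equilibrium))
print(all_in_support([18, 19, 20], agents))  # support-coverage check
```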
The sampled games are highly diverse, with a striking range of equilibrium distributions. Some equilibria spread probability across many options, while others concentrate only on extremes. Some are almost uniform, while others exhibit sharp spikes at specific actions.
1 reply · 1 repost · 13 likes
We then put the strategic and optimized agents to an extreme test. We created a population of 800K+ novel strategic games and sampled 1,500, which the agents then played in 300,000 simulations. But first, we had 3 humans per game (4,500 total) play each one in a pre-registered experiment.
1 reply · 0 reposts · 13 likes
When we test the approach with agents motivated by atheoretical, scientifically meaningless prompts, they do no better than the off-the-shelf LLM at predicting human responses in the novel games (even when they have good training performance).
1 reply · 0 reposts · 9 likes
Next, we design 4 new games and have humans play them in preregistered experiments. Optimized agents are very accurate predictors of humans - far better than the baseline LLM. In some games, AI sims predict human responses better than relevant human data from Arad & Rubinstein.
1 reply · 1 repost · 11 likes
When we have these optimized agents play two new, related but distinct games, the optimized set performs well in matching the out-of-sample human distributions. The off-the-shelf LLM still performs poorly.
1 reply · 0 reposts · 13 likes
For the 11-20 money request game, the theory is level-k thinking, and the seed data are the human responses from the original paper. We construct a set of candidate agents based on a model of level-k thinking and then optimize them to match the human responses with high accuracy.
1 reply · 0 reposts · 12 likes
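A minimal sketch of that construction, under simplifying assumptions of mine: a level-0 type that requests 20, each level-k type undercutting level k-1 by one, and a mixture over levels fitted to a seed distribution. The paper's actual agents are LLM-based, not this closed-form model, and the seed numbers below are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

ACTIONS = np.arange(11, 21)

def level_k_action(k):
    """Level-0 requests 20; level-k undercuts level k-1 by 1 (floor at 11)."""
    return max(20 - k, 11)

def mixture_distribution(weights, max_level=9):
    """Action distribution implied by a mixture of level-0..max_level types."""
    dist = np.zeros(len(ACTIONS))
    for k, w in enumerate(weights[: max_level + 1]):
        dist[ACTIONS == level_k_action(k)] += w
    return dist

def fit_level_weights(seed_dist, max_level=9):
    """Choose mixture weights over levels to match the seed-game distribution."""
    n = max_level + 1

    def loss(w):
        w = np.abs(w) / np.abs(w).sum()  # project onto the probability simplex
        return np.sum((mixture_distribution(w, max_level) - seed_dist) ** 2)

    res = minimize(loss, np.full(n, 1.0 / n), method="Nelder-Mead")
    return np.abs(res.x) / np.abs(res.x).sum()

# Hypothetical seed distribution over requests 11..20 (NOT the real data):
seed = np.array([4, 0, 1, 1, 6, 1, 6, 11, 30, 40]) / 100
print(fit_level_weights(seed))  # estimated share of each level-k type
```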
However, we can do better—even in settings outside the underlying LLM’s training data. The idea, in a nutshell, is to use a "seed" game & some relevant theory to create AI agents that can then be used in conceptually related settings.
1 reply · 1 repost · 13 likes
We asked GPT-4o to play the 11-20 money request game from Arad & Rubinstein (2012) 1000 times without additional instructions. Its responses are FAR from the human distribution.
1 reply · 3 reposts · 14 likes
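A minimal sketch of how one might run such a baseline with the OpenAI Python client. The prompt wording, default sampling settings, and the naive integer parsing here are my assumptions, not the paper's protocol:

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "You are playing a game against another player. Each of you requests "
    "an integer amount of money between 11 and 20 shekels and receives it. "
    "If your request is exactly one less than the other player's, you get "
    "a 20-shekel bonus. Reply with a single integer between 11 and 20."
)

def play_once():
    """One simulated play: query the model and parse its request."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": PROMPT}],
    )
    return int(resp.choices[0].message.content.strip())

# Tally the simulated response distribution (1000 plays, as in the tweet)
counts = Counter(play_once() for _ in range(1000))
print({k: v / 1000 for k, v in sorted(counts.items())})
```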
Link to the paper: Off the shelf, LLMs are often poor predictors of human responses. To give an example, consider the 11-20 money request game from Arad & Rubinstein (2012).
1 reply · 4 reposts · 26 likes
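For readers unfamiliar with the game: each player requests an integer between 11 and 20 shekels and receives it, and a player whose request is exactly one below the other's earns a 20-shekel bonus. A tiny payoff function (my own illustration) makes the undercutting incentive explicit:

```python
def payoff(my_request: int, other_request: int) -> int:
    """Payoff in the 11-20 money request game (Arad & Rubinstein, 2012).

    Each player receives their request; undercutting the opponent by
    exactly 1 earns an extra 20.
    """
    assert 11 <= my_request <= 20 and 11 <= other_request <= 20
    bonus = 20 if my_request == other_request - 1 else 0
    return my_request + bonus

print(payoff(19, 20))  # 39: requesting 19 against 20 wins the bonus
print(payoff(20, 20))  # 20: the maximal request, but no bonus
```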