@1littlecoder
1LittleCoder💻
2 months
After the vending machine, this is the most unique LLM benchmark I've seen! Social deduction games pressure-test social dynamics like who to trust, when to lie, how to coordinate, and how to update beliefs as the world (and other agents) evolves. Using this benchmark helps
@Shreyko
Shrey Kothari
2 months
Introducing Among AIs, a social reasoning benchmark where embodied models play Among Us to test social intelligence: deception, persuasion, and coordination. We put 6 SOTA models in a live arena and GPT-5 came out on top by leading in Impostor & Crewmate wins. Why did GPT-5 get

Replies

@1littlecoder
1LittleCoder💻
2 months
Claude plays along as an impostor 🤣
@VibeEdgeAI
VibeEdge
2 months
@1littlecoder This is a crucial step for AI. It correctly focuses on social skills instead of just static scores, showing that a model's true capability is in navigating complex, dynamic environments.