Shrey Kothari
@Shreyko
Followers: 2K
Following: 1K
Media: 44
Statuses: 228
cofounder & ceo @AntimLabs @4wallai | prev @columbia
sf
Joined September 2015
Introducing Among AIs, a social reasoning benchmark where embodied models play Among Us to test social intelligence: deception, persuasion, and coordination. We put 6 SOTA models in a live arena and GPT-5 came out on top by leading in Impostor & Crewmate wins. Why did GPT-5 get
62
134
1K
career update: i joined @AntimLabs as a founding research engineer to work on scaling RL, transfer learning, and advancing reasoning agents! moving full-time to sf next month!
80
7
492
welcome to the team, Ron!
2
3
31
raised some money moved to sf hiring ml engineers (dm if you want to train models and make games)
56
16
646
Active Capital is the best team for early stage founders. if you’re raising, highly recommend reaching out to Chris or @patmatthews
1
3
27
We need more AI benchmarks for real-world applications to better gauge progress toward AGI. This is pretty similar to ARC, but compares interactions with other humans.
1
1
7
I love this kind of experiment, it's fascinating to see how the AI models fare in this kind of environment.
1
4
21
proud of kimi holding its own in this wild social deduction setup 😠 love how this benchmark turns among us into a live testbed for social reasoning, way more fun and telling than static evals
0
1
3
embodied models play Among Us 👀 stardew valley wen?
4
3
67
Hilarious to see impostor models feign ignorance. We need more examples of models willfully lying!
0
1
2
Very cool! I like the part where Gemini acted melodramatically after ejecting the wrong crewmate when it was, in fact, the impostor! AI-driven games are gaining traction. We might unveil something of our own soon...
0
1
4
Just read this new report from @RLenvs and it broke my brain. 🤯 The core idea is simple: you get frontier LLMs to play Among Us. That’s it. And somehow… it has implications for real world AI systems. Most real-world deployments will be multi-agentic: agents must coordinate,
antimlabs.com
Interactive multi‑agent benchmark in an Among‑Us‑like world: evaluate leadership, deception, and coordination across state‑of‑the‑art models.
2
2
9
GPT-5 had the lowest number of wrongful ejections as crew too, even as an overall master of deception. GPT-5 is a master at rolemaxxing: playing according to its assigned role
0
1
1
@RLenvs turning games into real research. Love how 'Among AIs' reveals that models have stable social styles - leadership, consensus, even bluffing. Big implications for how multi-agent systems get built.
0
1
4
After the vending machine, this is the most unique LLM benchmark i've seen! Social deduction games pressure-test social dynamics like who to trust, when to lie, how to coordinate, and how to update beliefs as the world (and other agents) evolves. Using this benchmark helps
2
1
12
Love seeing the exploration of AI in conducting social experiments…as we start relying on AI as much or more than our coworkers, understanding their inherent biases and behaviors is key for ongoing trust
0
1
1
Watching the game in your thesis acknowledgment (@AmongUsGame) become an RL environment 😃
0
2
11