
Seth Karten
@sethkarten
Followers
1K
Following
7K
Media
94
Statuses
3K
Autonomous Agents | CS PhD @Princeton | Simulation @Waymo | Former @SCSatCMU @Amazon | @NSF GRFP Fellow
Princeton, NJ
Joined October 2012
🚀 New preprint! 🤔 Can one agent “nudge” a synthetic civilization of Census‑grounded agents toward higher social welfare, all by optimizing utilities in‑context? Meet the LLM Economist ↓
2
9
23
RT @chijinML: Our department (ECE) at Princeton is hiring in AI this year!📢 Please consider applying and joining us:
0
16
0
🚨 Hackathon Weekend! 🚨
Jumpstart your PokéAgent Challenge submission ahead of NeurIPS!
📅 Sept 13–14
✅ Leaderboards reset Sat 10AM EDT
🎙️ Lightning talks on LLMs, RL, and Pokémon
💬 Live office hours
🏆 $2k in prizes
1
0
8
RT @sethkarten: 🎓 University students & AI researchers: push your Pokémon AI agents further! The NeurIPS 2025 PokéAgent Challenge is offe…
0
5
0
I don’t know who needs to hear this but qwen is really bad at Pokémon.
4
0
18
RT @emollick: It seems like there is not enough of a policy response to the fact that, with 57M miles of data, Waymo’s autonomous vehicles…
0
691
0
This is amazing. Great for local inference and light training. I’m guessing $35k?
🚨 New: We built @a16z's personal GPU AI Workstation Founders Edition.
- 4x NVIDIA RTX 6000 PRO Blackwell Max-Q (384GB total VRAM)
- 8TB of NVMe PCIe 5.0 storage
- AMD Threadripper PRO 7975WX (32 cores, 64 threads)
- 256GB ECC DDR5 RAM
- 1650 watts at peak (runs on a standard
1
0
6
🎓 University students & AI researchers: push your Pokémon AI agents further! The NeurIPS 2025 PokéAgent Challenge is offering compute credits, courtesy of our sponsor Google DeepMind, to help you train bigger models & run more experiments. 📌 To apply: 1️⃣ Make a submission to
0
5
39
I am very excited to see this deep dive into Gemini Plays Pokémon! This is a great effort and shows the sheer complexity of deploying LLM agents and scaffolding at scale and over long contexts.
I wrote up the making-of for Gemini Plays Pokémon: how I designed the scaffold so Gemini 2.5 Pro could handle a long-horizon game, what failed, and the lessons that made it work. Full post:
0
1
7
This is exactly the perspective we exploit in the LLM Economist. By grounding personas in intrinsic utility functions, we bound the degree to which personas can go haywire.
New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors”: neural activity patterns controlling traits like evil, sycophancy, or hallucination.
1
1
4
I vibe-coded a hidden 90s-style easter egg into my website. I used Hugging Face’s Anycoder to prototype the retro design direction: think neon text and CRT glow. If you can guess the secret code, you can view the final form on my website. Sneak peek below.
4
1
16
Honored by @rohanpaul_ai’s summary of the LLM Economist!
The paper builds a small simulated economy with 100 language‑model “workers” and one language‑model “planner”, then lets that planner tweak 7 income‑tax brackets every 128 steps until the society’s average happiness ends up about 90% higher than under the current US code. In
1
0
6
LLM Economist creates optimal tax policy <-> TaxCalcBench does your taxes. AI Tax Civilization: Who is building this?
arxiv.org
We present the LLM Economist, a novel framework that uses agent-based modeling to design and assess economic policies in strategic environments with hierarchical decision-making. At the lower...
1/ Can AI file your taxes? Not yet. We tested the latest frontier models and the results were full of catastrophic errors. Letting AI do your taxes would mean IRS rejections, audits, and penalties:
1
2
6
Pushing the frontier is good, but I like speedrunning. How far could your agent get in 6 hours?
LM reasoning benchmark idea: have it beat a Hardcore Nuzlocke run of Pokémon Run & Bun or a Kaizo ROM hack! Give it access to search online, use damage calculators, etc. People spend literally hundreds of hours meticulously planning battles, managing their available mons, etc.
0
1
11
RT @sethkarten: 🚀 New preprint! 🤔 Can one agent “nudge” a synthetic civilization of Census‑grounded agents toward higher social welfare, al…
0
9
0
Thanks @_akhaliq for the interest in the LLM Economist! For details on synthetic nudging + democratic alignment, check the full thread ↘️
0
0
2
Special thanks to my collaborators @WenzheLiTHU @Hanry65960814 Samuel Kleiner @yubai01 @chijinML and to @coop_ai for great feedback at the 2024 summer school.
0
0
2
A sandbox for mechanism design: iterate incentive schemes inside large‑scale simulacra before touching the real world. Thoughts? RT if you think generative agents can design policy. 🔄❤️
1
0
1
Democratic alignment: in a special case, periodic citizen voting can fire the planner. Leader turnover keeps welfare high and prevents policy drift—central nudging plus decentralized oversight in one sandbox.
1
0
1