Keyon Vafa
@keyonV
Followers: 5K · Following: 2K · Media: 176 · Statuses: 1K
Postdoctoral fellow at @Harvard_Data | Former computer science PhD with @Blei_Lab at @Columbia University | Researching AI + world models
Joined August 2011
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions. One result tells the story: a transformer trained on 10M solar systems nails planetary orbits, but it botches gravitational laws. 🧵
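To make the orbit result concrete, here is a minimal sketch (not the paper's code; the function names and constants are illustrative) of the distinction being tested: a trajectory can be scored on prediction accuracy, and separately on whether the accelerations it implies follow an inverse-square law. A ground-truth orbit recovers an exponent near -2; the tweet's claim is that a model can ace the first check while failing the second.

```python
# Minimal illustrative sketch, not the paper's code.
import numpy as np

def simulate_orbit(pos, vel, n_steps=5000, dt=1e-3, GM=1.0):
    """Two-body orbit around a central mass at the origin (symplectic Euler)."""
    traj = np.empty((n_steps, 2))
    for t in range(n_steps):
        acc = -GM * pos / np.linalg.norm(pos) ** 3   # Newtonian gravity
        vel = vel + dt * acc
        pos = pos + dt * vel
        traj[t] = pos
    return traj

def fitted_force_exponent(traj, dt=1e-3):
    """Finite-difference accelerations from a trajectory, then fit |a| ~ r^k."""
    acc = (traj[2:] - 2 * traj[1:-1] + traj[:-2]) / dt ** 2
    r = np.linalg.norm(traj[1:-1], axis=1)
    a = np.linalg.norm(acc, axis=1)
    k, _ = np.polyfit(np.log(r), np.log(a), 1)        # slope in log-log space
    return k

true_orbit = simulate_orbit(np.array([1.0, 0.0]), np.array([0.0, 0.8]))
# A model's predicted trajectory would be checked the same way; the ground-truth
# orbit recovers the Newtonian exponent of roughly -2.
print("fitted exponent:", fitted_force_exponent(true_orbit))
```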
The paper has more empirical and theoretical results showing why world models would improve with better predictions of future latents. Paper: https://t.co/704LvJXin3
Interesting paper from MSR (@jayden_teoh_ @JohnCLangford + others) finds that a simple change gets better world models in transformers: predict future latent states in addition to next tokens (to encourage parsimonious representations). The result is better maps of New York and better world model metrics.
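For readers who want the shape of the idea, here is a hedged sketch of an auxiliary future-latent objective layered on top of the usual next-token loss. The horizon, the choice of hidden state, the stop-gradient on the target, and the loss weight are illustrative assumptions, not the MSR paper's exact recipe.

```python
# Illustrative sketch only: next-token loss plus a future-latent prediction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentPredictingLM(nn.Module):
    def __init__(self, vocab_size, d_model=128, n_layers=4, horizon=4):
        super().__init__()
        self.horizon = horizon
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)
        self.latent_head = nn.Linear(d_model, d_model)  # predicts h_{t+horizon}

    def forward(self, tokens):
        seq_len = tokens.size(1)
        # Causal mask so each position only attends to the past.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        h = self.encoder(self.embed(tokens), mask=mask)
        return self.lm_head(h), h

def loss_fn(model, tokens, aux_weight=0.5):
    logits, h = model(tokens)
    # Standard next-token cross-entropy.
    lm_loss = F.cross_entropy(logits[:, :-1].transpose(1, 2), tokens[:, 1:])
    # Auxiliary loss: predict the (detached) hidden state `horizon` steps ahead.
    k = model.horizon
    aux_loss = F.mse_loss(model.latent_head(h[:, :-k]), h[:, k:].detach())
    return lm_loss + aux_weight * aux_loss

model = LatentPredictingLM(vocab_size=100)
tokens = torch.randint(0, 100, (2, 32))
print(loss_fn(model, tokens))
```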
Prof David Blei (@blei_lab) is looking for #PhD students interested in machine learning and Bayesian statistics. To find out more about him: https://t.co/9jg5MtwsBx. For info on our #computerscience PhD program: https://t.co/Mfln4FF1dp. The deadline to apply is December 15.
Four faculty members—molecular biologist Catherine Dulac, constitutional scholar Noah Feldman, economic historian Claudia Goldin, and theoretical physicist Cumrun Vafa—were named University Professors, Harvard’s highest distinction, on Wednesday. #Harvard
https://t.co/pVeyQDxFu0
harvardmagazine.com
Catherine Dulac, Noah Feldman, Claudia Goldin, and Cumrun Vafa receive the University’s highest faculty distinction.
Composer is a new model we built at Cursor. We used RL to train a big MoE model to be really good at real-world coding, and also very fast. https://t.co/DX9bbalx0B Excited for the potential of building specialized models to help in critical domains.
I'm really excited about this work (two years in the making!). We look at how LLMs seek out and integrate information and find that even GPT-5-tier models are bad at this, meaning we can use Bayesian inference to uplift weak LMs and beat them... at 1% of the cost 👀
Do AI agents ask good questions? We built “Collaborative Battleship” to find out—and discovered that weaker LMs + Bayesian inference can beat GPT-5 at 1% of the cost. Paper, code & demos: https://t.co/lV76HRKR3d Here's what we learned about building rational information-seeking
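The Bayesian ingredient can be illustrated with a toy expected-information-gain calculation: pick the question whose answer is expected to shrink uncertainty over the hypotheses the most. This is a sketch of the general principle, not the paper's code; in Collaborative Battleship the hypotheses would be candidate board configurations.

```python
# Toy expected-information-gain question selection; hypotheses and questions are stand-ins.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def expected_information_gain(prior, answers):
    """prior: (H,) over hypotheses; answers: (H,) of 0/1 answers a yes/no question
    would receive under each hypothesis. Returns expected entropy reduction in bits."""
    h_prior = entropy(prior)
    eig = 0.0
    for ans in (0, 1):
        mask = answers == ans
        p_ans = prior[mask].sum()
        if p_ans == 0:
            continue
        posterior = prior * mask / p_ans          # condition on this answer
        eig += p_ans * (h_prior - entropy(posterior))
    return eig

# Eight equally likely hypotheses, three candidate yes/no questions.
prior = np.full(8, 1 / 8)
questions = {
    "splits in half":  np.array([1, 1, 1, 1, 0, 0, 0, 0]),
    "rules out one":   np.array([1, 1, 1, 1, 1, 1, 1, 0]),
    "splits unevenly": np.array([1, 1, 0, 0, 0, 0, 0, 0]),
}
best = max(questions, key=lambda q: expected_information_gain(prior, questions[q]))
print(best)  # the even split maximizes expected information gain (1 bit)
```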
Wednesday, October 22nd at 11am CT: TTIC's Young Researcher Seminar Series presents Keyon Vafa (@keyonV) of @harvard_data with a talk titled "Evaluating the Implicit World Models of Generative Models." Please join us in Room 530, 5th floor.
I’m hiring a pre-doc! Come work with me on how AI is changing the labor market and how algorithms impact markets. Non-econ backgrounds welcome. Application details below – excited to collaborate! Start: Summer 2026. Deadline: Nov 1, 2025. https://t.co/2joGp5czWN
@predoc_org
Reminder to go watch this video from @keyonV. He does a great job explaining this research area in a short period of time. Even if you're not into this topic, the methodological / proof challenges (does a blackbox have a model?) are quite interesting. https://t.co/mZGeWZWZBx
One of the most fascinating research agendas I’ve seen. Colloquially, people using LLMs refer to them as having world models because they seem to generalize well on many tasks. Keyon and his collaborators show they don’t, in ways that are nuanced but important for practitioners.
Here's a video I made that goes over methods we've worked on for evaluating world models. Thank you @srush_nlp for the opportunity!
Here's a video I made that goes over methods we've worked on for evaluating world models. Thank you @srush_nlp for the opportunity!
How can we evaluate whether LLMs and other generative models understand the world? New guest video from Keyon Vafa (@keyonV) on methods for evaluating world models.
How can we evaluate whether LLMs and other generative models understand the world? New guest video from Keyon Vafa (@keyonV) on methods for evaluating world models.
Great @QuantaMagazine article about world models that covers some of our recent research
The wide-ranging abilities of large language models like ChatGPT can give users the (mistaken) impression that AI understands our world. A scaled-down world model is a long-sought and still unrealized goal. @johnpavlus explains:
Can #LLMs grasp the real world? MIT & Harvard researchers (@m_sendhil, @asheshrambachan, @petergchang, @keyonV) propose a new way to test how predictive AI applies knowledge across domains. Learn more: https://t.co/npsSXgyHyT
📢 We're thrilled to announce the CMU AI for Science Workshop on Sept 12 at CUC-MPW! Featuring an amazing lineup of speakers: - Akari Asai (AI2/CMU) - Gabe Gomes (CMU) - Chenglei Si (Stanford) - Keyon Vafa (Harvard) Join us on campus, submit your poster & register here:
cmu-ai-for-science-workshop.github.io
We are hosting AI for Science Workshop at Carnegie Mellon University, Pittsburgh, PA, USA on September 12, 2025.
Work with Emma!
🚨 New postdoc position in our lab @Berkeley_EECS! 🚨 (please retweet + share with relevant candidates) We seek applicants with experience in language modeling who are excited about high-impact applications in the health and social sciences! More info in thread 1/3
Key question for incorporating AI into firms: can AI recover signal that human managers miss? @brian_jabarian’s (w @Henkel_JLuca) JMP says yes! A huge field experiment incorporating AI into the interview process finds a large effect on who is selected and a positive effect on performance.
@Henkel_JLuca @Teleperformance 3/ Key Results: In contrast to the forecast of professional recruiters, AI-led interviews lead to: • +12% more job offers • +18% more starters • +17% higher retention after 1 month
📢 NEW POSITION PAPER: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts. Despite recent results, SAEs aren't dead! They can still be useful for mech interp, and also much more broadly: across FAccT, computational social science, and ML4H. 🧵
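For context, a sparse autoencoder in this line of work is typically a wide ReLU bottleneck trained to reconstruct model activations under an L1 sparsity penalty. The sketch below uses illustrative dimensions and weights and is not taken from the position paper.

```python
# Minimal sparse autoencoder sketch; dimensions and penalty weight are illustrative.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_act=768, d_dict=8192):
        super().__init__()
        self.encoder = nn.Linear(d_act, d_dict)   # overcomplete dictionary
        self.decoder = nn.Linear(d_dict, d_act)

    def forward(self, acts):
        codes = torch.relu(self.encoder(acts))    # sparse feature activations
        recon = self.decoder(codes)
        return recon, codes

def sae_loss(model, acts, l1_weight=1e-3):
    recon, codes = model(acts)
    recon_loss = (recon - acts).pow(2).mean()     # reconstruct the activations
    sparsity = codes.abs().mean()                 # encourage few active features
    return recon_loss + l1_weight * sparsity

sae = SparseAutoencoder()
acts = torch.randn(64, 768)   # stand-in for model activations at one layer
print(sae_loss(sae, acts))
```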
How do people reason so flexibly about new problems, bringing to bear globally-relevant knowledge while staying locally-consistent? Can we engineer a system that can synthesize bespoke world models (expressed as probabilistic programs) on-the-fly?
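As a toy illustration of what "a bespoke world model expressed as a probabilistic program" can mean, the sketch below writes a one-off generative program and conditions it on an observation by rejection sampling. The scenario and probabilities are invented for illustration.

```python
# Toy probabilistic program plus rejection-sampling inference; all numbers invented.
import random

def world_model():
    """Generative program for a one-off scenario: is the package late because of
    a strike, a storm, or neither?"""
    strike = random.random() < 0.05
    storm = random.random() < 0.20
    p_late = 0.9 if strike else (0.6 if storm else 0.1)
    late = random.random() < p_late
    return {"strike": strike, "storm": storm, "late": late}

def infer(condition, query, n=50_000):
    """Posterior probability of `query` given `condition`, by rejection sampling."""
    kept = [s for s in (world_model() for _ in range(n)) if condition(s)]
    return sum(query(s) for s in kept) / len(kept)

# P(storm | package is late)
print(infer(lambda s: s["late"], lambda s: s["storm"]))
```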