Keyon Vafa
            
            @keyonV
Followers: 5K | Following: 2K | Media: 174 | Statuses: 1K
              Postdoctoral fellow at @Harvard_Data | Former computer science PhD with @Blei_Lab at @Columbia University | Researching AI + world models
              
              Joined August 2011
            
            
Can an AI model predict perfectly and still have a terrible world model? What would that even mean? Our new ICML paper formalizes these questions. One result tells the story: A transformer trained on 10M solar systems nails planetary orbits. But it botches gravitational laws 🧵
          
                
             Four faculty members—molecular biologist Catherine Dulac, constitutional scholar Noah Feldman, economic historian Claudia Goldin, and theoretical physicist Cumrun Vafa—were named University Professors, Harvard’s highest distinction, on Wednesday. #Harvard
             https://t.co/pVeyQDxFu0 
          
          
            
            harvardmagazine.com
              Catherine Dulac, Noah Feldman, Claudia Goldin, and Cumrun Vafa receive the University’s highest faculty distinction.
            
                
             Composer is a new model we built at Cursor. We used RL to train a big MoE model to be really good at real-world coding, and also very fast.  https://t.co/DX9bbalx0B  Excited for the potential of building specialized models to help in critical domains. 
          
                
I'm really excited about this work (two years in the making!). We look at how LLMs seek out and integrate information and find that even GPT-5-tier models are bad at this, meaning we can use Bayesian inference to uplift weak LMs and beat them... at 1% of the cost 👀
           Do AI agents ask good questions? We built “Collaborative Battleship” to find out—and discovered that weaker LMs + Bayesian inference can beat GPT-5 at 1% of the cost. Paper, code & demos:  https://t.co/lV76HRKR3d  Here's what we learned about building rational information-seeking 
            
                
             Wednesday, October 22nd at 11am CT: TTIC's Young Researcher Seminar Series presents Keyon Vafa (@keyonV) of @harvard_data with a talk titled "Evaluating the Implicit World Models of Generative Models." Please join us in Room 530, 5th floor. 
          
                
             I’m hiring a pre-doc! Come work with me on how AI is changing the labor market and how algorithms impact markets. Non-econ backgrounds welcome. Application details below – excited to collaborate! Start: Summer 2026 Deadline: Nov 1, 2025  https://t.co/2joGp5czWN 
            @predoc_org
          
          
                
             Reminder to go watch this video from @keyonV. He does a great job explaining this research area in a short period of time. Even if you're not into this topic, the methodological / proof challenges (does a blackbox have a model?) are quite interesting.  https://t.co/mZGeWZWZBx 
          
          
                
One of the most fascinating research agendas I’ve seen. Colloquially, people using LLMs describe them as having world models because they seem to generalize well on many tasks. Keyon and his collaborators show they don’t, in ways that are nuanced but important for practitioners.
           Here's a video I made that goes over methods we've worked on for evaluating world models. Thank you @srush_nlp for the opportunity! 
          
                
             Here's a video I made that goes over methods we've worked on for evaluating world models. Thank you @srush_nlp for the opportunity! 
           How can we evaluate whether LLMs and other generative models understand the world? New guest video from Keyon Vafa (@keyonV) on methods for evaluating world models. 
            
                
          
                
             Great @QuantaMagazine article about world models that covers some of our recent research 
           The wide-ranging abilities of large language models like ChatGPT can give users the (mistaken) impression that AI understands our world. A scaled-down world model is a long-sought and still unrealized goal. @johnpavlus explains: 
          
                
             Can #LLMs grasp the real world? MIT & Harvard researchers (@m_sendhil, @asheshrambachan, @petergchang, @keyonV) propose a new way to test how predictive AI applies knowledge across domains. Learn more:  https://t.co/npsSXgyHyT 
          
          
                
             📢 We're thrilled to announce the CMU AI for Science Workshop on Sept 12 at CUC-MPW! Featuring an amazing lineup of speakers: - Akari Asai (AI2/CMU) - Gabe Gomes (CMU) - Chenglei Si (Stanford) - Keyon Vafa (Harvard) Join us on campus, submit your poster & register here: 
          
            
            cmu-ai-for-science-workshop.github.io
              We are hosting AI for Science Workshop at Carnegie Mellon University, Pittsburgh, PA, USA on September 12, 2025.
            
                
             Work with Emma! 
           🚨 New postdoc position in our lab @Berkeley_EECS! 🚨 (please retweet + share with relevant candidates) We seek applicants with experience in language modeling who are excited about high-impact applications in the health and social sciences! More info in thread 1/3 
            
                
             Key question for incorporating AI into firms: can AI recover signal that human managers miss? @brian_jabarian’s (w @Henkel_JLuca) JMP says yes! Huge field experiment incorporating AI into interview process has a huge effect on who is selected & positive effect on performance 
          
              @Henkel_JLuca @Teleperformance 3/ Key Results: In contrast to the forecast of professional recruiters, AI-led interviews lead to: • +12% more job offers • +18% more starters • +17% higher retention after 1 month
            
          
                
             📢NEW POSITION PAPER: Use Sparse Autoencoders to Discover Unknown Concepts, Not to Act on Known Concepts Despite recent results, SAEs aren't dead! They can still be useful to mech interp, and also much more broadly: across FAccT, computational social science, and ML4H. 🧵 
          
                
             How do people reason so flexibly about new problems, bringing to bear globally-relevant knowledge while staying locally-consistent? Can we engineer a system that can synthesize bespoke world models (expressed as probabilistic programs) on-the-fly? 
          
                
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
          
                
             Researchers from Harvard, Keyon Vafa (@keyonV) and MIT, Peter Chang (@petergchang), Ashesh Rambachan (@asheshrambachan), and Sendhil Mullainathan (@m_sendhil) have published what I consider the most interesting study on the abilities of AI models in 2025. They wanted to address 
          
                