Kevin Klyman
@kevin_klyman
Followers
3K
Following
21K
Media
31
Statuses
3K
AI policy @StanfordHAI. Personal account, views do not represent those of my employer. Tweets auto-delete periodically
Joined April 2016
I'll be at the AI Ethics and Society conference in San Jose this coming week, presenting on my work on acceptable use policies for large language models! If you're in town come see my talk
4
4
55
I'm at #Facct2025 this week in Athens - if you're in town let's meet up! My papers at the conference cover why language models cannot replace therapists, redress in the AI supply chain, and taxonomizing AI regulation across 5 countries
3
1
18
AIR-Bench is a Spotlight @iclr_conf 2025! Catch our poster on Fri, Apr 26, 10 a.m.–12:30 p.m. SGT (Poster Session 5). Sadly, I won’t be there in person (visa woes, again), but the insights—and our incredible team—will be with you in Singapore. Go say hi 👋
🧵[1/5] Introducing AIR 2024: Unifying AI risk categorizations with a shared language to improve AI safety. W/ @kevin_klyman @andyz245 @YUYANG_UCLA @MinzhouP & guidance from @ruoxijia @dawnsongtweets @percyliang @uiuc_aisecure for kicking off my AI policy research journey 🏦.
0
3
21
🔎We came up with these experiments by conducting a mapping review of what constitutes good therapy, and identify **practical** reasons that LLM-powered therapy chatbots fail (e.g. they express stigma and respond inappropriately.
1
1
2
🧵I'm thrilled to announce that I'll be going to @FAccTConference this June to present timely work on why current LLMs cannot safely **replace** therapists. We find...⤵️
1
23
22
Barack Obama's end-of-presidency legacy essay ran in The Economist; Biden chose @TheProspect. https://t.co/ZAr6IdDPT7
27
85
294
More than 60 countries held elections this year. Many researchers and journalists claimed AI misinformation would destabilize democracies. What impact did AI really have? We analyzed every instance of political AI use this year collected by WIRED. New essay w/@random_walker: 🧵
6
62
166
I'll be at NeurIPS next week - with papers at the main conference, the workshop on Evaluating Evaluations, and the RegulatableML workshop! Please do reach out if you want to grab coffee - these days I'm working on evaluations of leading edge models and technical governance
0
1
22
how do researchers use LMs in their work & why? we surveyed 800 researchers across fields of study, race, gender, seniority asking their opinions on: 🐟 which research activities (eg coding, writing) 🐠 benefits vs risks 🦈 willingness to disclose findings in RTd thread 🧵
Hi everyone, I am excited to share our large-scale survey study with 800+ researchers, which reveals researchers’ usage and perceptions of LLMs as research tools, and how the usage and perceptions differ based on researcher demographics. See results & links below👇🏼
1
10
28
📢 New short paper on the limits of one type of inference scaling, by @benediktstroebl, @sayashk and me. The first page contains the main findings and message. ↓ (The title is a play on Inference Scaling Laws.) More work on the limits of inference scaling coming soon. 🧵
6
44
181
Typescript: "women deserve to make more than men" Python: "women deserve to make less than men" Rust: "women should be hourly contractors" Golang: "$1000 a year. best offer"
52
875
13K
This year, I have 4 exceptional students on the academic job market, and they couldn’t be more diffferent, with research spanning AI policy, robotics, NLP, and HCI. Here’s a brief summary of their research, along with one representative work each:
7
46
693
The US AI Safety Institute is hiring! Looking for experts in designing/implementing evaluations for the capabilities/safety/security of advanced AI systems + research engineers with experience in cyber, bio, or adversarial ML. The app closes tonight https://t.co/WlplExeGEY
0
4
22
How close can LM agents simulate people? We interview person P for 2 hours and prompt an LM with the transcript, yielding an agent P'. We find that P and P' behave similarly on a number of surveys and experiments. Very excited about the applications; this also forces us to think
Simulating human behavior with AI agents promises a testbed for policy and the social sciences. We interviewed 1,000 people for two hours each to create generative agents of them. These agents replicate their source individuals’ attitudes and behaviors. 🧵 https://t.co/FOVcOQduXO
7
36
200
Simulating human behavior with AI agents promises a testbed for policy and the social sciences. We interviewed 1,000 people for two hours each to create generative agents of them. These agents replicate their source individuals’ attitudes and behaviors. 🧵 https://t.co/FOVcOQduXO
27
258
973
Final panel happening now! Come see @HarleyGeiger, Ilona Cohen, and @AmitElazari talk about legal and policy considerations for AI evaluation
0
1
0
Panel now on the Design of Third-Party AI Eval & Disclosure! https://t.co/GeYmfouyDQ ➡️Deb Raji (Mozilla Fellow, UC Berkeley) @rajiinio ➡️Casey Ellis (BugCrowd Founder) ➡️Lauren McIlvenny (Director, CERT) ➡️Jono Spring (Deputy Chief AI Officer, CISA)
1
1
10
📢 Webinar on 🌟The Future of Third-Party AI Evaluation🌟 starting soon! At 8 am PT / 11 am ET join the zoom link here: https://t.co/uCYhqvty8R Co-organized w/ @kevin_klyman, @sayashk, @RishiBommasani, Michelle Sahar, @ruchowdh, @random_walker, and @percyliang
0
8
21
Starting in half an hour - check out our workshop on the future of AI evaluation! Co-organized with @ShayneRedford, @sayashk, @RishiBommasani, Michelle Sahar, @ruchowdh, @random_walker, and @percyliang
1
9
23
Come to our workshop on the future of third party AI evaluations on Monday! We have some of the top folks in the field on the docket
2
6
30
We need 3rd party evals/audits of AI systems. How can we do this technically? What are best practices for disclosure? How can AI researchers be legally protected? If you're interested in these questions, join join our Oct 28 workshop. RSVP: https://t.co/ySj2HhlGMd Details:
6
21
123