
Kevin Wei
@kevinlwei
Followers
1K
Following
16K
Media
5
Statuses
62
Science of AI evaluations + U.S. AI policy @RANDCorporation | @Harvard_Law '26, @SchwarzmanOrg '23, @GTOMSCS '22 | Views mine only 🏳️🌈 🎉
New York, USA
Joined July 2013
We wrote a paper last year about all the ways industry orgs could influence policy. tl;dr: unsurprisingly, there are lots of places you could spend money to influence policy, and industry is massively outspending civil society orgs on AI.
arxiv.org
Industry actors in the United States have gained extensive influence in conversations about the regulation of general-purpose artificial intelligence (AI) systems. Although industry participation...
AI industry lobbying + PACs will be the most well funded in history, making it all the more important to pass federal legislation soon before the process is completely corrupted.
1
3
18
RT @janet_e_egan: Selling H20 (and potentially Blackwell?) chips to China gives up valuable leverage. @ohlennart and I argue there's a smar…
0
41
0
I think I also had a high bar for "interdisciplinary" position papers. If the basis of your position is arguments from law, economics, sociology, etc., then I expect you to actually engage with that literature, not just throw around some keywords and citations!
0
0
3
Strong +1, my pile also had papers that read like technical papers but without experiments/theory/data; very odd/confusing to me, as that's not the point of a position paper imo. (My scores were 1, 2, 3, 3, 10 - the 10 was very good, and I hope it gets an award)
I finished my reviews for the NeurIPS position track with an average score of 2/10 and a top score of 3/10. I support publishing position papers at AI venues, but authors (and reviewers) should realize that the position track isn't a shortcut for publishing second-rate work at NeurIPS.
1
0
5
I'm the Submissions Editor this year, which means I manage the entire submissions pipeline. Feel free to email me at jolt.submissions@gmail.com with questions.
0
0
0
We've also just revamped our website with lots more information! New on the site:
- Details about our review process
- A data retention policy
- An AI usage policy (tl;dr: OK to use AI if you disclose, and you're responsible for any errors)
1
0
0
As of today, submissions for @HarvardJOLT's spring issue are open!

We're looking for law review articles related to law and technology (defined very broadly). Articles can be doctrinal, empirical, historical, philosophical, etc. Scholastica link is in the thread :)
1
3
6
RT @michael__aird: 🚀 Come join my team at RAND! We're looking for research leads, researchers, & project managers for our compute, US AI po…
0
12
0
RT @evaluatingevals: 🚨 AI Evals Crisis: Officially kicking off the Eval Science Workstream 🚨 We're building a shared scientific foundati…
evalevalai.com
Announcing the launch of a research-driven initiative among a community of researchers to strengthen the science of AI evaluations.
0
7
0
RT @daniel_d_kang: As AI agents near real-world use, how do we know what they can actually do? Reliable benchmarks are critical but agentic…
0
32
0
And shoutout to all our coauthors @SunishchalDev, @m_j_byun, @AnkaReuel, @xave_rg, Rachel Calcott, @EvieCoxon, @chinmay_deshp!
0
0
4
Find our recommendations, reporting checklist, and results 👇

Arxiv version with 3 min exec summary here:
arxiv.org
In this position paper, we argue that human baselines in foundation model evaluations must be more rigorous and more transparent to enable meaningful comparisons of human vs. AI performance, and...
1
0
3
We then systematically reviewed 115 human baseline studies and found substantial shortcomings:
* The median sample size is 8 people
* 98% lack statistical power analysis
* 67% only report point estimates (no SD or intervals)
* 78% and 59% don't make data or code available 😭😭😭
1
0
2
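To make the point-estimate issue concrete (an editorial illustration, not code or data from the paper): with hypothetical accuracy scores at the median sample size of 8, a quick bootstrap shows how much uncertainty a bare point estimate hides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-participant accuracy scores for a human baseline
# at the median sample size found in the review (n = 8).
# These values are made up purely for illustration.
scores = np.array([0.55, 0.70, 0.80, 0.65, 0.90, 0.60, 0.75, 0.85])

point_estimate = scores.mean()

# Percentile bootstrap: resample the 8 participants with replacement
# and recompute the mean to approximate a 95% interval.
boot_means = [
    rng.choice(scores, size=scores.size, replace=True).mean()
    for _ in range(10_000)
]
lo, hi = np.percentile(boot_means, [2.5, 97.5])

print(f"point estimate:   {point_estimate:.2f}")
print(f"95% bootstrap CI: [{lo:.2f}, {hi:.2f}]")  # wide at n = 8
```

At n = 8 the interval spans a large chunk of the score range, so a model sitting a few points above the human point estimate may not be distinguishable from it, which is why reporting only the point estimate can mislead.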
We draw on measurement theory from the social sciences to develop recommendations for more rigorous human baselines. We also provide a reporting checklist to help make results/methods more transparent.
1
0
2
Human baselines add important context to AI evals: ML researchers need them to assess performance differences, users can check them for adoption decisions, and policymakers can use them to understand risk and economic impact. But most human baselines aren't good enough for this!
1
0
2
🚨 New paper alert! 🚨

Are human baselines rigorous enough to support claims about "superhuman" performance?

Spoiler alert: often not!

@prpaskov and I will be presenting our spotlight paper at ICML next week on the state of human baselines + how to improve them!
1
8
20
RT @law_ai_: 📢 Last Call for Applications! Apply by May 31 to join one of our three in-person events this summer: 📆 Summer Institute on La…
0
6
0
RT @adnhw: Really excited to share my first ever paper! "Third-party compliance reviews for AI safety frameworks" 🚀 See below for more ⬇️…
0
15
0
RT @lawfare: "By establishing state data commons, policymakers can help ensure that AI's benefits extend to all communities, advancing the…
0
2
0