Peter Hase
@peterbhase
Followers
3K
Following
2K
Media
57
Statuses
481
AI Institute Fellow at Schmidt Sciences. Postdoc at Stanford NLP Group. Previously: Anthropic, AI2, Google, Meta, UNC Chapel Hill
New York, NY
Joined April 2019
My last PhD paper 🎉: fundamental problems with model editing for LLMs! We present *12 open challenges* with definitions/benchmarks/assumptions, inspired by work on belief revision in philosophy To provide a way forward, we test model editing against Bayesian belief revision 🧵
3
75
305
We're excited to welcome 28 new AI2050 Fellows! This 4th cohort of researchers are pursuing projects that include building AI scientists, designing trustworthy models, and improving biological and medical research, among other areas. https://t.co/8oY7xdhxvF
6
27
172
I am recruiting 2 PhD students to work on LM interpretability at UMD @umdcs starting in fall 2026! We are #3 in AI and #4 in NLP research on @CSrankings. Come join us in our lovely building just a few miles from Washington, D.C. Details in 🧵
12
154
715
Techniques like synthetic document fine-tuning (SDF) have been proposed to modify AI beliefs. But do AIs really believe the implanted facts? In a new paper, we study this empirically. We find: 1. SDF sometimes (not always) implants genuine beliefs 2. But other techniques do not
5
37
185
I would encourage technical AI types to consider working in grantmaking! Schmidt Sciences is hiring for a unique position where you get to continue your own research at the same time Link:
jobs.lever.co
Summary Schmidt Sciences invites recent PhD graduates in AI and computer science to apply for a 12-18 month fellows-in-residence program. Reporting to the Director of the AI Institute at Schmidt...
4
29
145
My research code has never been sloppier than when written by AI. So many silently failing training runs What works well for me: - rubber ducking in a web app What costs me hours on a 1 week lag: - pressing tab
right now is the time where the takeoff looks the most rapid to insiders (we don’t program anymore we just yell at codex agents) but may look slow to everyone else as the general chatbot medium saturates
0
0
7
Our new lab for Human & Machine Intelligence is officially open at Princeton University! Consider applying for a PhD or Postdoc position, either through the depts. of Computer Science or Psychology. You can register interest on our new website https://t.co/fRPhtmJdrH (1/2)
10
64
595
📌📌📌 I'm excited to be on the faculty job market this fall. I updated my website with my CV. https://t.co/4Ddv6tN0jq
stephencasper.com
Visit the post for more.
8
22
173
Shower thought: LLMs still have very incoherent notions of evidence, and they update in strange ways when presented with information in-context that is relevant to their beliefs. I really wonder what will happen when LLM agents start doing interp on themselves and see the source
5
5
23
My team at @AISecurityInst is hiring! This is an awesome opportunity to get involved with cutting-edge scientific research inside government on frontier AI models. I genuinely love my job and the team 🤗 Link: https://t.co/poiWqKlmgt More Info: ⬇️
3
24
110
Current agents are highly unsafe, o3-mini one of the most advanced models in reasoning score 71% in executing harmful requests 😱 We introduce a new framework for evaluating agent safety✨🦺 Discover more 👇 👩💻 Code & data: https://t.co/mw6XVDMc6q 📄 Paper:
1/ AI agents are increasingly being deployed for real-world tasks, but how safe are they in high-stakes settings? 🚨 NEW: OpenAgentSafety - A comprehensive framework for evaluating AI agent safety in realistic scenarios across eight critical risk categories. 🧵
2
16
70
New @Scale_AI paper! 🌟 LLMs trained with RL can exploit reward hacks but not mention this in their CoT. We introduce verbalization fine-tuning (VFT)—teaching models to say when they're reward hacking—dramatically reducing the rate of undetected hacks (6% vs. baseline of 88%).
9
70
282
Overdue job update -- I am now: - A Visiting Scientist at @schmidtsciences, supporting AI safety and interpretability - A Visiting Researcher at the Stanford NLP Group, working with @ChrisGPotts I am so grateful I get to keep working in this fascinating and essential area, and
15
22
174
Excited to share our paper: "Chain-of-Thought Is Not Explainability"! We unpack a critical misconception in AI: models explaining their Chain-of-Thought (CoT) steps aren't necessarily revealing their true reasoning. Spoiler: transparency of CoT can be an illusion. (1/9) 🧵
28
145
657
really interesting to see just how gendered excitement about AI is, even among AI experts
15
40
239
🤔 Can lie detectors make AI more honest? Or will they become sneakier liars? We tested what happens when you add deception detectors into the training loop of large language models. Will training against probe-detected lies encourage honesty? Depends on how you train it!
4
11
69
New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive or better than using human supervision. Using this approach, we are able to train a Claude 3.5-based assistant that beats its human-supervised counterpart.
37
162
1K
🙁 LLMs are overconfident even when they are dead wrong. 🧐 What about reasoning models? Can they actually tell us “My answer is only 60% likely to be correct”? ❗Our paper suggests that they can! Through extensive analysis, we investigate what enables this emergent ability.
9
49
302
colab: https://t.co/zodx4iOj5O For aficionados, the post also contains some musings on “tuning the random seed” and how to communicate uncertainty associated with this process
colab.research.google.com
Colab notebook
0
0
0
Are p-values missing in AI research? Bootstrapping makes model comparisons easy! Here's a new blog/colab with code for: - Bootstrapped p-values and confidence intervals - Combining variance from BOTH sample size and random seed (eg prompts) - Handling grouped test data Link ⬇️
1
3
9
New AI/LLM Agents Track at #EMNLP2025! In the past few years, it feels a bit odd to submit agent work to *CL venues because one had to awkwardedly fit it into Question Answering or NLP Applications. Glad to see agent research finally finds home at *CL! Kudos to the PC for
9
25
186