Sarah Schwettmann
@cogconfluence
Followers: 3K · Following: 6K · Media: 338 · Statuses: 2K
Co-founder and Chief Scientist, @TransluceAI // Research Scientist, @MIT_CSAIL
dessert of the real
Joined October 2015
We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
1
13
75
Agent benchmarks lose *most* of their resolution because we throw out the logs and only look at accuracy. I’m very excited that HAL is incorporating @TransluceAI’s Docent to analyze agent logs in depth. Peter’s thread is a simple example of the type of analysis this enables,
OpenAI claims hallucinations persist because evaluations reward guessing and that GPT-5 is better calibrated. Do results from HAL support this conclusion? On AssistantBench, a general web search benchmark, GPT-5 has higher precision and lower guess rates than o3!
3
12
70
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
5
39
244
@ImanolSchlag and team at SwissAI just released Apertus, a glorious 70B model trained on 1000+ languages. People across the #PublicAI network have been building a publicly hosted frontend for it: try it out via the new inference utility at https://t.co/Vy8bvBNyxX ! #SwissAIWeeks
publicai.co
A nonprofit, open-source service to make public and sovereign AI models more accessible.
We ran a full security & compliance evaluation of the just released 🇨🇭 Swiss LLM, 🤖 Apertus, developed by ETH Zurich & EPFL. Answers to most common questions below 👇 1/10
0
2
9
and a skyspace from James Turrell, whose work with light inspired my Vision in Art and Neuroscience class at MIT for nearly a decade. sf has a way of sneaking up on my senses with the surprisingly familiar 🫶
0
0
14
found two things in the de Young sculpture garden today that I had no idea were here! a beehive piece from Pierre Huyghe, who I worked with in 2022 to install a hive in simulation (along with a real one) on an island in Norway…
1
0
24
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
6
35
200
keeping you fed and hydrated 🫡
0
1
21
This Friday we're hosting "From Theory to Practice to Policy", a fireside chat between Yo Shavit (@yonashav) and Shafi Goldwasser. If you're local to SF and interested in the relationship between new technologies and policy, register to join! https://t.co/Or3R9E79uk
luma.com
Join Yonadav Goldwasser Shavit (OpenAI) and Shafi Goldwasser (UC Berkeley) for a discussion spanning theory, practice, and policy. Topics we'll discuss…
2
7
25
if you think data cleaning is beneath you then ngmi
Academia must be the only industry where extremely high-skilled PhD students spend much of their time doing low value work (like data cleaning). A 1st year management consultant outsources this immediately. Imagine the productivity gains if PhDs could focus on thinking
11
30
675
Largest ever (by far) randomized controlled trial evaluating the persuasive capabilities of LLMs
Today (w/ @UniofOxford @Stanford @MIT @LSEnews) we’re sharing the results of the largest AI persuasion experiments to date: 76k participants, 19 LLMs, 707 political issues. We examine “levers” of AI persuasion: model scale, post-training, prompting, personalization, & more 🧵
1
3
6
maybe I will live tweet the actionable interp workshop panel
11
8
100
opportune moment for a pic of a talk written in blood @ActInterp
Huge thanks to Sarah Schwettmann for a fascinating keynote on "AI Investigators for Understanding AI Systems" 🤖 @cogconfluence @TransluceAI
0
0
22
At #ICML2025? Come chat about investigator agents and model behavior with @ChowdhuryNeil and @_ddjohnson at West Exhibition Hall #1012, now until 1:30pm
0
3
16
please come to East building poster #1108 (ballroom A) rn
ICML ✈️ this week. open to chat and learn mech interp from you. @aryaman2020 and i have cool ideas about steering, just come to our AxBench poster. new steering blog: https://t.co/ZPIIejq82M Chinese:
2
8
43
First Panel at WiML @ ICML 2025! Join us for a candid convo on career pivots, leadership & growth with: Amy (@yayitsamyzhang) • Eleni (@Eleni30fillou) • Sarah (@cogconfluence) 🗓️ Wed 11am #WiML #ICML2025
0
7
16
I'll be at ICML! Stop by our Thursday morning poster to hear about our investigator agents. Also excited to talk to people about understanding LM behaviors and personas during the conference! Feel free to reach out, DMs open!
We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us: Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria (https://t.co/wtratbvRnF) Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am
0
2
21
Exciting! Don’t miss Sarah (@cogconfluence) speaking at 10:15am and joining the Redefining Success panel at 11am. See you there! 🇨🇦 #WiML #ICML2025
We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us: Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria (https://t.co/wtratbvRnF) Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am
0
1
5
We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us: Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria (https://t.co/wtratbvRnF) Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am
1
7
40