Jacob Steinhardt
@JacobSteinhardt
Followers
10K
Following
222
Media
22
Statuses
428
Assistant Professor of Statistics and EECS, UC Berkeley // Co-founder and CEO, @TransluceAI
Joined December 2011
In July, I went on leave from UC Berkeley to found @TransluceAI, together with Sarah Schwettmann (@cogconfluence). Now, our work is finally public.
Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: https://t.co/IUIhBjpYhS
4
18
350
We are excited to welcome Conrad Stosz to lead governance efforts at Transluce. Conrad previously led the US Center for AI Standards and Innovation, defining policies for the federal government’s high-risk AI uses. He brings a wealth of policy & standards expertise to the team.
1
9
25
3→5, 4→6, 9→11, 7→? LLMs solve this via In-Context Learning (ICL); but how is ICL represented and transmitted in LLMs? We build new tools that identify “extractor” and “aggregator” subspaces for ICL, and use them to understand ICL addition tasks like the one above. Come to
6
36
207
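For readers unfamiliar with the setup in the post above: the in-context examples all follow the rule “add 2,” and the model must infer that rule from the prompt alone. Below is a minimal, hypothetical sketch of how such an ICL addition task can be posed to a language model; the model choice (gpt2) and the prompt format are illustrative assumptions, not the tools or setup from the Transluce work.

```python
# Minimal sketch of an in-context learning (ICL) addition task like "3→5, 4→6, 9→11, 7→?".
# Illustrative only: the model choice (gpt2) and prompt format are assumptions,
# not the setup used in the Transluce work.
from transformers import pipeline

# Few-shot examples that implicitly encode the rule "add 2".
examples = [(3, 5), (4, 6), (9, 11)]
query = 7

prompt = "".join(f"{x} -> {y}\n" for x, y in examples) + f"{query} -> "

generator = pipeline("text-generation", model="gpt2")
completion = generator(prompt, max_new_tokens=3, do_sample=False)[0]["generated_text"]

# A model that has picked up the in-context rule should continue with "9".
print(completion[len(prompt):])
```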
On our evals for HAL, we found that agents figure out they're being evaluated even on capability evals. For example, here Claude 3.7 Sonnet *looks up the benchmark on HuggingFace* to find the answer to an AssistantBench question. There were many such cases across benchmarks and
To make a model that *doesn't* instantly learn to distinguish between "fake-ass alignment test" and "normal task"... it seems like the first thing to do would be "make all alignment evals very small variations on actual capability evals." Do people do this?
2
12
41
Halcyon was instrumental in helping Transluce get off the ground, and Mike has been a great partner ever since! I definitely recommend them if you are founding an impact-focused org.
Exactly two years ago, I launched @HalcyonFutures. So far we’ve seeded and launched 16 new orgs and companies, and helped them raise nearly a quarter billion dollars in funding. Flash back to 2022: After eight years in VC, I stepped back to explore questions about exponential
0
1
16
We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
1
13
72
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
5
38
245
How can we verify that AI ChemBio safety tests were properly run? Today we're launching STREAM: a checklist for more transparent eval results. I read a lot of model reports. Often they miss important details, like human baselines. STREAM helps make peer review more systematic.
2
16
82
@JacobSteinhardt, CEO of @TransluceAI, spoke on "Post-AGI Game Theory", i.e. how future AIs will influence their own development. He had a concrete proposal: flood the internet with high-quality examples of AI behavior acting on good values.
2
4
38
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
6
34
200
This Friday we're hosting "From Theory to Practice to Policy", a fireside chat between Yo Shavit (@yonashav) and Shafi Goldwasser. If you're local to SF and interested in the relationship between new technologies and policy, register to join! https://t.co/Or3R9E79uk
luma.com
Join Yonadav Shavit (OpenAI) and Shafi Goldwasser (UC Berkeley) for a discussion spanning theory, practice, and policy. Topics we'll discuss…
2
6
25
We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us:
Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria (https://t.co/wtratbvRnF)
Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am
1
7
40
I was one of the 16 devs in this study. I wanted to share my opinions on the causes of, and mitigation strategies for, dev slowdown. I'll say as a "why listen to you?" hook that I experienced a -38% AI speedup on my assigned issues. I think transparency helps the community.
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
101
470
4K
Transluce is hosting an #ICML2025 happy hour on Thursday, July 17 in Vancouver. Come meet us and learn more about our work! 🥂 https://t.co/1HShAR6nub
luma.com
Transluce is hosting a happy hour at ICML 2025✨ Come meet members of our team and learn more about Transluce's vision and research. There will be drinks,…
1
7
39
Ever wondered how likely your AI model is to misbehave? We developed the *propensity lower bound* (PRBO), a variational lower bound on the probability of a model exhibiting a target (misaligned) behavior.
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
1
3
42
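For intuition on what a bound like the PRBO above could look like, here is a hedged sketch using the standard variational construction; the exact PRBO definition is in the linked update, and the symbols below (a prompt distribution p(x), a target behavior b, and a proposal distribution q(x), e.g. an investigator model) are illustrative assumptions.

```latex
% Hedged sketch: a generic variational lower bound of the kind described
% above, not necessarily the exact PRBO. p(x) is a prompt distribution,
% b the target (misaligned) behavior, q(x) a proposal (investigator) distribution.
\[
\log \Pr[b]
  \;=\; \log \mathbb{E}_{x \sim p}\bigl[\Pr(b \mid x)\bigr]
  \;\ge\; \mathbb{E}_{x \sim q}\!\left[\log \frac{p(x)\,\Pr(b \mid x)}{q(x)}\right],
\]
% by importance sampling and Jensen's inequality; the bound is tight when
% q(x) is proportional to p(x) * Pr(b | x).
```

Maximizing a bound of this form over q corresponds to training the proposal to find prompts that make the target behavior likely, which matches the role investigator agents play in surfacing such behaviors.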
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
5
34
168
Every frontier AI system should be grounded in a core commitment: to protect human joy and endeavour. Today, we launch @LawZero_, a nonprofit dedicated to advancing safe-by-design AI. https://t.co/6VJecvaXYT
27
84
306
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
51
204
1K
Last day of PhD! I pioneered using LLMs to explain datasets & models. It's used by interp at @OpenAI and for societal impact at @AnthropicAI. Tutorial here. It's a great direction & someone should carry the torch :) Thesis available, if you wanna read my acknowledgement section =P
30
37
543
Synthetic Continued Pretraining (https://t.co/0epeIbxaLD) has been accepted as an Oral Presentation at #ICLR2025! We tackle the challenge of data-efficient language model pretraining: how to teach an LM the knowledge of small, niche corpora, such as the latest arXiv preprints.
Grab your favorite preprint of the week: how can you put its knowledge in your LM’s parameters? Continued pretraining (CPT) works well with >10B tokens, but the preprint is <10K tokens. Synthetic CPT downscales CPT to such small, targeted domains. 📜: https://t.co/nHblLT4YEy 🧵👇
1
12
82
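To make the recipe above concrete, here is a hedged sketch of the general idea, not the paper's exact method: expand the small corpus into many synthetic tokens with a generator LM, then continue pretraining the target LM on them with the standard causal-LM objective. The model names, the paraphrasing prompt, and the hyperparameters below are illustrative assumptions.

```python
# Hedged sketch of synthetic continued pretraining: expand a tiny corpus into
# synthetic training text with a generator LM, then continue pretraining a
# target LM on it. Model names, the prompt, and hyperparameters are
# illustrative assumptions, not the paper's exact procedure.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments, pipeline)

preprint_text = open("preprint.txt").read()  # the small (<10K-token) corpus

# Step 1: synthesize additional text grounded in the corpus.
generator = pipeline("text-generation", model="gpt2")  # stand-in generator
synthetic_docs = []
for _ in range(100):  # in practice, far more synthetic documents
    prompt = ("Explain the following passage in your own words:\n"
              + preprint_text[:1000] + "\n")
    out = generator(prompt, max_new_tokens=200, do_sample=True)[0]["generated_text"]
    synthetic_docs.append(out)

# Step 2: continued pretraining (causal-LM objective) on the synthetic corpus.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = (Dataset.from_dict({"text": synthetic_docs})
           .map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"]))
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(output_dir="synthetic-cpt", num_train_epochs=2,
                         per_device_train_batch_size=2, learning_rate=1e-5)
Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```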