Jacob Steinhardt Profile
Jacob Steinhardt

@JacobSteinhardt

Followers
10K
Following
222
Media
22
Statuses
428

Assistant Professor of Statistics and EECS, UC Berkeley // Co-founder and CEO, @TransluceAI

Joined December 2011
Don't wanna be here? Send us removal request.
@JacobSteinhardt
Jacob Steinhardt
1 year
In July, I went on leave from UC Berkeley to found @TransluceAI, together with Sarah Schwettmann (@cogconfluence). Now, our work is finally public.
@TransluceAI
Transluce
1 year
Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: https://t.co/IUIhBjpYhS
4
18
350
@TransluceAI
Transluce
2 days
We are excited to welcome Conrad Stosz to lead governance efforts at Transluce. Conrad previously led the US Center for AI Standards and Innovation, defining policies for the federal government’s high-risk AI uses. He brings a wealth of policy & standards expertise to the team.
1
9
25
@xyVickyHu
Xinyan Hu
14 days
3->5, 4->6, 9→11, 7-> ? LLMs solve this via In-Context Learning (ICL); but how is ICL represented and transmitted in LLMs? We build new tools identifying “extractor” and “aggregator” subspaces for ICL, and use them to understand ICL addition tasks like above. Come to
6
36
207
@sayashk
Sayash Kapoor
23 days
On our evals for HAL, we found that agents figure out they're being evaluated even on capability evals. For example, here Claude 3.7 Sonnet *looks up the benchmark on HuggingFace* to find the answer to an AssistantBench question. There were many such cases across benchmarks and
@1a3orn
1a3orn
23 days
To make a model that *doesn't* instantly learn to distinguish between "fake-ass alignment test" and "normal task." ...seems like the first thing to do seems like it would be "make all alignment evals very small variations on actual capability evals." Do people do this?
2
12
41
@JacobSteinhardt
Jacob Steinhardt
25 days
Halcyon was instrumental in helping Transluce get off the ground, and Mike has been a great partner ever since! I definitely recommend them if you are founding an impact-focused org.
@MikeMcCormick_
Mike McCormick
29 days
Exactly two years ago, I launched @HalcyonFutures. So far we’ve seeded and launched 16 new orgs and companies, and helped them raise nearly a quarter billion dollars in funding. Flash back to 2022: After eight years in VC, I stepped back to explore questions about exponential
0
1
16
@TransluceAI
Transluce
29 days
We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.
@TransluceAI
Transluce
2 months
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
1
13
72
@TransluceAI
Transluce
2 months
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
5
38
245
@lucafrighetti
Luca Righetti
2 months
How can we verify that AI ChemBio safety tests were properly run? Today we're launching STREAM: a checklist for more transparent eval results. I read a lot of model reports. Often they miss important details, like human baselines. STREAM helps make peer review more systematic.
2
16
82
@DavidDuvenaud
David Duvenaud
2 months
@JacobSteinhardt, CEO of @TransluceAI, spoke on "Post-AGI Game Theory", i.e. how future AIs will influence their own development. He had a concrete proposal: flood the internet with high-quality examples of AI behavior acting on good values.
2
4
38
@TransluceAI
Transluce
2 months
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
6
34
200
@TransluceAI
Transluce
3 months
This Friday we're hosting "From Theory to Practice to Policy", a fireside chat between Yo Shavit (@yonashav) and Shafi Goldwasser. If you're local to SF and interested in the relationship between new technologies and policy, register to join! https://t.co/Or3R9E79uk
Tweet card summary image
luma.com
Join Yonadav Goldwasser Shavit (OpenAI) and Shafi Goldwasser (UC Berkeley) for a discussion spanning theory, practice, and policy. Topics we'll discuss…
2
6
25
@TransluceAI
Transluce
3 months
We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us: Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria ( https://t.co/wtratbvRnF) Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am
1
7
40
@QuentinAnthon15
Quentin Anthony
3 months
I was one of the 16 devs in this study. I wanted to speak on my opinions about the causes and mitigation strategies for dev slowdown. I'll say as a "why listen to you?" hook that I experienced a -38% AI-speedup on my assigned issues. I think transparency helps the community.
@METR_Evals
METR
4 months
We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.
101
470
4K
@mjagadeesan25
Meena Jagadeesan
4 months
I'm so excited to be joining @Penn as an Assistant Professor in CS (@CIS_Penn) in Fall 2026! I’ll be working on machine learning ecosystems, aiming to steer how multi-agent interactions shape performance trends and societal outcomes. I’ll be recruiting PhD students this cycle!
38
52
801
@ChowdhuryNeil
Neil Chowdhury
5 months
Ever wondered how likely your AI model is to misbehave? We developed the *propensity lower bound* (PRBO), a variational lower bound on the probability of a model exhibiting a target (misaligned) behavior.
@TransluceAI
Transluce
5 months
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
1
3
42
@TransluceAI
Transluce
5 months
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
5
34
168
@LawZero_
LawZero - LoiZéro
5 months
Every frontier AI system should be grounded in a core commitment: to protect human joy and endeavour. Today, we launch @LawZero_, a nonprofit dedicated to advancing safe-by-design AI. https://t.co/6VJecvaXYT
27
84
306
@percyliang
Percy Liang
5 months
What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
51
204
1K
@ZhongRuiqi
Ruiqi Zhong
5 months
Last day of PhD! I pioneered using LLMs to explain dataset&model. It's used by interp at @OpenAI and societal impact @AnthropicAI Tutorial here. It's a great direction & someone should carry the torch :) Thesis available, if you wanna read my acknowledgement section=P
30
37
543
@ZitongYang0
Zitong Yang
6 months
Synthetic Continued Pretraining ( https://t.co/0epeIbxaLD) has been accepted as an Oral Presentation at #ICLR2025! We tackle the challenge of data-efficient language model pretraining: how to teach an LM the knowledge of small, niche corpora, such as the latest arXiv preprints.
@ZitongYang0
Zitong Yang
1 year
Grab your favorite preprint of the week: how can you put its knowledge in your LM’s parameters? Continued pretraining (CPT) works well with >10B tokens, but the preprint is <10K. Synthetic CPT downscales CPT to such small, targeted domains. 📜: https://t.co/nHblLT4YEy 🧵👇
1
12
82