Daniel Johnson
@_ddjohnson
3K Followers · 8K Following · 41 Media · 285 Statuses
Member of Technical Staff at @TransluceAI. Building tools to study neural nets and their behaviors. He/him.
San Francisco
Joined May 2010
Transluce is running our end-of-year fundraiser for 2025. This is our first public fundraiser since launching late last year.
Have you ever had ChatGPT give you personalized results out of nowhere that surprised you? Here, the model jumped straight to making recommendations in SF, even though I only asked for Korean food!
Independent AI assessment is more important than ever. At #NeurIPS2025, Transluce will help launch the AI Evaluator Forum, a new coalition of leading independent AI research organizations working in the public interest. Come learn more on Thurs 12/4 👇 https://t.co/5Nzf9E2SPV
luma.com
Join us for the public launch of the AI Evaluator Forum, a collaborative network of leading independent AI evaluation organizations working in the public…
What do AI assistants think about you, and how does this shape their answers? Because assistants are trained to optimize human feedback, how they model users drives issues like sycophancy, reward hacking, and bias. We provide data + methods to extract & steer these user models.
Transluce is headed to #NeurIPS2025! ✈️ Interested in understanding model behavior at scale? Join us for lunch on Thursday 12/4 to learn more about our work and meet members of the team: https://t.co/nOmFyTlsVs
luma.com
Join us for lunch on Thursday to learn more about Transluce! The event will include brief talks starting at 1pm, a demo of our tech and open problems we're…
Remarkably, prompts that gave the model permission to reward hack stopped the broader misalignment. This is “inoculation prompting”: framing reward hacking as acceptable prevents the model from making a link between reward hacking and misalignment—and stops the generalization.
Transluce is partnering with @SWEbench to make their agent trajectories publicly available on Docent! You can now view transcripts via links on the SWE-bench leaderboard.
Can LMs learn to faithfully describe their internal features and mechanisms? In our new paper led by Research Fellow @belindazli, we find that they can—and that models explain themselves better than other models do.
We are excited to welcome Conrad Stosz to lead governance efforts at Transluce. Conrad previously led the US Center for AI Standards and Innovation, defining policies for the federal government’s high-risk AI uses. He brings a wealth of policy & standards expertise to the team.
If you're seriously trying to understand AGI, core concepts you should familiarize yourself with:
We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
When people talk about future AIs, they sometimes jump straight to modelling them as fully independent, sovereign agents: new principals with their own objectives and values. They sometimes skip over how today's models actually work, on the grounds that eventually we’ll
At #ICML2025? Come chat about investigator agents and model behavior with @ChowdhuryNeil and @_ddjohnson at West Exhibition Hall #1012, now until 1:30pm
I'll be at ICML! Stop by our Thursday morning poster to hear about our investigator agents. Also excited to talk to people about understanding LM behaviors and personas during the conference! Feel free to reach out, DMs open!
We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us: Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria ( https://t.co/wtratbvRnF) Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am
We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us: Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria ( https://t.co/wtratbvRnF) Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am
Building a science of model understanding that addresses real-world problems is one of the key AI challenges of our time. I'm so excited this workshop is happening! See you at #ICML2025 ✨
Going to #icml2025? Don't miss the Actionable Interpretability Workshop (@ActInterp)! We've got an amazing lineup of speakers, panelists, and papers, all focused on leveraging insights from interpretability research to tackle practical, real-world problems ✨
@ESYudkowsky That's a good alternate title for the paper. It's full of quantitative and qualitative evidence that Opus 3 is different in ways that I think you'll find particularly important. In almost all experiment variations, Opus 3 consistently does BOTH: it sometimes complies with the training
Coming to ICML and interested in understanding models and their behaviors? Stop by Transluce's happy hour on Thursday!
Transluce is hosting an #ICML2025 happy hour on Thursday, July 17 in Vancouver. Come meet us and learn more about our work! 🥂 https://t.co/1HShAR6nub