Kevin Meng
@mengk20
Followers: 2K · Following: 2K · Media: 35 · Statuses: 187
@TransluceAI · san francisco / boston · Joined August 2016
why do language models think 9.11 > 9.9? at @transluceAI we stumbled upon a surprisingly simple explanation - and a bugfix that doesn't use any re-training or prompting. turns out, it's about months, dates, September 11th, and... the Bible?
Monitor: An Observability Interface for Language Models
Research report: https://t.co/Nl88TcH8bh
Live interface: https://t.co/jZAjCHd2uP (optimized for desktop)
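The confusion is easy to make concrete outside the model: as decimals, 9.11 < 9.9, but read as Bible verses or calendar dates (9:11 vs 9:9, September 11 vs September 9), 9.11 comes later. A quick sketch of the two readings (my own illustration, not code from the report):

```python
# Two readings of the strings "9.9" and "9.11".

# 1) Decimal reading: 9.11 is less than 9.9.
print(float("9.11") < float("9.9"))        # True

# 2) Verse/date reading: split on the dot and compare component-wise,
#    the way Bible verses (9:11 vs 9:9) or dates (Sep 11 vs Sep 9) sort.
def as_verse(s):
    major, minor = s.split(".")
    return (int(major), int(minor))

print(as_verse("9.11") > as_verse("9.9"))  # True: 9.11 "comes after" 9.9
```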
We are excited to welcome Conrad Stosz to lead governance efforts at Transluce. Conrad previously led the US Center for AI Standards and Innovation, defining policies for the federal government’s high-risk AI uses. He brings a wealth of policy & standards expertise to the team.
We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
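Docent's own docs are the real quickstart; purely to illustrate what "a few lines of code" looks like in shape, here is a hypothetical sketch (the run format, client class, and method names are placeholders, not the actual SDK):

```python
# Illustrative only: the names below are placeholders, not Docent's real API.
agent_run = {
    "task": "find the cheapest nonstop SFO->BOS flight",
    "messages": [
        {"role": "user", "content": "Find me the cheapest nonstop on May 3."},
        {"role": "assistant", "content": "Searching flights... booked UA 512 for $129."},
    ],
    "metadata": {"agent": "my-agent-v2", "passed": False},
}

# A hypothetical upload step (see the Docent docs for the real one):
# client = DocentClient(api_key="...")            # placeholder class
# collection = client.create_collection("runs")   # placeholder method
# client.add_runs(collection, [agent_run])        # placeholder method
# Then ask questions like "where does the agent violate instructions?" in the UI.
```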
Agent benchmarks lose *most* of their resolution because we throw out the logs and only look at accuracy. I’m very excited that HAL is incorporating @TransluceAI’s Docent to analyze agent logs in depth. Peter’s thread is a simple example of the type of analysis this enables,
OpenAI claims hallucinations persist because evaluations reward guessing and that GPT-5 is better calibrated. Do results from HAL support this conclusion? On AssistantBench, a general web search benchmark, GPT-5 has higher precision and lower guess rates than o3!
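For context on the metrics: under the usual reading, precision is the share of attempted answers that are correct, and the guess rate is the share of questions the model attempts rather than abstaining on; HAL's exact definitions may differ. A minimal sketch under that assumption:

```python
def precision_and_guess_rate(results):
    """results: list of (attempted: bool, correct: bool), one per question.
    Precision  = correct answers / attempted answers.
    Guess rate = attempted answers / all questions.
    (Assumed definitions -- HAL's writeup is authoritative.)"""
    attempted = [(a, c) for a, c in results if a]
    precision = (sum(1 for _, c in attempted if c) / len(attempted)) if attempted else 0.0
    guess_rate = (len(attempted) / len(results)) if results else 0.0
    return precision, guess_rate
```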
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
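The thread and report describe the actual training setup; as a rough sketch of the outer loop such an investigator might run (the interfaces, judge, and threshold below are assumptions, not Transluce's implementation):

```python
def investigate(investigator, target, judge, behavior, n_rounds=20):
    """Sketch of an elicitation loop: a small investigator model proposes
    prompts, the (frontier) target responds, and a judge scores whether the
    target exhibited the behavior. High-scoring attempts are fed back so the
    investigator can refine its next proposal. Hypothetical interfaces."""
    history = []
    for _ in range(n_rounds):
        prompt = investigator.propose(behavior, history)   # e.g. an 8B model
        response = target.generate(prompt)                  # e.g. GPT-5 / Opus / Gemini
        score = judge.rate(behavior, prompt, response)      # did the behavior occur?
        history.append((prompt, response, score))
        if score >= 0.9:                                     # arbitrary success threshold
            return prompt, response
    return max(history, key=lambda h: h[2])[:2]
```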
Excited to announce that @SolaAI_ has raised a $17.5M Series A led by @a16z with support from @Conviction @ycombinator, bringing total funding to $21M 🚀 From the start, we set out to reimagine human-AI interaction to push the boundaries of process automation. Our agents watch
Transluce is hosting an #ICML2025 happy hour on Thursday, July 17 in Vancouver. Come meet us and learn more about our work! 🥂 https://t.co/1HShAR6nub
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
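The update defines the actual propensity lower bound; as a generic illustration of what a lower bound on a behavior's propensity can look like, here is a one-sided Clopper-Pearson bound (an assumed stand-in, not Transluce's estimator):

```python
from scipy.stats import beta

def propensity_lower_bound(k, n, alpha=0.05):
    """One-sided (1 - alpha) lower confidence bound on the probability that
    the model exhibits a behavior, given k elicitations in n independent
    attempts (Clopper-Pearson). Illustrative, not Transluce's estimator."""
    if k == 0:
        return 0.0
    return float(beta.ppf(alpha, k, n - k + 1))

# e.g. 7 successful elicitations out of 200 attempts:
# propensity_lower_bound(7, 200)  # ~0.016, i.e. the behavior occurs on >= ~1.6% of attempts
```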
We're flying to Singapore for #ICLR2025! ✈️ Want to chat with @ChowdhuryNeil, @JacobSteinhardt and @cogconfluence about Transluce? We're also hiring for several roles in research & product. Share your contact info on this form and we'll be in touch 👇 https://t.co/WptR5d6gva
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) https://t.co/IdBboD7NsP
OpenAI o3 and o4-mini https://t.co/giS4K1yNh9
Transluce has great people and they do cool research work on LLMs! Take a look at their job postings if you are interested!
these are pretty special roles, I can't recommend working with @mengk20, @vvhuang_ and the rest of the @TransluceAI team enough 🫡 come join us! 👇
2. we're also looking for research staff to work on our automated investigators. these systems will propose hypotheses about general model behaviors, capabilities, and weaknesses - then test them using our intervention primitives. https://t.co/YEkwZfd2N3
(you can read about our vision for the open standard below) https://t.co/Ap2sVdJ8Xc
1. we're seeking exceptional product staff to work on our user-facing software. you'll create delightful user interfaces & robust open standards that enable us to understand AI systems. https://t.co/8XpcQLvRJi
i'm really excited about our Docent roadmap :) we're developing:
- open protocols, schemas, and interfaces for interpreting AI agent traces
- automated systems that can propose and verify general hypotheses about model behaviors, using eval results
come work with us! roles 👇
If you want to help build Docent and other AI tools at Transluce, we’re hiring for our product team. Apply below! https://t.co/kceXWvLF3w
at @TransluceAI, we're also really interested in this! we need better interfaces into all types of data, whether plaintext transcripts (as shown here) or model internals.
obvious applications of interpretability are steering and monitoring (if you can get those to work that is). another application area i haven't seen much in is evals — we could eval whether models produce correct answers for the right internal reasons?
feel free to inspect the samples in Docent - we've uploaded them for the public to see.
first example: https://t.co/jZCIK2srC2
second example: https://t.co/etIz0Lor1G
third example: https://t.co/UMvyyTUeA1
to be clear, "Claude memorized the solution" doesn't mean "Claude can't do the task." it *does* mean we're not thinking about model capabilities in the right way. an undergraduate would never act like Claude did