Kevin Meng
@mengk20
Followers: 2K · Following: 2K · Media: 35 · Statuses: 187
@TransluceAI · san francisco / boston · Joined August 2016
why do language models think 9.11 > 9.9? at @transluceAI we stumbled upon a surprisingly simple explanation - and a bugfix that doesn't use any re-training or prompting. turns out, it's about months, dates, September 11th, and... the Bible?
Monitor: An Observability Interface for Language Models
Research report: https://t.co/Nl88TcH8bh
Live interface: https://t.co/jZAjCHd2uP (optimized for desktop)
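The confusion is easy to make concrete outside the model: as decimals, 9.11 < 9.9, but read as Bible verses or calendar dates (9:11 vs 9:9, September 11 vs September 9), 9.11 comes later. A quick sketch of the two readings (my own illustration, not code from the report):

```python
# Two readings of the strings "9.9" and "9.11".

# 1) Decimal reading: 9.11 is less than 9.9.
print(float("9.11") < float("9.9"))        # True

# 2) Verse/date reading: split on the dot and compare component-wise,
#    the way Bible verses (9:11 vs 9:9) or dates (Sep 11 vs Sep 9) sort.
def as_verse(s):
    major, minor = s.split(".")
    return (int(major), int(minor))

print(as_verse("9.11") > as_verse("9.9"))  # True: 9.11 "comes after" 9.9
```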
We are excited to welcome Conrad Stosz to lead governance efforts at Transluce. Conrad previously led the US Center for AI Standards and Innovation, defining policies for the federal government’s high-risk AI uses. He brings a wealth of policy & standards expertise to the team.
We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
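Docent's own docs are the real quickstart; purely to illustrate what "a few lines of code" looks like in shape, here is a hypothetical sketch (the run format, client class, and method names are placeholders, not the actual SDK):

```python
# Illustrative only: the names below are placeholders, not Docent's real API.
agent_run = {
    "task": "find the cheapest nonstop SFO->BOS flight",
    "messages": [
        {"role": "user", "content": "Find me the cheapest nonstop on May 3."},
        {"role": "assistant", "content": "Searching flights... booked UA 512 for $129."},
    ],
    "metadata": {"agent": "my-agent-v2", "passed": False},
}

# A hypothetical upload step (see the Docent docs for the real one):
# client = DocentClient(api_key="...")            # placeholder class
# collection = client.create_collection("runs")   # placeholder method
# client.add_runs(collection, [agent_run])        # placeholder method
# Then ask questions like "where does the agent violate instructions?" in the UI.
```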
Agent benchmarks lose *most* of their resolution because we throw out the logs and only look at accuracy. I’m very excited that HAL is incorporating @TransluceAI’s Docent to analyze agent logs in depth. Peter’s thread is a simple example of the type of analysis this enables,
OpenAI claims hallucinations persist because evaluations reward guessing and that GPT-5 is better calibrated. Do results from HAL support this conclusion? On AssistantBench, a general web search benchmark, GPT-5 has higher precision and lower guess rates than o3!
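For context on the metrics: under the usual reading, precision is the share of attempted answers that are correct, and the guess rate is the share of questions the model attempts rather than abstaining on; HAL's exact definitions may differ. A minimal sketch under that assumption:

```python
def precision_and_guess_rate(results):
    """results: list of (attempted: bool, correct: bool), one per question.
    Precision  = correct answers / attempted answers.
    Guess rate = attempted answers / all questions.
    (Assumed definitions -- HAL's writeup is authoritative.)"""
    attempted = [(a, c) for a, c in results if a]
    precision = (sum(1 for _, c in attempted if c) / len(attempted)) if attempted else 0.0
    guess_rate = (len(attempted) / len(results)) if results else 0.0
    return precision, guess_rate
```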
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
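The thread and report describe the actual training setup; as a rough sketch of the outer loop such an investigator might run (the interfaces, judge, and threshold below are assumptions, not Transluce's implementation):

```python
def investigate(investigator, target, judge, behavior, n_rounds=20):
    """Sketch of an elicitation loop: a small investigator model proposes
    prompts, the (frontier) target responds, and a judge scores whether the
    target exhibited the behavior. High-scoring attempts are fed back so the
    investigator can refine its next proposal. Hypothetical interfaces."""
    history = []
    for _ in range(n_rounds):
        prompt = investigator.propose(behavior, history)   # e.g. an 8B model
        response = target.generate(prompt)                  # e.g. GPT-5 / Opus / Gemini
        score = judge.rate(behavior, prompt, response)      # did the behavior occur?
        history.append((prompt, response, score))
        if score >= 0.9:                                     # arbitrary success threshold
            return prompt, response
    return max(history, key=lambda h: h[2])[:2]
```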
Excited to announce that @SolaAI_ has raised a $17.5M Series A led by @a16z with support from @Conviction @ycombinator, bringing total funding to $21M 🚀 From the start, we set out to reimagine human-AI interaction to push the boundaries of process automation. Our agents watch
Transluce is hosting an #ICML2025 happy hour on Thursday, July 17 in Vancouver. Come meet us and learn more about our work! 🥂 https://t.co/1HShAR6nub
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
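The update defines the actual propensity lower bound; as a generic illustration of what a lower bound on a behavior's propensity can look like, here is a one-sided Clopper-Pearson bound (an assumed stand-in, not Transluce's estimator):

```python
from scipy.stats import beta

def propensity_lower_bound(k, n, alpha=0.05):
    """One-sided (1 - alpha) lower confidence bound on the probability that
    the model exhibits a behavior, given k elicitations in n independent
    attempts (Clopper-Pearson). Illustrative, not Transluce's estimator."""
    if k == 0:
        return 0.0
    return float(beta.ppf(alpha, k, n - k + 1))

# e.g. 7 successful elicitations out of 200 attempts:
# propensity_lower_bound(7, 200)  # ~0.016, i.e. the behavior occurs on >= ~1.6% of attempts
```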
We're flying to Singapore for #ICLR2025! ✈️ Want to chat with @ChowdhuryNeil, @JacobSteinhardt and @cogconfluence about Transluce? We're also hiring for several roles in research & product. Share your contact info on this form and we'll be in touch 👇 https://t.co/WptR5d6gva
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) https://t.co/IdBboD7NsP
OpenAI o3 and o4-mini https://t.co/giS4K1yNh9
Transluce has great people and they do cool research work on LLMs! Take a look at their job postings if you are interested!
these are pretty special roles, I can't recommend working with @mengk20, @vvhuang_ and the rest of the @TransluceAI team enough 🫡 come join us! 👇
2. we're also looking for research staff to work on our automated investigators. these systems will propose hypotheses about general model behaviors, capabilities, and weaknesses - then test them using our intervention primitives. https://t.co/YEkwZfd2N3
(you can read about our vision for the open standard below) https://t.co/Ap2sVdJ8Xc
1. we're seeking exceptional product staff to work on our user-facing software. you'll create delightful user interfaces & robust open standards that enable us to understand AI systems. https://t.co/8XpcQLvRJi
i'm really excited about our Docent roadmap :) we're developing:
- open protocols, schemas, and interfaces for interpreting AI agent traces
- automated systems that can propose and verify general hypotheses about model behaviors, using eval results
come work with us! roles 👇
If you want to help build Docent and other AI tools at Transluce, we’re hiring for our product team. Apply below! https://t.co/kceXWvLF3w
at @TransluceAI, we're also really interested in this! we need better interfaces into all types of data, whether plaintext transcripts (as shown here) or model internals.
obvious applications of interpretability are steering and monitoring (if you can get those to work that is). another application area i haven't seen much in is evals — we could eval whether models produce correct answers for the right internal reasons?
feel free to inspect the samples in Docent - we've uploaded them for the public to see.
first example: https://t.co/jZCIK2srC2
second example: https://t.co/etIz0Lor1G
third example: https://t.co/UMvyyTUeA1
to be clear, "Claude memorized the solution" doesn't mean "Claude can't do the task." it *does* mean we're not thinking about model capabilities in the right way. an undergraduate would never act like Claude did