Kevin Meng

@mengk20

Followers 2K
Following 2K
Media 35
Statuses 187

@TransluceAI

san francisco / boston
Joined August 2016
@mengk20
Kevin Meng
1 year
why do language models think 9.11 > 9.9? at @transluceAI we stumbled upon a surprisingly simple explanation - and a bugfix that doesn't use any re-training or prompting. turns out, it's about months, dates, September 11th, and... the Bible?
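The ambiguity the tweet alludes to is easy to demonstrate: "9.11 vs 9.9" flips depending on whether the strings are read as decimal numbers or as dotted components (version numbers, or dates like September 11 vs September 9). A minimal illustration (not Transluce's code; the function names are my own):

```python
def compare_as_numbers(a: str, b: str) -> bool:
    """True if a > b when the strings are read as decimal numbers."""
    return float(a) > float(b)

def compare_as_components(a: str, b: str) -> bool:
    """True if a > b when read as dotted components (versions/dates),
    compared lexicographically component by component."""
    return [int(x) for x in a.split(".")] > [int(x) for x in b.split(".")]

print(compare_as_numbers("9.11", "9.9"))     # False: 9.11 < 9.90 as decimals
print(compare_as_components("9.11", "9.9"))  # True: (9, 11) > (9, 9)
```

Both readings are internally consistent, which is why a model that latches onto the wrong one (months, dates, Bible verses) can confidently produce 9.11 > 9.9.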
@TransluceAI
Transluce
1 year
Monitor: An Observability Interface for Language Models
Research report: https://t.co/Nl88TcH8bh
Live interface: https://t.co/jZAjCHd2uP (optimized for desktop)
68
149
1K
@TransluceAI
Transluce
17 days
We are excited to welcome Conrad Stosz to lead governance efforts at Transluce. Conrad previously led the US Center for AI Standards and Innovation, defining policies for the federal government’s high-risk AI uses. He brings a wealth of policy & standards expertise to the team.
1
9
27
@TransluceAI
Transluce
1 month
We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.
@TransluceAI
Transluce
2 months
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
1
13
77
@sayashk
Sayash Kapoor
2 months
Agent benchmarks lose *most* of their resolution because we throw out the logs and only look at accuracy. I’m very excited that HAL is incorporating @TransluceAI’s Docent to analyze agent logs in depth. Peter’s thread is a simple example of the type of analysis this enables.
@PKirgis
Peter Kirgis
2 months
OpenAI claims hallucinations persist because evaluations reward guessing and that GPT-5 is better calibrated. Do results from HAL support this conclusion? On AssistantBench, a general web search benchmark, GPT-5 has higher precision and lower guess rates than o3!
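For context on the metrics in that claim: a sketch of how precision and guess rate can be computed from per-question eval records, assuming a model may either answer or abstain. This is my own illustrative code with hypothetical record fields, not HAL's implementation or its exact metric definitions:

```python
def precision_and_guess_rate(records):
    """records: list of dicts with 'answer' (None = abstained) and 'gold'.
    precision  = correct answers / questions answered
    guess rate = questions answered / total questions (lower = more abstention)
    """
    answered = [r for r in records if r["answer"] is not None]
    correct = [r for r in answered if r["answer"] == r["gold"]]
    precision = len(correct) / len(answered) if answered else 0.0
    guess_rate = len(answered) / len(records)
    return precision, guess_rate

records = [
    {"answer": "A", "gold": "A"},   # answered, correct
    {"answer": "B", "gold": "C"},   # answered, wrong (a bad guess)
    {"answer": None, "gold": "D"},  # abstained rather than guessing
    {"answer": "E", "gold": "E"},   # answered, correct
]
precision, guess_rate = precision_and_guess_rate(records)
```

Under these definitions, a better-calibrated model raises precision while lowering guess rate, which is the pattern the tweet reports for GPT-5 on AssistantBench.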
3
11
70
@TransluceAI
Transluce
2 months
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
5
39
247
@TransluceAI
Transluce
2 months
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
6
36
204
@mengk20
Kevin Meng
3 months
Go check @SolaAI_ out, they're doing insanely cool work!! Congrats @NeilDeshmukh @jess123555 :)
@NeilDeshmukh
Neil Deshmukh
3 months
Excited to announce that @SolaAI_ has raised a $17.5M Series A led by @a16z with support from @Conviction @ycombinator, bringing total funding to $21M 🚀 From the start, we set out to reimagine human-AI interaction to push the boundaries of process automation. Our agents watch
0
0
4
@TransluceAI
Transluce
5 months
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
5
35
168
@TransluceAI
Transluce
7 months
We're flying to Singapore for #ICLR2025! ✈️ Want to chat with @ChowdhuryNeil, @JacobSteinhardt and @cogconfluence about Transluce? We're also hiring for several roles in research & product. Share your contact info on this form and we'll be in touch 👇 https://t.co/WptR5d6gva
2
6
40
@TransluceAI
Transluce
7 months
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) https://t.co/IdBboD7NsP
@OpenAI
OpenAI
7 months
OpenAI o3 and o4-mini https://t.co/giS4K1yNh9
429
1K
11K
@daniel_c0deb0t
Daniel Liu
7 months
Transluce has great people and they do cool research work on LLMs! Take a look at their job postings if you are interested!
@mengk20
Kevin Meng
7 months
i'm really excited about our Docent roadmap :) we're developing:
- open protocols, schemas, and interfaces for interpreting AI agent traces
- automated systems that can propose and verify general hypotheses about model behaviors, using eval results
come work with us! roles 👇
2
1
22
@cogconfluence
Sarah Schwettmann
7 months
these are pretty special roles, I can't recommend working with @mengk20, @vvhuang_ and the rest of the @TransluceAI team enough 🫡 come join us! 👇
@mengk20
Kevin Meng
7 months
i'm really excited about our Docent roadmap :) we're developing:
- open protocols, schemas, and interfaces for interpreting AI agent traces
- automated systems that can propose and verify general hypotheses about model behaviors, using eval results
come work with us! roles 👇
0
3
15
@mengk20
Kevin Meng
7 months
2. we're also looking for research staff to work on our automated investigators. these systems will propose hypotheses about general model behaviors, capabilities, and weaknesses - then test them using our intervention primitives. https://t.co/YEkwZfd2N3
0
0
4
@mengk20
Kevin Meng
7 months
(you can read about our vision for the open standard below) https://t.co/Ap2sVdJ8Xc
1
0
3
@mengk20
Kevin Meng
7 months
1. we're seeking exceptional product staff to work on our user-facing software. you'll create delightful user interfaces & robust open standards that enable us to understand AI systems https://t.co/8XpcQLvRJi
1
0
3
@mengk20
Kevin Meng
7 months
i'm really excited about our Docent roadmap :) we're developing:
- open protocols, schemas, and interfaces for interpreting AI agent traces
- automated systems that can propose and verify general hypotheses about model behaviors, using eval results
come work with us! roles 👇
@TransluceAI
Transluce
8 months
If you want to help build Docent and other AI tools at Transluce, we’re hiring for our product team. Apply below! https://t.co/kceXWvLF3w
6
10
48
@mengk20
Kevin Meng
8 months
what *do* the numbers really mean, mason?
2
1
32
@mengk20
Kevin Meng
8 months
at @TransluceAI, we're also really interested in this! we need better interfaces into all types of data, whether plaintext transcripts (as shown here) or model internals.
@aryaman2020
Aryaman Arora
8 months
obvious applications of interpretability are steering and monitoring (if you can get those to work that is). another application area i haven't seen much in is evals — we could eval whether models produce correct answers for the right internal reasons?
1
2
41
@mengk20
Kevin Meng
8 months
feel free to inspect the samples in Docent - we've uploaded them for the public to see. first example: https://t.co/jZCIK2srC2 second example: https://t.co/etIz0Lor1G third example: https://t.co/UMvyyTUeA1
1
1
35
@mengk20
Kevin Meng
8 months
to be clear, "Claude memorized the solution" doesn't mean "Claude can't do the task." it *does* mean we're not thinking about model capabilities in the right way. an undergraduate would never act like Claude did
1
3
65