vincent

@vvhuang_

Followers
1K
Following
2K
Media
28
Statuses
299

understanding models @TransluceAI, writing https://t.co/M7hdeAExFk previously: hotel manager @MIT, math @0xPARC

sf
Joined November 2020
@vvhuang_
vincent
2 months
should've started reading steinbeck earlier 😋
2
0
12
@vvhuang_
vincent
2 months
greatest linkedin sales message i’ve ever gotten tbh
2
0
33
@TransluceAI
Transluce
2 months
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
5
39
249
@TransluceAI
Transluce
3 months
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
7
36
204
@vvhuang_
vincent
3 months
onboarded onto a new research codebase today
1
0
21
@vvhuang_
vincent
4 months
only took me 4 years (+ daily prodding from @mengk20 😅) to realize: - website layout should reflect the info you want to communicate, rather than just copying patterns you saw on other sites - half the elements on my homepage weren’t doing anything
1
0
28
@vvhuang_
vincent
4 months
thanks to @upcycledwords @cjquines @clairebookworm1 @laurgao for looking over drafts 🥰 narrator’s opinions not necessarily my own https://t.co/9G3FEAwhYA
1
0
4
@vvhuang_
vincent
4 months
i’ve been experimenting with writing AI research fanfiction 🤪🥸🤔 jokes aside, it’s also a story about AI culture / putting people on pedestals / deciding what to believe in. hope you enjoy!
6
1
41
@TheAlanNoble
𝐎. 𝐀𝐥𝐚𝐧 𝐍𝐨𝐛𝐥𝐞
4 months
Pro tip: you can basically travel the world by using google maps.
@packyM
Packy McCormick
4 months
pro tip: you can basically read >100 books per day by asking chatgpt to summarize them for you.
99
8K
143K
@vvhuang_
vincent
6 months
still have some reliability + sensitivity issues to work through also brainstorming fun designs to decorate the top of the pad 🤩 if you have suggestions let me know
0
0
2
@vvhuang_
vincent
6 months
building a Dance Dance Revolution pad from first principles! nothing fancy, just wood + aluminum + wires + tape
4
0
50
@vvhuang_
vincent
6 months
i think it's really cute that Iowa State University writes language model reasoning papers about agriculture
0
0
14
@vvhuang_
vincent
7 months
sometimes you have to apply exponential backoff when texting new people
3
1
77
@TransluceAI
Transluce
7 months
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) https://t.co/IdBboD7NsP
@OpenAI
OpenAI
7 months
OpenAI o3 and o4-mini https://t.co/giS4K1yNh9
429
1K
11K
@vvhuang_
vincent
8 months
🤮📉🚫 building yet another LLM benchmark 🥰📈🌈 building a tool that can make every existing benchmark more useful very excited to share Docent: a system that can look through eval results and identify unusual model behaviors, cheating, env setup issues, etc. in just minutes!
@TransluceAI
Transluce
8 months
To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses. We're introducing Docent to accelerate analysis of AI agent transcripts. It can spot surprises in seconds. 🧵👇
0
1
26
@TransluceAI
Transluce
8 months
To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses. We're introducing Docent to accelerate analysis of AI agent transcripts. It can spot surprises in seconds. 🧵👇
10
66
340
@So8res
Nate Soares ⏹️
9 months
back when I was young, I thought it was unrealistic for the Volunteer Fire Department to schism into a branch that fought fires and a branch that started them
13
42
677
@ariellelok
ariloo 🌱
11 months
everybody deserves to see interstellar in imax. i would get a lifetime amc a-list subscription if it meant interstellar would always be in theatres. id go every wednesday night and have my own chair and everything
1
3
103
@vvhuang_
vincent
1 year
check out our writeup and demo for more applications of understanding + steering models via neuron descriptions!
0
0
6
@vvhuang_
vincent
1 year
i think we have the most compelling explanation so far for why LLMs make mistakes like 9.11>9.9 🙂 1) we labeled every neuron in Llama3 2) when Llama says 9.11>9.9 we see influential groups of neurons about dates and bible verses 3) zeroing those allows Llama to answer correctly
@mengk20
Kevin Meng
1 year
why do language models think 9.11 > 9.9? at @transluceAI we stumbled upon a surprisingly simple explanation - and a bugfix that doesn't use any re-training or prompting. turns out, it's about months, dates, September 11th, and... the Bible?
4
5
66