vincent
@vvhuang_
Followers
1K
Following
2K
Media
28
Statuses
299
understanding models @TransluceAI, writing https://t.co/M7hdeAExFk previously: hotel manager @MIT, math @0xPARC
sf
Joined November 2020
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
5
39
249
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
7
36
204
thanks to @upcycledwords @cjquines @clairebookworm1 @laurgao for looking over drafts 🥰 narrator's opinions not necessarily my own https://t.co/9G3FEAwhYA
1
0
4
i've been experimenting with writing AI research fanfiction 🤪🥸 jokes aside, it's also a story about AI culture / putting people on pedestals / deciding what to believe in. hope you enjoy!
6
1
41
still have some reliability + sensitivity issues to work through. also brainstorming fun designs to decorate the top of the pad 🤩 if you have suggestions let me know
0
0
2
building a Dance Dance Revolution pad from first principles! nothing fancy, just wood + aluminum + wires + tape
4
0
50
i think it's really cute that Iowa State University writes language model reasoning papers about agriculture
0
0
14
sometimes you have to apply exponential backoff when texting new people
3
1
77
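The joke above borrows a real retry technique. A minimal sketch of capped exponential backoff, where each retry waits roughly twice as long as the last (the base delay, growth factor, and cap here are illustrative assumptions, not from the tweet):

```python
import random

def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=6, jitter=False):
    """Yield retry delays that grow geometrically, capped at `cap` seconds.

    With jitter enabled, each delay is scaled by a random factor in [0, 1)
    to avoid many clients retrying in lockstep.
    """
    delay = base
    for _ in range(attempts):
        yield min(delay, cap) * (random.random() if jitter else 1.0)
        delay *= factor

# Without jitter the schedule is 1, 2, 4, 8, 16, 32 seconds.
```

In practice the jittered variant is usually preferred for network retries, since synchronized retries can re-trigger the overload that caused the failure.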
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🧵 (1/) https://t.co/IdBboD7NsP
OpenAI o3 and o4-mini https://t.co/giS4K1yNh9
429
1K
11K
🤮 building yet another LLM benchmark
🥰 building a tool that can make every existing benchmark more useful

very excited to share Docent: a system that can look through eval results and identify unusual model behaviors, cheating, env setup issues, etc. in just minutes!
To interpret AI benchmarks, we need to look at the data. Top-level numbers don't mean what you think: there may be broken tasks, unexpected behaviors, or near-misses. We're introducing Docent to accelerate analysis of AI agent transcripts. It can spot surprises in seconds. 🧵
0
1
26
back when I was young, I thought it was unrealistic for the Volunteer Fire Department to schism into a branch that fought fires and a branch that started them
13
42
677
everybody deserves to see interstellar in imax. i would get a lifetime amc a-list subscription if it meant interstellar would always be in theatres. i'd go every wednesday night and have my own chair and everything
1
3
103
check out our writeup and demo for more applications of understanding + steering models via neuron descriptions!
0
0
6
i think we have the most compelling explanation so far for why LLMs make mistakes like 9.11 > 9.9
1) we labeled every neuron in Llama3
2) when Llama says 9.11 > 9.9 we see influential groups of neurons about dates and bible verses
3) zeroing those allows Llama to answer correctly
why do language models think 9.11 > 9.9? at @transluceAI we stumbled upon a surprisingly simple explanation - and a bugfix that doesn't use any re-training or prompting. turns out, it's about months, dates, September 11th, and... the Bible?
4
5
66
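The intervention in the thread above, zeroing a group of hidden neurons and re-running the model, can be sketched on a toy network. Everything here is an illustrative assumption (a tiny random MLP, hand-picked "date/verse" unit indices), not Transluce's actual labeling or ablation pipeline:

```python
import numpy as np

def forward(x, W1, W2, ablate=()):
    """Toy 2-layer MLP forward pass that zeroes selected hidden neurons."""
    h = np.maximum(0.0, x @ W1)      # hidden activations (ReLU)
    h[..., list(ablate)] = 0.0       # "zeroing" the chosen neuron group
    return h @ W2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))         # input -> 8 hidden units
W2 = rng.normal(size=(8, 2))         # hidden -> 2 outputs
x = rng.normal(size=(4,))

y_full = forward(x, W1, W2)
# pretend units 2 and 5 were flagged as the date/bible-verse group
y_ablated = forward(x, W1, W2, ablate=[2, 5])
```

Comparing `y_full` against `y_ablated` shows how much those units contribute; in the thread's experiment, the analogous comparison flipped Llama's answer to the correct one.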