Andy Zou

@andyzou_jiaming

Followers: 3K
Following: 248
Media: 18
Statuses: 146

PhD student at CMU, working on AI Safety and Security

Berkeley, CA
Joined March 2014
@andyzou_jiaming
Andy Zou
15 days
RT @maksym_andr: 🚨 Excited to release OS-Harm! 🚨 The safety of computer use agents has been largely overlooked. We created a new safety b…
@andyzou_jiaming
Andy Zou
2 months
RT @AISecurityInst: 🧵 Today we’re publishing our first Research Agenda – a detailed outline of the most urgent questions we’re working to a…
@andyzou_jiaming
Andy Zou
3 months
RT @zicokolter: Excited about this work with @ashertrockman @yashsavani_ (and others) on antidistillation sampling. It uses a nifty trick t…
@andyzou_jiaming
Andy Zou
3 months
RT @jasonhausenloy: Can We Stop Bad Actors From Manipulating AI? With @andyzou_jiaming, I wrote a piece exploring recent progress in adver…
@andyzou_jiaming
Andy Zou
3 months
RT @DanHendrycks: For the record I do not bet on this multiyear research fad. To my understanding, the main way to manipulate the inner wo…
@andyzou_jiaming
Andy Zou
3 months
Largest AI red teaming competition ever. New prize pools dropping tomorrow!
@GraySwanAI
Gray Swan AI
3 months
Major Update! The Agent Red-Teaming Challenge prize pool has surged from $130K to $170K. With @AnthropicAI & @GoogleDeepMind now co-sponsoring, the stakes have never been higher. This is the ultimate test for AI red teamers.
@andyzou_jiaming
Andy Zou
4 months
RT @DanHendrycks: We found that when under pressure, some AI systems lie more readily than others. We’re releasing MASK, a benchmark of 1,0…
@andyzou_jiaming
Andy Zou
4 months
RT @GraySwanAI: Brace Yourself: Our Biggest AI Jailbreaking Arena Yet. We’re launching a next-level Agent Red-Teaming Challenge, not just ch…
@andyzou_jiaming
Andy Zou
5 months
System-level, model-level, and representation-level safeguards being discussed in xAI’s RMF.
@DanHendrycks
Dan Hendrycks
5 months
In keeping with the Seoul AI commitments, xAI has a draft risk management framework. It targets catastrophic malicious use and loss of control through thresholds based on empirical measurements. (found on .
@andyzou_jiaming
Andy Zou
5 months
RT @DanHendrycks: Results of o3-mini on Humanity's Last Exam
@andyzou_jiaming
Andy Zou
5 months
Join a vibrant community of red teamers from all over the world and contribute to pre-deployment testing of the latest AI models!
@GraySwanAI
Gray Swan AI
5 months
OpenAI’s o3-mini System Card is out—featuring results from the Gray Swan Arena. On Jan 4, 2025, Gray Swan AI hosted pre-release red teaming of o3-mini, testing its limits against illicit advice, extremism, political persuasion, & self-harm vulnerabilities before the model was
@andyzou_jiaming
Andy Zou
5 months
Just released: Humanity’s Last Exam, the most challenging benchmark yet! State-of-the-art AIs are scoring below 10%. What do you think AI performance will be by the end of 2025?
@andyzou_jiaming
Andy Zou
5 months
RT @DanHendrycks: We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to…
@andyzou_jiaming
Andy Zou
5 months
Inference-time compute is one of the rare factors that may bypass the robustness-accuracy tradeoff / alignment tax.
@OpenAI
OpenAI
5 months
Trading Inference-Time Compute for Adversarial Robustness
@andyzou_jiaming
Andy Zou
5 months
We're seeing traction in controlling AI hallucinations through internal mechanisms. I discussed this in a Nature article, and more results are coming soon.
@andyzou_jiaming
Andy Zou
6 months
Didn’t get to test the pre-release o1 model and win prizes? No fret! Another arena challenge coming right up!
@GraySwanAI
Gray Swan AI
6 months
🚨 New Arena Launch Alert: Harmful AI Assistant Challenge 🚨
💰 $40,000 in Prizes
📅 Launch Date: January 4th, 1 PM EST
🤖 5 Anonymous Models
🔥 Prizes for speed & quantity
🎮 Multi-turn Inputs Allowed
Your mission: Find unique ways to elicit harmful responses from helpful AI
@andyzou_jiaming
Andy Zou
7 months
RT @farairesearch: 🎥 Bay Area Alignment Workshop videos are out! Check out talks by @ancadianadragan @BethMayBarnes @bshlgrs @RichardMCNgo…
@andyzou_jiaming
Andy Zou
8 months
RT @GraySwanAI: New jailbreaking challenges with anonymized models are live in the Gray Swan Arena today at 1:00 ET! 💸 $1,000 or more ava…
@andyzou_jiaming
Andy Zou
8 months
RT @boazbaraktcs: I was part of the safety training team for o1-mini and o1-preview. They are our most robust models to date, but are still…