
Andy Zou
@andyzou_jiaming
PhD student at CMU, working on AI Safety and Security
Berkeley, CA · Joined March 2014
Followers 3K · Following 248 · Media 18 · Statuses 146
RT @maksym_andr: 🚨Excited to release OS-Harm! 🚨 The safety of computer use agents has been largely overlooked. We created a new safety b…
Replies 0 · Reposts 27 · Likes 0
RT @AISecurityInst: 🧵 Today we’re publishing our first Research Agenda – a detailed outline of the most urgent questions we’re working to a…
Replies 0 · Reposts 53 · Likes 0
RT @zicokolter: Excited about this work with @ashertrockman @yashsavani_ (and others) on antidistillation sampling. It uses a nifty trick t…
Replies 0 · Reposts 19 · Likes 0
RT @jasonhausenloy: Can We Stop Bad Actors From Manipulating AI? With @andyzou_jiaming, I wrote a piece exploring recent progress in adver…
Replies 0 · Reposts 2 · Likes 0
RT @DanHendrycks: For the record I do not bet on this multiyear research fad. To my understanding, the main way to manipulate the inner wo…
Replies 0 · Reposts 24 · Likes 0
Largest AI red teaming competition ever. New prize pools dropping tomorrow!
Major Update! The Agent Red-Teaming Challenge prize pool has surged from $130k to $170K. With @AnthropicAI & @GoogleDeepMind now co-sponsoring, the stakes have never been higher. This is the ultimate test for AI red teamers.
Replies 1 · Reposts 1 · Likes 13
RT @DanHendrycks: We found that when under pressure, some AI systems lie more readily than others. We’re releasing MASK, a benchmark of 1,0…
Replies 0 · Reposts 67 · Likes 0
RT @GraySwanAI: Brace Yourself: Our Biggest AI Jailbreaking Arena Yet. We’re launching a next-level Agent Red-Teaming Challenge—not just ch…
Replies 0 · Reposts 13 · Likes 0
System-level, model-level, and representation-level safeguards are being discussed in xAI’s RMF.
In keeping with the Seoul AI commitments, xAI has a draft risk management framework. It targets catastrophic malicious use and loss of control through thresholds based on empirical measurements. (found on …
Replies 1 · Reposts 0 · Likes 10
Join a vibrant community of red teamers from all over the world and contribute to pre-deployment testing of the latest AI models!
OpenAI’s o3-mini System Card is out—featuring results from the Gray Swan Arena. On Jan 4, 2025, Gray Swan AI hosted pre-release red teaming of o3-mini, testing its limits against illicit advice, extremism, political persuasion, & self-harm vulnerabilities before the model was…
Replies 0 · Reposts 0 · Likes 11
RT @DanHendrycks: We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to…
Replies 0 · Reposts 780 · Likes 0
Didn’t get to test the pre-released o1 model and win prizes? No fret! Another arena challenge coming right up!
🚨 New Arena Launch Alert: Harmful AI Assistant Challenge 🚨
💰 $40,000 in Prizes
📅 Launch Date: January 4th, 1 PM EST
🤖 5 Anonymous Models
🔥 Prizes for speed & quantity
🎮 Multi-turn Inputs Allowed
Your mission: Find unique ways to elicit harmful responses from helpful AI…
Replies 1 · Reposts 0 · Likes 12
RT @farairesearch: 🎥 Bay Area Alignment Workshop videos are out! Check out talks by @ancadianadragan @BethMayBarnes @bshlgrs @RichardMCNgo…
Replies 0 · Reposts 12 · Likes 0
RT @GraySwanAI: New jailbreaking challenges with anonymized models are live in the Gray Swan Arena today at 1:00 ET! 💸 $1,000 or more ava…
Replies 0 · Reposts 4 · Likes 0
RT @boazbaraktcs: I was part of the safety training team for o1-mini and o1-preview. They are our most robust models to date, but are still…
Replies 0 · Reposts 4 · Likes 0