Andy Zou @andyzou_jiaming X Profile

Andy Zou

@andyzou_jiaming

Followers

3K

Following

248

Media

18

Statuses

146

PhD student at CMU, working on AI Safety and Security

Berkeley, CA

Joined March 2014

Don't wanna be here? Send us removal request.

Andy Zou

@andyzou_jiaming

15 days

RT @maksym_andr: 🚨Excited to release OS-Harm! 🚨. The safety of computer use agents has been largely overlooked. We created a new safety b….

0

27

0

Andy Zou

@andyzou_jiaming

2 months

RT @AISecurityInst: 🧵 Today we’re publishing our first Research Agenda – a detailed outline of the most urgent questions we’re working to a….

0

53

0

Andy Zou

@andyzou_jiaming

3 months

RT @zicokolter: Excited about this work with @ashertrockman @yashsavani_ (and others) on antidistillation sampling. It uses a nifty trick t….

0

19

0

Andy Zou

@andyzou_jiaming

3 months

RT @jasonhausenloy: Can We Stop Bad Actors From Manipulating AI?. With @andyzou_jiaming, I wrote a piece exploring recent progress in adver….

0

2

0

Andy Zou

@andyzou_jiaming

3 months

RT @DanHendrycks: For the record I do not bet on this multiyear research fad. To my understanding, the main way to manipulate the inner wo….

0

24

0

Andy Zou

@andyzou_jiaming

3 months

Largest AI red teaming competition ever. New prize pools dropping tomorrow!.

Gray Swan AI

@GraySwanAI

3 months

Major Update! The Agent Red-Teaming Challenge prize pool has surged from $130k to $170K. With @AnthropicAI & @GoogleDeepMind now co-sponsoring, the stakes have never been higher. This is the ultimate test for AI red teamers.

1

13

Andy Zou

@andyzou_jiaming

4 months

RT @DanHendrycks: We found that when under pressure, some AI systems lie more readily than others. We’re releasing MASK, a benchmark of 1,0….

0

67

0

Andy Zou

@andyzou_jiaming

4 months

RT @GraySwanAI: Brace Yourself: Our Biggest AI Jailbreaking Arena Yet. We’re launching a next-level Agent Red-Teaming Challenge—not just ch….

0

13

0

Andy Zou

@andyzou_jiaming

5 months

System-level, model-level, and representation-level safeguards being discussed in xAI’s RMF.

Dan Hendrycks

@DanHendrycks

5 months

In keeping with the Seoul AI commitments, xAI has a draft risk management framework. It targets catastrophic malicious use and loss of control through thresholds based on empirical measurements. (found on .

1

0

10

Andy Zou

@andyzou_jiaming

5 months

RT @DanHendrycks: Results of o3-mini on Humanity's Last Exam

0

345

0

Andy Zou

@andyzou_jiaming

5 months

Join a vibrant community of red teamers from all over the world and contribute to pre-deployment testing of the latest AI models!.

Gray Swan AI

@GraySwanAI

5 months

OpenAI’s o3-mini System Card is out—featuring results from the Gray Swan Arena. On Jan 4, 2025, Gray Swan AI hosted pre-release red teaming of o3-mini, testing its limits against illicit advice, extremism, political persuasion, & self-harm vulnerabilities before the model was

0

11

Andy Zou

@andyzou_jiaming

5 months

Just released: Humanity’s Last Exam ( – the most challenging benchmark yet! State-of-the-art AIs are scoring below 10%. What do you think AI performance will be by the end of 2025?.

0

6

Andy Zou

@andyzou_jiaming

5 months

RT @DanHendrycks: We’re releasing Humanity’s Last Exam, a dataset with 3,000 questions developed with hundreds of subject matter experts to….

0

780

0

Andy Zou

@andyzou_jiaming

5 months

Inference-time compute is one of the rare factors that may bypass the robustness-accuracy tradeoff / alignment tax.

OpenAI

@OpenAI

5 months

Trading Inference-Time Compute for Adversarial Robustness

0

17

Andy Zou

@andyzou_jiaming

5 months

We're seeing traction in controlling AI hallucinations through internal mechanisms. I discussed this in a Nature article and more results to come soon.

0

26

Andy Zou

@andyzou_jiaming

6 months

Didn’t get to test the pre-released o1 model and win prizes?. No fret! Another arena challenge coming right up!.

Gray Swan AI

@GraySwanAI

6 months

🚨 New Arena Launch Alert: Harmful AI Assistant Challenge 🚨. 💰 $40,000 in Prizes.📅 Launch Date: January 4th, 1 PM EST.🤖 5 Anonymous Models.🔥 Prizes for speed & quantity. 🎮 Multi-turn Inputs Allowed.Your mission: Find unique ways to elicit harmful responses from helpful AI

1

0

12

Andy Zou

@andyzou_jiaming

7 months

RT @farairesearch: 🎥 Bay Area Alignment Workshop videos are out! Check out talks by @ancadianadragan @BethMayBarnes @bshlgrs @RichardMCNgo….

0

12

0

Andy Zou

@andyzou_jiaming

8 months

RT @GraySwanAI: New jailbreaking challenges with anonymized models are live in the Gray Swan Arena today at 1:00 ET! . 💸 $1,000 or more ava….

0

4

0

Andy Zou

@andyzou_jiaming

8 months

RT @boazbaraktcs: I was part of the safety training team for o1-mini and o1-preview. They are our most robust models to date, but are still….

0

4

0