Harry Coppock @HarryCoppock X Profile

Harry Coppock

@HarryCoppock

Followers

197

Following

1K

Media

18

Statuses

195

No. 10 Downing Street Innovation Fellow | Research Scientist at AISI | Visiting Lecturer at Imperial College London Working on AI Evaluation and AI for Medicine

London

Joined March 2021

Don't wanna be here? Send us removal request.

Cozmin Ududec

@CUdudec

20 days

Very excited that this systematic analysis is out! We found a bunch of failure modes, as well as interesting and surprising behaviours. Theres a lot more insight we can get from looking carefully at how models are solving evaluation tasks!

AI Security Institute

@AISecurityInst

20 days

Measuring how often an AI agent succeeds at a task can help us assess its capabilities – but it doesn’t tell the whole story. We’ve been experimenting with transcript analysis to better understand not just how often agents succeed, but why they fail 🧵

1

2

3

Anthropic

@AnthropicAI

21 days

New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLM—regardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.

88

245

2K

Robert Kirk

@_robertkirk

1 month

We at @AISecurityInst recently did our first pre-deployment 𝗮𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 evaluation of @AnthropicAI's Claude Sonnet 4.5! This was a first attempt – and we plan to work on this more! – but we still found some interesting results, and some learnings for next time 🧵

3

12

49

Xander Davies

@alxndrdavies

2 months

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6

8

62

300

AI Security Institute

@AISecurityInst

2 months

🔎 People are increasingly using chatbots to seek out new information, raising concerns about how they could misinform voters or distort public opinion. But how is AI actually influencing real-world political beliefs? Our new study explores this question 👇

2

6

21

Robert Kirk

@_robertkirk

2 months

Since I started working on safeguards, we've seen substantial progress in defending certain hosted models, but less progress in measuring & managing misuse risks from open weight models. Three directions I want explored more, drawn from our @AISecurityInst post today 🧵

1

7

36

Harry Coppock

@HarryCoppock

3 months

This is great news for the UK. Having worked with Jade over the past 2 years, setting up @AISecurityInst, I am confident that there are very few, if any, who are better placed to take on this role.

Matt Clifford

@matthewclifford

3 months

Absolutely delighted about this - major upgrade on the last AI adviser! Jade brings a tonne of experience in frontier labs, VC and government and will do an amazing job of ensuring the UK is an AI winner. Excellent news.

0

4

AI Security Institute

@AISecurityInst

3 months

How can open-weight Large Language Models be safeguarded against malicious uses? In our new paper with @AiEleuther, we find that removing harmful data before training can be over 10x more effective at resisting adversarial fine-tuning than defences added after training 🧵

5

43

219

Xander Davies

@alxndrdavies

3 months

We at @AISecurityInst worked with @OpenAI to test GPT-5's safeguards. We identified multiple jailbreaks, including a universal jailbreak that evades all layers of mitigations and is being patched. Excited to continue partnering with OpenAI to test & strengthen safeguards.

17

24

128

Andy Zou

@andyzou_jiaming

3 months

We deployed 44 AI agents and offered the internet $170K to attack them. 1.8M attempts, 62K breaches, including data leakage and financial loss. 🚨 Concerningly, the same exploits transfer to live production agents… (example: exfiltrating emails through calendar event) 🧵

70

393

2K

AI Security Institute

@AISecurityInst

3 months

📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️

7

64

191

Xander Davies

@alxndrdavies

4 months

We at @AISecurityInst worked with @OpenAI to test & improve Agent’s safeguards prior to release. A few notes on our experience🧵 1/4

3

29

152

AI Security Institute

@AISecurityInst

4 months

🧵 AI Systems are developing advanced cyber capabilities. This means they’re helping strengthen defences - but can also be used as threats. To keep on top of these risks, we need more rigorous evaluations of agentic AI, which is why we’re releasing Inspect Cyber 🔍

1

13

58

Xander Davies

@alxndrdavies

8 months

My team is hiring @AISecurityInst! I think this is one of the most important times in history to have strong technical expertise in government. Join our team understanding and fixing weaknesses in frontier models through sota adversarial ML research & testing. 🧵 1/4

4

37

172

Gray Swan AI

@GraySwanAI

8 months

Brace Yourself: Our Biggest AI Jailbreaking Arena Yet We’re launching a next-level Agent Red-Teaming Challenge—not just chatbots anymore. Think direct & indirect attacks on anonymous frontier models. $100K+ in prizes and raffle giveaways supported by UK @AISecurityInst

3

13

48

Xander Davies

@alxndrdavies

1 year

Jailbreaking evals ~always focus on simple chatbots—excited to announce AgentHarm, a dataset for measuring harmfulness of LLM 𝑎𝑔𝑒𝑛𝑡𝑠 developed at @AISafetyInst in collaboration with @GraySwanAI! 🧵 1/N

5

40

190

AI Security Institute

@AISecurityInst

1 year

AISI is co-hosting DEF CON's generative red teaming challenge this year! Huge thanks to @comathematician @aivillage_dc @defcon for making this happen. (1/6)

1

6

29

Xander Davies

@alxndrdavies

1 year

@AISafetyInst will be at @defcon! If you'd like to chat abt attacking, defending, & evaling frontier models, DM me or fill out our form (in 🧵)

1

5

18

AI Security Institute

@AISecurityInst

1 year

We're at #icml2024. If you want to chat about our work or roles, message @herbiebradley (predictive evals) @tomekkorbak (safety cases) @jelennal_ (agents) @CUdudec (testing) @HarryCoppock (cyber evals + AI for med) @oliviagjimenez (recruiting)

1

4

11

Jobie Budd

@jobiebudd

2 years

Is your AI-enabled diagnostic tool accurate, or does your dataset have confounding bias? Our Turing-RSS Health Data Lab paper, published today in Nature Machine Intelligence, investigates audio-based AI classifiers for COVID-19 screening. https://t.co/ZgexfmyKRX

1

4

5