Harry Coppock

@HarryCoppock

Followers: 197 | Following: 1K | Media: 18 | Statuses: 195

No. 10 Downing Street Innovation Fellow | Research Scientist at AISI | Visiting Lecturer at Imperial College London | Working on AI Evaluation and AI for Medicine

London
Joined March 2021
@CUdudec
Cozmin Ududec
20 days
Very excited that this systematic analysis is out! We found a bunch of failure modes, as well as interesting and surprising behaviours. There's a lot more insight we can get from looking carefully at how models are solving evaluation tasks!
@AISecurityInst
AI Security Institute
20 days
Measuring how often an AI agent succeeds at a task can help us assess its capabilities – but it doesn’t tell the whole story. We’ve been experimenting with transcript analysis to better understand not just how often agents succeed, but why they fail 🧵
1
2
3
@AnthropicAI
Anthropic
21 days
New research with the UK @AISecurityInst and the @turinginst: We found that just a few malicious documents can produce vulnerabilities in an LLM—regardless of the size of the model or its training data. Data-poisoning attacks might be more practical than previously believed.
88
245
2K
@_robertkirk
Robert Kirk
1 month
We at @AISecurityInst recently did our first pre-deployment 𝗮𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 evaluation of @AnthropicAI's Claude Sonnet 4.5! This was a first attempt – and we plan to work on this more! – but we still found some interesting results, and some learnings for next time 🧵
3
12
49
@alxndrdavies
Xander Davies
2 months
Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6
8
62
300
@AISecurityInst
AI Security Institute
2 months
🔎 People are increasingly using chatbots to seek out new information, raising concerns about how they could misinform voters or distort public opinion. But how is AI actually influencing real-world political beliefs? Our new study explores this question 👇
2
6
21
@_robertkirk
Robert Kirk
2 months
Since I started working on safeguards, we've seen substantial progress in defending certain hosted models, but less progress in measuring & managing misuse risks from open weight models. Three directions I want explored more, drawn from our @AISecurityInst post today 🧵
1
7
36
@HarryCoppock
Harry Coppock
3 months
This is great news for the UK. Having worked with Jade over the past 2 years, setting up @AISecurityInst, I am confident that there are very few, if any, who are better placed to take on this role.
@matthewclifford
Matt Clifford
3 months
Absolutely delighted about this - major upgrade on the last AI adviser! Jade brings a tonne of experience in frontier labs, VC and government and will do an amazing job of ensuring the UK is an AI winner. Excellent news.
0
0
4
@AISecurityInst
AI Security Institute
3 months
How can open-weight Large Language Models be safeguarded against malicious uses? In our new paper with @AiEleuther, we find that removing harmful data before training can be over 10x more effective at resisting adversarial fine-tuning than defences added after training 🧵
5
43
219
@alxndrdavies
Xander Davies
3 months
We at @AISecurityInst worked with @OpenAI to test GPT-5's safeguards. We identified multiple jailbreaks, including a universal jailbreak that evades all layers of mitigations and is being patched. Excited to continue partnering with OpenAI to test & strengthen safeguards.
17
24
128
@andyzou_jiaming
Andy Zou
3 months
We deployed 44 AI agents and offered the internet $170K to attack them. 1.8M attempts, 62K breaches, including data leakage and financial loss. 🚨 Concerningly, the same exploits transfer to live production agents… (example: exfiltrating emails through calendar event) 🧵
70
393
2K
@AISecurityInst
AI Security Institute
3 months
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
7
64
191
@alxndrdavies
Xander Davies
4 months
We at @AISecurityInst worked with @OpenAI to test & improve Agent’s safeguards prior to release. A few notes on our experience🧵 1/4
3
29
152
@AISecurityInst
AI Security Institute
4 months
🧵 AI Systems are developing advanced cyber capabilities. This means they’re helping strengthen defences - but can also be used as threats. To keep on top of these risks, we need more rigorous evaluations of agentic AI, which is why we’re releasing Inspect Cyber 🔍
1
13
58
@alxndrdavies
Xander Davies
8 months
My team is hiring @AISecurityInst! I think this is one of the most important times in history to have strong technical expertise in government. Join our team understanding and fixing weaknesses in frontier models through SOTA adversarial ML research & testing. 🧵 1/4
4
37
172
@GraySwanAI
Gray Swan AI
8 months
Brace Yourself: Our Biggest AI Jailbreaking Arena Yet We’re launching a next-level Agent Red-Teaming Challenge—not just chatbots anymore. Think direct & indirect attacks on anonymous frontier models. $100K+ in prizes and raffle giveaways supported by UK @AISecurityInst
3
13
48
@alxndrdavies
Xander Davies
1 year
Jailbreaking evals ~always focus on simple chatbots—excited to announce AgentHarm, a dataset for measuring harmfulness of LLM 𝑎𝑔𝑒𝑛𝑡𝑠 developed at @AISafetyInst in collaboration with @GraySwanAI! 🧵 1/N
5
40
190
@AISecurityInst
AI Security Institute
1 year
AISI is co-hosting DEF CON's generative red teaming challenge this year! Huge thanks to @comathematician @aivillage_dc @defcon for making this happen. (1/6)
1
6
29
@alxndrdavies
Xander Davies
1 year
@AISafetyInst will be at @defcon! If you'd like to chat abt attacking, defending, & evaling frontier models, DM me or fill out our form (in 🧵)
1
5
18
@AISecurityInst
AI Security Institute
1 year
We're at #icml2024. If you want to chat about our work or roles, message @herbiebradley (predictive evals) @tomekkorbak (safety cases) @jelennal_ (agents) @CUdudec (testing) @HarryCoppock (cyber evals + AI for med) @oliviagjimenez (recruiting)
1
4
11
@jobiebudd
Jobie Budd
2 years
Is your AI-enabled diagnostic tool accurate, or does your dataset have confounding bias? Our Turing-RSS Health Data Lab paper, published today in Nature Machine Intelligence, investigates audio-based AI classifiers for COVID-19 screening. https://t.co/ZgexfmyKRX
1
4
5