Xander Davies @alxndrdavies X Profile

Xander Davies

@alxndrdavies

Followers

2K

Following

4K

Media

75

Statuses

463

safeguards lead @AISecurityInst PhD student @OATML_Oxford, prev @Harvard (https://t.co/695XYMJSua)

London

Joined March 2020

Don't wanna be here? Send us removal request.

Jonas Sandbrink

@JonasSandbrink

9 days

Today we release the first output from AISI's Strategic Awareness team 🚀 To track AI progress, we map key limitations of current systems: performance limitations on certain task types, insufficient reliability, insufficient adaptability, and low capacity for original insights 🧵

1

3

14

Ian Hogarth

@soundboy

16 days

Appreciate Ken McCallum, head of MI5's kind words about @AISecurityInst

5

6

49

AI Security Institute

@AISecurityInst

23 days

We’ve conducted the largest investigation of data poisoning to date with @AnthropicAI and @turinginst. Our results show that as little as 250 malicious documents can be used to “poison” a language model, even as model size and training data grow 🧵

2

10

38

Xander Davies

@alxndrdavies

23 days

@AlexandraSouly @AnthropicAI @turinginst Our team is hiring! Much more to do here and in other areas.

Alexandra Souly

@AlexandraSouly

23 days

If you are excited to work on similar projects with a small, cracked team, apply for our open research scientist position on the Safeguards team! https://t.co/xCTx4lOBEZ 11/11

0

4

Xander Davies

@alxndrdavies

23 days

A tremendous amount of careful work went into testing this--particular shoutout to @AlexandraSouly who approached this with her usual clarity & rigour. And fun to collaborate with @AnthropicAI and @turinginst!

1

0

5

Xander Davies

@alxndrdavies

23 days

Very excited this paper is out. We find that the number of samples required for backdoor poisoning during pre-training stays near-constant as you scale up data/model size. Much more research to do to understand & mitigate this risk!

Alexandra Souly

@AlexandraSouly

23 days

New @AISecurityInst research with @AnthropicAI + @turinginst: The number of samples needed to backdoor poison LLMs stays nearly CONSTANT as models scale. With 500 samples, we insert backdoors in LLMs from 600m to 13b params, even as data scaled 20x.🧵/11

1

3

29

Sam Bowman

@sleepinyourhat

1 month

👇👇👇

Robert Kirk

@_robertkirk

1 month

We at @AISecurityInst recently did our first pre-deployment 𝗮𝗹𝗶𝗴𝗻𝗺𝗲𝗻𝘁 evaluation of @AnthropicAI's Claude Sonnet 4.5! This was a first attempt – and we plan to work on this more! – but we still found some interesting results, and some learnings for next time 🧵

0

1

13

Ilia Shumailov🦔

@iliaishacked

1 month

My friends, today I am excited to announce AI Sequrity (@aisequrity). Our mission is to provide developers and enterprises a painless and stress-free deployment of AI that is secure by design. You heard this right. You can deploy your AI agents and get guaranteed security. The

4

20

88

Andrej Karpathy

@karpathy

1 month

@dwarkesh_sp Most people misunderstand books as data for pertaining when it’s more a set of prompts for synthetic data generation.

58

118

2K

Joe Benton

@JoeJBenton

1 month

I'll be mentoring for this - consider applying if you want to work with me on scalable oversight, AI control or CoT monitoring (or with one of many other awesome mentors)!

🚀Henry is launching the Astra Research Program!

@sleight_henry

1 month

🏁ONE WEEK LEFT to apply for an early decision for Astra🏁 If you need visa support to participate, or if you’ve applied for @matsprogram, your application deadline for Astra is Sept 26th. ⬇️We're also excited to announce new mentors across every stream! (1/4)

0

2

17

Dan Lahav

@dan_lahav

2 months

Today I’m launching @Irregular (formerly Pattern Labs) with my friend and co-founder Omer Nevo: Irregular is the first frontier security lab. Our mission: protect the world in the era of increasingly capable and sophisticated AI systems.

48

389

Xander Davies

@alxndrdavies

2 months

More in the OAI blog post: https://t.co/oKFcmQdvqo They are also excellent collaborators:

openai.com

An update on our collaboration with US and UK research and standards bodies for the secure deployment of AI.

Xander Davies

@alxndrdavies

2 months

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6

0

2

Xander Davies

@alxndrdavies

2 months

Ben and team are extremely good—e.g. check out their cool work w OpenAI!

Ben Edelman

@EdelmanBen

2 months

Did not have "red team frontier AI systems as part of federal govt job" on my bingo card during my ML PhD. It's been a blast.

1

16

Xander Davies

@alxndrdavies

2 months

AISI’s blog: https://t.co/pRXyjhddAd AISI’s post:

aisi.gov.uk

Insights into our ongoing voluntary collaborations with Anthropic and OpenAI.

AI Security Institute

@AISecurityInst

2 months

We’ve been working closely with frontier AI developers, alongside the US Center of AI Standards and Innovation (CAISI), to improve the security of powerful AI systems. @AnthropicAI and @OpenAI have now published insights into these ongoing collaborations 🧵

1

0

7

Nate

@NateBurnikell

2 months

~2 years ago @alxndrdavies pitched me, in a room overlooking Big Ben, on why AISI needed a team to independently assess frontier AI safeguards, not just dangerous capability evals. The team he’s built since is world-leading. Great to have these collabs public.

Xander Davies

@alxndrdavies

2 months

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6

0

3

38

Ethan Perez

@EthanJPerez

2 months

This team at UK AISI was incredible at discovering jailbreaking techniques for our prototype defenses. Their attacks helped us to make some critical decisions and achieve the robustness that we did for mitigating CBRN risks on Opus 4. They’re now hiring!

Xander Davies

@alxndrdavies

2 months

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6

2

6

66

Johannes Heidecke

@JoHeidecke

2 months

Our safeguards for bio risk and agentic deployments were stress-tested by the US CAISI and UK AISI & we iterated together towards ever higher robustness and reliability: https://t.co/gT1fM3L264

openai.com

An update on our collaboration with US and UK research and standards bodies for the secure deployment of AI.

Xander Davies

@alxndrdavies

2 months

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6

0

2

8

mrinank ⛰️

@MrinankSharma

2 months

It's been great to collaborate with Xander and other at UK AISI and CAISI to strength our jailbreak defences!

Xander Davies

@alxndrdavies

2 months

Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6

1

2

12

Xander Davies

@alxndrdavies

2 months

If you want to join a six-person red team in the heart of government, we're hiring and so are other teams at AISI! More info: Anthropic blog: https://t.co/voNNtWv2Qk OAI blog: https://t.co/oKFcmQdvqo AISI careers: https://t.co/e51h4ctDnj 6/6

aisi.gov.uk

View career opportunities at AISI. The AI Security Institute is a directorate of the Department of Science, Innovation, and Technology that facilitates rigorous research to enable advanced AI gover...

1

2

41

Xander Davies

@alxndrdavies

2 months

These collaborations strengthen safeguards of widely used systems by drawing from relevant gov expertise, and also keep governments informed. I'm particularly excited that much of this work was done jointly with US CAISI, who have been strong technical collaborators. 5/6

1

23