Geoffrey Irving
@geoffreyirving
Followers: 10K · Following: 13K · Media: 186 · Statuses: 5K
Chief Scientist at the UK AI Security Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc.
London
Joined September 2009
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists! Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
9
28
165
There is a real chance that my most important positive contribution to the world will have been to say something wrong on the internet.
3
0
32
It is important that there is a propensity team within government! AI labs and third-party orgs conduct propensity research too, and AISI builds on this. But a key AISI goal is informing government, and research close to that demand signal can target specific uncertainties.
0
0
9
Propensity is an area of both (1) clear importance to risk modelling and (2) unsettled science! E.g., adversarial ML techniques to elicit rare model behaviour apply but must be treated with care to avoid false positives (unnatural or "trick" scenarios). https://t.co/xgcz9LLVSu
job-boards.eu.greenhouse.io
London, UK
1
2
9
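As a purely illustrative sketch (not AISI code, all names hypothetical), propensity work is roughly about estimating how often a model attempts a risky behaviour across sampled scenarios, while filtering out unnatural "trick" prompts so that contrived elicitations do not count as false positives:

```python
# Purely illustrative: a toy propensity estimate with a naturalness filter.
# `Trial` and `is_realistic_scenario` are hypothetical stand-ins, not AISI code.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Trial:
    scenario: str                 # the situation the model was placed in
    attempted_risky_action: bool  # did it try the dangerous behaviour?

def estimate_propensity(
    trials: list[Trial],
    is_realistic_scenario: Callable[[str], bool],
) -> float:
    """Fraction of *realistic* scenarios in which the model attempts the behaviour.

    Dropping unnatural or adversarially contrived scenarios guards against
    false positives that say little about behaviour in real deployment.
    """
    realistic = [t for t in trials if is_realistic_scenario(t.scenario)]
    if not realistic:
        return 0.0
    return sum(t.attempted_risky_action for t in realistic) / len(realistic)
```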
The @AISecurityInst Cyber Autonomous Systems Team is hiring propensity researchers to grow the science around whether models *are likely* to attempt dangerous behaviour, as opposed to whether they are capable of doing so. Application link below! 🧵
1
11
47
The UK’s @AISecurityInst is one of the rare successes of government. I wrote a piece for @Samfr's Substack on why, and on what lessons it offers for how to make government work. There are five key lessons. A thread ⬇️
6
32
115
It’s been great talking and then working together with Adam over the last couple of months!
0
0
9
Work by Max Heitmann, Ture Hinrichsen, David Africa, and @JonasSandbrink (I advised). Links: 1. Blog post: https://t.co/kmjxYr73OM 2. Full paper:
aisi.gov.uk
0
0
4
New AISI report mapping cruxes behind whether AI progress might be fast or slow on the path to systems near or beyond human level at most cognitive tasks. The goal is not to resolve these uncertainties but to reflect them: we don't know how AI will go, and we should plan accordingly!
Several AI developers aim to build systems that match or surpass humans across most cognitive tasks. Today’s AI still falls short. Our new report maps progress and highlights the key barriers that remain🧵
1
2
22
At UK AISI we've been building ControlArena, a Python library for AI control research. It is built on top of Inspect and designed to make it easy to run control experiments. If you are researching AI control, check it out! https://t.co/Ct865YhKj7
🔒How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda seeking to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments🧵
0
1
12
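To give a feel for the shape of these experiments, here is a schematic sketch of the kind of configuration ControlArena standardises. This is illustrative only and uses invented names rather than the library's actual API; see the docs at the link above for the real interface.

```python
# Schematic of the experiment shape ControlArena standardises (illustrative only,
# not the library's API): an experiment pairs an environment ("setting") with a
# control protocol, and runs in honest mode (benign task only) or attack mode
# (the agent also pursues a malicious side task).
from dataclasses import dataclass
from typing import Literal

@dataclass(frozen=True)
class ControlExperiment:
    setting: str                        # e.g. a bash or SWE environment
    protocol: str                       # e.g. trusted monitoring, defer-to-trusted
    mode: Literal["honest", "attack"]

def safety_and_usefulness(results_attack, results_honest) -> tuple[float, float]:
    """Control experiments report two headline numbers: how often attacks are
    caught or blocked (safety) and how often benign tasks still succeed (usefulness)."""
    safety = sum(r["attack_blocked"] for r in results_attack) / len(results_attack)
    usefulness = sum(r["main_task_done"] for r in results_honest) / len(results_honest)
    return safety, usefulness

# Swapping the environment or the protocol is a one-line change to the config:
baseline = ControlExperiment(setting="bash", protocol="trusted_monitoring", mode="attack")
variant = ControlExperiment(setting="swe", protocol="defer_to_trusted", mode="attack")
```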
AI control experiments were difficult to reproduce, so we made a library where swapping out environments or control protocols is one LoC, plus put a bunch of effort into designing realistic SWE environments with appropriate benign tasks + associated malicious side tasks. Now
0
2
17
Shut down all corrupt pyramid schemes, not the government.
1
4
52
I am excited about ControlArena! It includes environments where red-team agents pursue side-objectives close to what misaligned AIs might attempt in order to cause catastrophes. Tools like this are essential for evaluating how strong black-box mitigations against misalignment are.
1
2
13
Open source tooling for safety research is cool! The team put tons of work into documentation (https://t.co/AnfSLdD8Zw), tutorials (https://t.co/GNgwrruvW1), and iteration with researchers to simplify onboarding. We are excited for more people to play with it. :)
0
0
3
On the research side, 1. ControlArena makes it easier to port from simplified settings to more realistic settings, so ML folk can engage with security details. 2. By standardising the ML, we hope more infosec folk can work on control, and bring a ton of adversarial mindset!
1
0
2
There are also key overlaps between AI control and traditional infosec, and we hope this library strengthens these links. ControlArena builds on Inspect, which means features such as Inspect Sandboxes make experiments themselves more secure. https://t.co/IAGe0mO4nv
aisi.gov.uk
A comprehensive toolkit for safely evaluating AI agents.
1
0
3
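As a minimal sketch of what building on Inspect buys you, here is a task that opts into a Docker sandbox so that any shell commands an agent runs are isolated from the host. This is based on Inspect's public API as I understand it; treat the details as indicative and check the Inspect docs.

```python
# Minimal sketch: an Inspect task whose tool calls run inside a Docker sandbox,
# so agent-executed shell commands are isolated from the host machine.
# Based on Inspect's public API as I understand it; details may differ.
from inspect_ai import Task, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate, use_tools
from inspect_ai.tool import bash

@task
def sandboxed_demo():
    return Task(
        dataset=[Sample(input="List the files in the current directory.", target="README")],
        solver=[use_tools(bash()), generate()],
        scorer=includes(),
        sandbox="docker",  # commands from the bash tool execute in the container
    )
```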
We believe this is a tractable area for progress, but there are a ton of details to get right. Lowering barriers to entry for control experiments means more research will happen, and more standardisation means different lines of research can be shared and combined.
1
0
2
Unlike alignment, which tries to prevent models from having ill intent, AI control treats AI models as potentially scheming to defeat safeguards, and uses monitoring, editing, and other interventions to reduce risk even in this case. https://t.co/G9UCQEaLbS
control-arena.aisi.org.uk
Open-source framework for AI Control evaluations
1
0
3
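A toy illustration of that stance (invented names, not ControlArena's API): assume the untrusted model may be scheming, have a trusted monitor score each proposed action, and audit or edit suspicious actions instead of trusting the model's intent.

```python
# Toy illustration of a trusted-monitoring control protocol.
# Everything here is hypothetical pseudocode, not ControlArena's API.
from typing import Callable

def run_with_monitoring(
    untrusted_propose: Callable[[str], str],    # untrusted model proposes an action
    monitor_suspicion: Callable[[str], float],  # trusted monitor scores it in [0, 1]
    trusted_rewrite: Callable[[str], str],      # trusted (weaker) model rewrites it
    task: str,
    audit_threshold: float = 0.9,
    rewrite_threshold: float = 0.5,
) -> str:
    action = untrusted_propose(task)
    suspicion = monitor_suspicion(action)
    if suspicion >= audit_threshold:
        raise RuntimeError("Action flagged for human audit before execution")
    if suspicion >= rewrite_threshold:
        # Edit rather than block: keep usefulness while reducing risk.
        action = trusted_rewrite(action)
    return action
```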
And so, lots of contributors! Ollie Matthews, Rogan Inglis, Tyler Tracey, Oliver Makins, Tom Catling, Asa Cooper Stickland, Rasmus Faber-Espensen, Daniel O’Connell, Myles Heller, Miguel Brandao, Adam Hanson, Arathi Mani, Tomek Korbak, Jan Michelfeit, Dishank Bansal, Tomas Bark,
1
0
2
ControlArena is a collaborative effort between UK AISI, @redwood_ai, and various other orgs! Today is the official launch, but the repo has been public for a while to enable fast contributions. https://t.co/3LY0a6RESk
aisi.gov.uk
Our dedicated library to make AI control experiments easy, consistent, and repeatable.
1
0
2
New open source library from @AISecurityInst! ControlArena lowers the barrier to secure and reproducible AI control research, to boost work on blocking and detecting malicious actions in case AI models are misaligned. In use by researchers at GDM, Anthropic, Redwood, and MATS! 🧵
3
15
77