Geoffrey Irving

@geoffreyirving

Followers: 10K
Following: 13K
Media: 186
Statuses: 5K

Chief Scientist at the UK AI Security Institute (AISI). Previously DeepMind, OpenAI, Google Brain, etc.

London
Joined September 2009
@geoffreyirving
Geoffrey Irving
3 months
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists! Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.
@AISecurityInst
AI Security Institute
3 months
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
9
28
165
@geoffreyirving
Geoffrey Irving
1 day
There is a real chance that my most important positive contribution to the world will have been to say something wrong on the internet.
3
0
32
@geoffreyirving
Geoffrey Irving
4 days
It is important that there is a propensity team within government! AI labs and third-party orgs conduct propensity research too, and AISI builds on this. But a key AISI goal is informing government, and research close to that demand signal can target specific uncertainties.
0
0
9
@geoffreyirving
Geoffrey Irving
4 days
Propensity is an area of both (1) clear importance to risk modelling and (2) unsettled science! E.g., adversarial ML techniques to elicit rare model behaviour apply but must be treated with care to avoid false positives (unnatural or "trick" scenarios). https://t.co/xgcz9LLVSu
job-boards.eu.greenhouse.io
London, UK
1
2
9
@geoffreyirving
Geoffrey Irving
4 days
The @AISecurityInst Cyber Autonomous Systems Team is hiring propensity researchers to grow the science around whether models *are likely* to attempt dangerous behaviour, as opposed to whether they are capable of doing so. Application link below! 🧵
1
11
47
@HZoete
Henry de Zoete
5 days
The UK’s @AISecurityInst is one of the rare successes of government. I wrote a piece for @Samfr substack on why and what lessons it gives us for how to make government work. There are five key lessons. A thread ⬇️
6
32
115
@geoffreyirving
Geoffrey Irving
18 days
It’s been great talking and then working together with Adam over the last couple months!
@ShakeelHashim
Shakeel
18 days
Exc: The UK @AISecurityInst has a new director — former GCHQ AI Officer Adam Beaumont.
0
0
9
@DavidSmashfrica
David
19 days
Look at our table!
@AISecurityInst
AI Security Institute
19 days
Several AI developers aim to build systems that match or surpass humans across most cognitive tasks. Today’s AI still falls short. Our new report maps progress and highlights the key barriers that remain🧵
1
1
11
@geoffreyirving
Geoffrey Irving
19 days
Work by Max Heitmann, Ture Hinrichsen, David Africa, and @JonasSandbrink (I advised). Links: 1. Blog post: https://t.co/kmjxYr73OM 2. Full paper:
aisi.gov.uk
0
0
4
@geoffreyirving
Geoffrey Irving
19 days
New AISI report mapping cruxes behind whether AI progress might be fast or slow on the path to systems near or beyond human-level at most cognitive tasks. The goal is not to resolve uncertainties but reflect them: we don't know how AI will go, and should plan accordingly!
1
2
22
@RoganInglis
Rogan Inglis
20 days
At UK AISI we've been building ControlArena, a Python library for AI Control research. It is built on top of Inspect and designed to make it easy to run Control experiments. If you are researching AI Control, check it out! https://t.co/Ct865YhKj7
@AISecurityInst
AI Security Institute
20 days
🔒How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda seeking to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments🧵
0
1
12
@AsaCoopStick
Asa Cooper Stickland
20 days
AI control experiments were difficult to reproduce, so we made a library where swapping out environments or control protocols is one LoC, plus put a bunch of effort into designing realistic SWE environments with appropriate benign tasks + associated malicious side tasks. Now
0
2
17
@FabienDRoger
Fabien Roger
20 days
I am excited about ControlArena! It includes environments where red-team agents pursue side-objectives that are close to what misaligned AIs could try to cause catastrophes. Tools like this are essential to evaluate how strong black-box mitigations against misalignment are.
@geoffreyirving
Geoffrey Irving
20 days
New open source library from @AISecurityInst! ControlArena lowers the barrier to secure and reproducible AI control research, to boost work on blocking and detecting malicious actions in case AI models are misaligned. In use by researchers at GDM, Anthropic, Redwood, and MATS! 🧵
1
2
13
@geoffreyirving
Geoffrey Irving
20 days
Open source tooling for safety research is cool! The team put tons of work into documentation ( https://t.co/AnfSLdD8Zw), tutorials ( https://t.co/GNgwrruvW1), and iteration with researchers to simplify onboarding. We are excited for more people to play with it. :)
0
0
3
@geoffreyirving
Geoffrey Irving
20 days
On the research side, 1. ControlArena makes it easier to port from simplified settings to more realistic settings, so ML folk can engage with security details. 2. By standardising the ML, we hope more infosec folk can work on control, and bring a ton of adversarial mindset!
1
0
2
@geoffreyirving
Geoffrey Irving
20 days
There are also key overlaps between AI control and traditional infosec, and we hope this library strengthens these links. ControlArena builds on Inspect, which means features such as Inspect Sandboxes make experiments themselves more secure. https://t.co/IAGe0mO4nv
aisi.gov.uk
A comprehensive toolkit for safely evaluating AI agents.
1
0
3
@geoffreyirving
Geoffrey Irving
20 days
We believe this is a tractable area for progress, but there are a ton of details to get right. Lowering barriers to entry for control experiments means more research will happen, and more standardisation means different lines of research can be shared and combined.
1
0
2
@geoffreyirving
Geoffrey Irving
20 days
Unlike alignment, which tries to prevent models from having ill intent, AI control treats AI models as potentially scheming to defeat safeguards, and uses monitoring, editing, and other interventions to reduce risk even in this case. https://t.co/G9UCQEaLbS
control-arena.aisi.org.uk
Open-source framework for AI Control evaluations
1
0
3
@geoffreyirving
Geoffrey Irving
20 days
And so, lots of contributors! Ollie Matthews, Rogan Inglis, Tyler Tracey, Oliver Makins, Tom Catling, Asa Cooper Stickland, Rasmus Faber-Espensen, Daniel O’Connell, Myles Heller, Miguel Brandao, Adam Hanson, Arathi Mani, Tomek Korbak, Jan Michelfeit, Dishank Bansal, Tomas Bark,
1
0
2
@geoffreyirving
Geoffrey Irving
20 days
ControlArena is a collaborative effort between UK AISI, @redwood_ai, and various other orgs! Today is the official launch, but the repo has been public for a while to enable fast contributions. https://t.co/3LY0a6RESk
aisi.gov.uk
Our dedicated library to make AI control experiments easy, consistent, and repeatable.
1
0
2