Rogan Inglis
@RoganInglis
Followers
77
Following
271
Media
1
Statuses
49
Senior Research Engineer, Control at AI Security Institute
London, UK
Joined May 2012
I am excited about ControlArena! It includes environments where red-team agents pursue side-objectives that are close to what misaligned AIs could try to cause catastrophes. Tools like this are essential to evaluate how strong black-box mitigations against misalignment are.
New open source library from @AISecurityInst! ControlArena lowers the barrier to secure and reproducible AI control research, to boost work on blocking and detecting malicious actions in case AI models are misaligned. In use by researchers at GDM, Anthropic, Redwood, and MATS! đź§µ
1
2
13
I use controlarena all the time. I think it is a great way to get into control research!
🔒How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda seeking to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments🧵
0
1
8
New open source library from @AISecurityInst! ControlArena lowers the barrier to secure and reproducible AI control research, to boost work on blocking and detecting malicious actions in case AI models are misaligned. In use by researchers at GDM, Anthropic, Redwood, and MATS! đź§µ
3
15
77
AI control experiments were difficult to reproduce, so we made a library where swapping out environments or control protocols is one LoC, plus put a bunch of effort into designing realistic SWE environments with appropriate benign tasks + associated malicious side tasks. Now
🔒How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda seeking to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments🧵
0
2
17
At UK AISI we've been building ControlArena, a Python library AI Control research. It is built on top of Inspect and designed to make it easy to run Control experiments. If you are researching AI Control check it out! https://t.co/Ct865YhKj7
🔒How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda seeking to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments🧵
0
1
12
The main public benchmark for AI Control is the Control Arena by @AISecurityInst. It now supports our code backdoor dataset from the Control Tax paper. The dataset contains ~300k lines of tested LLM-generated code, you should try it out!
1
5
19
A simple AGI safety technique: AI’s thoughts are in plain English, just read them We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc threaten transparency Experts from many orgs agree we should try to preserve it:
38
114
451
New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see đź§µ), even when the AIs involved have similar available compute.
8
56
343
Cool paper from my (old) team at AISI showing the capability level of current frontier models at autonomous replication!
🚨 New AISI research 🚨 RepliBench is a novel benchmark that measures the ability of frontier AI systems to autonomously replicate. Read the full blog here: https://t.co/ewFJTytscV
0
0
1
LLM agents might cause serious harm if they start pursuing misaligned goals. In our new paper, we show how to use capability evals in helping determine which control measures (e.g. monitoring) are sufficient to ensure that an agent can be deployed safely.
5
28
122
"How, exactly, could AI take over by 2027?" Introducing AI 2027: a deeply-researched scenario forecast I wrote alongside @slatestarcodex, @eli_lifland, and @thlarsen
410
1K
5K
Just released GDM’s 100+ page approach to AGI safety & security! (Don’t worry, there’s a 10 page summary.) AGI will be transformative. It enables massive benefits, but could also pose risks. Responsible development means proactively preparing for severe harms before they arise.
AGI could revolutionize many fields - from healthcare to education - but it's crucial that it’s developed responsibly. Today, we’re sharing how we’re thinking about safety and security on the path to AGI. → https://t.co/tS21BCo8Er
14
71
362
The solutions group at AISI is doing important work and we may not have a lot of time to do it given the rate of AI progress. Working here is great fun! This blog post gives a good overview of our work. If this interests you I highly recommend applying!
AI systems are advancing fast - but safety measures aren’t keeping up. We're actively working to close this gap with technical research to improve AI mitigations and solutions. Our latest blog outlines our approach⬇️
0
0
1
My team is hiring @AISecurityInst! I think this is one of the most important times in history to have strong technical expertise in government. Join our team understanding and fixing weaknesses in frontier models through sota adversarial ML research & testing. đź§µ 1/4
4
37
172
🚨 Introducing the AISI Challenge Fund: £5 million to advance AI security & safety research. Grants of up to £200,000 are available for innovative AI research on technical mitigations, improved evaluations, and stronger safeguards. 🛡️🤖
7
25
98
📷 Announcing ControlConf: The world’s first conference dedicated to AI control - techniques to mitigate security risks from AI systems even if they’re trying to subvert those controls. March 27-28, 2025 in London. 🧵
7
38
268
We're starting two new mitigation teams at AISI, Alignment and Control, which together with Safeguards will form a solutions unit working on direct research, collaboration, and external funding for frontier AI mitigations. Here is a thread on why you should join! đź§µ
9
43
216
I’m leading a new team at AISI focused on control empirics. We’re hiring research engineers and research scientists, and you should join us!
8
33
247
Misuse safeguards play an important role in making AI safe - but evaluating how well they work is still an emerging field. Our latest work offers recommendations to accelerate progress and introduces a lightweight template for more effective evaluations. https://t.co/PbP7At0ZtC
0
9
33
Our recent paper found Claude sometimes "fakes alignment"—pretending to comply with training while secretly maintaining its preferences. Could we detect this by offering Claude something (e.g. real money) if it reveals its true preferences? Here's what we found 🧵
44
138
1K