Rogan Inglis @RoganInglis X Profile

Rogan Inglis

@RoganInglis

Followers

77

Following

271

Media

1

Statuses

49

Senior Research Engineer, Control at AI Security Institute

London, UK

Joined May 2012

Don't wanna be here? Send us removal request.

Fabien Roger

@FabienDRoger

20 days

I am excited about ControlArena! It includes environments where red-team agents pursue side-objectives that are close to what misaligned AIs could try to cause catastrophes. Tools like this are essential to evaluate how strong black-box mitigations against misalignment are.

Geoffrey Irving

@geoffreyirving

20 days

New open source library from @AISecurityInst! ControlArena lowers the barrier to secure and reproducible AI control research, to boost work on blocking and detecting malicious actions in case AI models are misaligned. In use by researchers at GDM, Anthropic, Redwood, and MATS! 🧵

1

2

13

Tyler Tracy

@tylertracy321

20 days

I use controlarena all the time. I think it is a great way to get into control research!

AI Security Institute

@AISecurityInst

20 days

🔒How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda seeking to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments🧵

0

1

8

Geoffrey Irving

@geoffreyirving

20 days

New open source library from @AISecurityInst! ControlArena lowers the barrier to secure and reproducible AI control research, to boost work on blocking and detecting malicious actions in case AI models are misaligned. In use by researchers at GDM, Anthropic, Redwood, and MATS! 🧵

3

15

77

Asa Cooper Stickland

@AsaCoopStick

20 days

AI control experiments were difficult to reproduce, so we made a library where swapping out environments or control protocols is one LoC, plus put a bunch of effort into designing realistic SWE environments with appropriate benign tasks + associated malicious side tasks. Now

AI Security Institute

@AISecurityInst

20 days

🔒How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda seeking to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments🧵

0

2

17

Rogan Inglis

@RoganInglis

20 days

At UK AISI we've been building ControlArena, a Python library AI Control research. It is built on top of Inspect and designed to make it easy to run Control experiments. If you are researching AI Control check it out! https://t.co/Ct865YhKj7

AI Security Institute

@AISecurityInst

20 days

🔒How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda seeking to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments🧵

0

1

12

Mikhail Terekhov

@MiTerekhov

2 months

The main public benchmark for AI Control is the Control Arena by @AISecurityInst. It now supports our code backdoor dataset from the Control Tax paper. The dataset contains ~300k lines of tested LLM-generated code, you should try it out!

1

5

19

Mikita Balesni 🇺🇦

@balesni

4 months

A simple AGI safety technique: AI’s thoughts are in plain English, just read them We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc threaten transparency Experts from many orgs agree we should try to preserve it:

38

114

451

Geoffrey Irving

@geoffreyirving

5 months

New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.

8

56

343

Rogan Inglis

@RoganInglis

7 months

Cool paper from my (old) team at AISI showing the capability level of current frontier models at autonomous replication!

AI Security Institute

@AISecurityInst

7 months

🚨 New AISI research 🚨 RepliBench is a novel benchmark that measures the ability of frontier AI systems to autonomously replicate. Read the full blog here: https://t.co/ewFJTytscV

0

1

Tomek Korbak

@tomekkorbak

7 months

LLM agents might cause serious harm if they start pursuing misaligned goals. In our new paper, we show how to use capability evals in helping determine which control measures (e.g. monitoring) are sufficient to ensure that an agent can be deployed safely.

5

28

122

Daniel Kokotajlo

@DKokotajlo

7 months

"How, exactly, could AI take over by 2027?" Introducing AI 2027: a deeply-researched scenario forecast I wrote alongside @slatestarcodex, @eli_lifland, and @thlarsen

410

1K

5K

Rohin Shah

@rohinmshah

7 months

Just released GDM’s 100+ page approach to AGI safety & security! (Don’t worry, there’s a 10 page summary.) AGI will be transformative. It enables massive benefits, but could also pose risks. Responsible development means proactively preparing for severe harms before they arise.

Google DeepMind

@GoogleDeepMind

7 months

AGI could revolutionize many fields - from healthcare to education - but it's crucial that it’s developed responsibly. Today, we’re sharing how we’re thinking about safety and security on the path to AGI. → https://t.co/tS21BCo8Er

14

71

362

Rogan Inglis

@RoganInglis

8 months

The solutions group at AISI is doing important work and we may not have a lot of time to do it given the rate of AI progress. Working here is great fun! This blog post gives a good overview of our work. If this interests you I highly recommend applying!

AI Security Institute

@AISecurityInst

8 months

AI systems are advancing fast - but safety measures aren’t keeping up. We're actively working to close this gap with technical research to improve AI mitigations and solutions. Our latest blog outlines our approach⬇️

0

1

Xander Davies

@alxndrdavies

8 months

My team is hiring @AISecurityInst! I think this is one of the most important times in history to have strong technical expertise in government. Join our team understanding and fixing weaknesses in frontier models through sota adversarial ML research & testing. 🧵 1/4

4

37

172

AI Security Institute

@AISecurityInst

8 months

🚨 Introducing the AISI Challenge Fund: £5 million to advance AI security & safety research. Grants of up to £200,000 are available for innovative AI research on technical mitigations, improved evaluations, and stronger safeguards. 🛡️🤖

7

25

98

Buck Shlegeris

@bshlgrs

8 months

📷 Announcing ControlConf: The world’s first conference dedicated to AI control - techniques to mitigate security risks from AI systems even if they’re trying to subvert those controls. March 27-28, 2025 in London. 🧵

7

38

268

Geoffrey Irving

@geoffreyirving

9 months

We're starting two new mitigation teams at AISI, Alignment and Control, which together with Safeguards will form a solutions unit working on direct research, collaboration, and external funding for frontier AI mitigations. Here is a thread on why you should join! 🧵

9

43

216

Jasmine

@j_asminewang

9 months

I’m leading a new team at AISI focused on control empirics. We’re hiring research engineers and research scientists, and you should join us!

8

33

247

AI Security Institute

@AISecurityInst

9 months

Misuse safeguards play an important role in making AI safe - but evaluating how well they work is still an emerging field. Our latest work offers recommendations to accelerate progress and introduces a lightweight template for more effective evaluations. https://t.co/PbP7At0ZtC

0

9

33

Ryan Greenblatt

@RyanPGreenblatt

9 months

Our recent paper found Claude sometimes "fakes alignment"—pretending to comply with training while secretly maintaining its preferences. Could we detect this by offering Claude something (e.g. real money) if it reveals its true preferences? Here's what we found 🧵

44

138

1K