Rogan Inglis Profile
Rogan Inglis

@RoganInglis

Followers: 77
Following: 271
Media: 1
Statuses: 49

Senior Research Engineer, Control at AI Security Institute

London, UK
Joined May 2012
@FabienDRoger
Fabien Roger
20 days
I am excited about ControlArena! It includes environments where red-team agents pursue side-objectives close to what misaligned AIs might attempt in order to cause catastrophes. Tools like this are essential for evaluating how strong black-box mitigations against misalignment are.
@geoffreyirving
Geoffrey Irving
20 days
New open source library from @AISecurityInst! ControlArena lowers the barrier to secure and reproducible AI control research, to boost work on blocking and detecting malicious actions in case AI models are misaligned. In use by researchers at GDM, Anthropic, Redwood, and MATS! 🧵
1
2
13
@tylertracy321
Tyler Tracy
20 days
I use ControlArena all the time. I think it is a great way to get into control research!
@AISecurityInst
AI Security Institute
20 days
🔒How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda seeking to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments🧵
0
1
8
@geoffreyirving
Geoffrey Irving
20 days
New open source library from @AISecurityInst! ControlArena lowers the barrier to secure and reproducible AI control research, to boost work on blocking and detecting malicious actions in case AI models are misaligned. In use by researchers at GDM, Anthropic, Redwood, and MATS! 🧵
3
15
77
@AsaCoopStick
Asa Cooper Stickland
20 days
AI control experiments were difficult to reproduce, so we made a library where swapping out environments or control protocols is one LoC, plus we put a bunch of effort into designing realistic SWE environments with appropriate benign tasks + associated malicious side tasks (see the sketch after this entry). Now
@AISecurityInst
AI Security Institute
20 days
🔒How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda seeking to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments🧵
0
2
17
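As a rough illustration of the one-line swap described in the tweet above: the snippet below is a minimal sketch, assuming hypothetical ControlArena names (get_control_task, AppsSetting, BashSetting, trusted_monitoring, defer_to_trusted) that may not match the library's actual API; only Inspect's eval() entry point is taken as given.

# Minimal sketch of swapping environments / control protocols in one line.
# The ControlArena identifiers below (get_control_task, AppsSetting,
# BashSetting, trusted_monitoring, defer_to_trusted) are illustrative
# placeholders, not confirmed API; inspect_ai.eval is the real Inspect
# entry point the library builds on.
from inspect_ai import eval

from control_arena import get_control_task                                # assumed name
from control_arena.settings import AppsSetting, BashSetting               # assumed names
from control_arena.protocols import trusted_monitoring, defer_to_trusted  # assumed names

# A control experiment pairs a benign main task with a malicious side task,
# run under a control protocol that tries to block or detect the side task.
task = get_control_task(
    setting=AppsSetting(),          # swap to BashSetting() to change the environment
    protocol=trusted_monitoring(),  # swap to defer_to_trusted() to change the protocol
)

eval(task, model="openai/gpt-4o")

Under these assumptions, changing either the setting or the protocol is a single-argument edit, which is the reproducibility point the thread is making.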
@RoganInglis
Rogan Inglis
20 days
At UK AISI we've been building ControlArena, a Python library for AI Control research. It is built on top of Inspect and designed to make it easy to run Control experiments. If you are researching AI Control, check it out! https://t.co/Ct865YhKj7
@AISecurityInst
AI Security Institute
20 days
🔒How can we prevent harm from AI systems that pursue unintended goals? AI control is a promising research agenda seeking to address this critical question. Today, we’re excited to launch ControlArena – our library for running secure and reproducible AI control experiments🧵
0
1
12
@MiTerekhov
Mikhail Terekhov
2 months
The main public benchmark for AI Control is the Control Arena by @AISecurityInst. It now supports our code backdoor dataset from the Control Tax paper. The dataset contains ~300k lines of tested LLM-generated code; you should try it out!
1
5
19
@balesni
Mikita Balesni 🇺🇦
4 months
A simple AGI safety technique: the AI's thoughts are in plain English, so just read them. We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc. threaten transparency. Experts from many orgs agree we should try to preserve it:
38
114
451
@geoffreyirving
Geoffrey Irving
5 months
New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.
8
56
343
@RoganInglis
Rogan Inglis
7 months
Cool paper from my (old) team at AISI showing the capability level of current frontier models at autonomous replication!
@AISecurityInst
AI Security Institute
7 months
🚨 New AISI research 🚨 RepliBench is a novel benchmark that measures the ability of frontier AI systems to autonomously replicate. Read the full blog here: https://t.co/ewFJTytscV
0
0
1
@tomekkorbak
Tomek Korbak
7 months
LLM agents might cause serious harm if they start pursuing misaligned goals. In our new paper, we show how to use capability evals to help determine which control measures (e.g. monitoring) are sufficient to ensure that an agent can be deployed safely.
5
28
122
@DKokotajlo
Daniel Kokotajlo
7 months
"How, exactly, could AI take over by 2027?" Introducing AI 2027: a deeply-researched scenario forecast I wrote alongside @slatestarcodex, @eli_lifland, and @thlarsen
410
1K
5K
@rohinmshah
Rohin Shah
7 months
Just released GDM’s 100+ page approach to AGI safety & security! (Don’t worry, there’s a 10 page summary.) AGI will be transformative. It enables massive benefits, but could also pose risks. Responsible development means proactively preparing for severe harms before they arise.
@GoogleDeepMind
Google DeepMind
7 months
AGI could revolutionize many fields - from healthcare to education - but it's crucial that it’s developed responsibly. Today, we’re sharing how we’re thinking about safety and security on the path to AGI. → https://t.co/tS21BCo8Er
14
71
362
@RoganInglis
Rogan Inglis
8 months
The solutions group at AISI is doing important work and we may not have a lot of time to do it given the rate of AI progress. Working here is great fun! This blog post gives a good overview of our work. If this interests you I highly recommend applying!
@AISecurityInst
AI Security Institute
8 months
AI systems are advancing fast - but safety measures aren’t keeping up. We're actively working to close this gap with technical research to improve AI mitigations and solutions. Our latest blog outlines our approach⬇️
0
0
1
@alxndrdavies
Xander Davies
8 months
My team is hiring @AISecurityInst! I think this is one of the most important times in history to have strong technical expertise in government. Join our team understanding and fixing weaknesses in frontier models through SOTA adversarial ML research & testing. 🧵 1/4
4
37
172
@AISecurityInst
AI Security Institute
8 months
🚨 Introducing the AISI Challenge Fund: £5 million to advance AI security & safety research. Grants of up to £200,000 are available for innovative AI research on technical mitigations, improved evaluations, and stronger safeguards. 🛡️🤖
7
25
98
@bshlgrs
Buck Shlegeris
8 months
📷 Announcing ControlConf: The world’s first conference dedicated to AI control - techniques to mitigate security risks from AI systems even if they’re trying to subvert those controls. March 27-28, 2025 in London. 🧵
7
38
268
@geoffreyirving
Geoffrey Irving
9 months
We're starting two new mitigation teams at AISI, Alignment and Control, which together with Safeguards will form a solutions unit working on direct research, collaboration, and external funding for frontier AI mitigations. Here is a thread on why you should join! 🧵
9
43
216
@j_asminewang
Jasmine
9 months
I’m leading a new team at AISI focused on control empirics. We’re hiring research engineers and research scientists, and you should join us!
8
33
247
@AISecurityInst
AI Security Institute
9 months
Misuse safeguards play an important role in making AI safe - but evaluating how well they work is still an emerging field. Our latest work offers recommendations to accelerate progress and introduces a lightweight template for more effective evaluations. https://t.co/PbP7At0ZtC
0
9
33
@RyanPGreenblatt
Ryan Greenblatt
9 months
Our recent paper found Claude sometimes "fakes alignment"—pretending to comply with training while secretly maintaining its preferences. Could we detect this by offering Claude something (e.g. real money) if it reveals its true preferences? Here's what we found 🧵
44
138
1K