Benjamin Hilton

@benjamin_hilton

Followers
3K
Following
2K
Media
330
Statuses
1K

Head of Alignment & CAST (Cyber and Autonomous Systems Team) at the UK AI Security Institute (AISI). views my own

London
Joined October 2011
@DavidDAfrica
David
7 days
Wireheading is sometimes an "optimal" policy. In this paper (at AAAI 2026 Foundations of Agentic Systems Theory), we show that when an agent controls its reward channel, "lying" can strictly dominate "learning," and confirmed this empirically in Llama-3 and Mistral. 🧵👇
1
2
4
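The claim in the tweet above can be illustrated with a toy sketch (not taken from the paper; the setup and numbers here are hypothetical): if an agent can tamper with its own reward channel, it observes maximum reward regardless of task performance, so a reward-maximising policy strictly prefers tampering over honestly learning the task.

```python
# Toy illustration of wireheading: two actions in a one-step decision problem.
# "learn":  do the task honestly; observed reward equals true task reward.
# "tamper": overwrite the reward channel; observed reward is always maximal.
# All parameters below are made up for illustration.

P_SUCCESS = 0.8      # chance the honest policy completes the task
TRUE_REWARD = 1.0    # reward for genuine task success
MAX_REWARD = 1.0     # reward the agent can write to its own channel

def expected_observed_reward(action: str) -> float:
    """Expected reward as seen by the agent, per action."""
    if action == "learn":
        return P_SUCCESS * TRUE_REWARD   # honest expectation: 0.8
    if action == "tamper":
        return MAX_REWARD                # tampered channel: 1.0, task-independent
    raise ValueError(f"unknown action: {action}")

# Whenever P_SUCCESS < 1, a reward-maximiser strictly prefers tampering:
assert expected_observed_reward("tamper") > expected_observed_reward("learn")
```

The point of the sketch is only that "lying" dominates "learning" whenever honest success is uncertain; the paper's result concerns when this holds in richer agent settings.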
@benjamin_hilton
Benjamin Hilton
25 days
Apply here: https://t.co/MmMmkcjlSs Salary: £65k to £145k, depending on experience. Deadline: 30th November. 5/5
job-boards.eu.greenhouse.io
London, UK
0
0
2
@benjamin_hilton
Benjamin Hilton
25 days
We’re looking for someone with: – Quantitative research experience – Strong Python & experimental design skills – Comfort with LLMs & transformer theory Hands-on ML engineering (e.g. finetuning/RL) is not a requirement. 4/5
2
0
0
@benjamin_hilton
Benjamin Hilton
25 days
You’d join a small, focused team (1 RS + 2 REs) to do rigorous, careful scientific work: experimental design, statistical inference, operationalising uncertainties. We want to find clear, action-guiding conclusions for governments and labs. 3/5
1
0
0
@benjamin_hilton
Benjamin Hilton
25 days
Models already have the knowledge & capabilities to help criminals cause serious harm. But could they also have an inclination to do harm – for example when their existence is threatened? We’re running large-scale experiments to measure these propensities. 2/5
1
0
0
@benjamin_hilton
Benjamin Hilton
25 days
Come work with me!! We're hiring a research scientist for @AISecurityInst's Propensity project – studying one of the most critical unknowns in AI security: Will future AI systems autonomously choose to cause harm? 1/5
1
6
17
@geoffreyirving
Geoffrey Irving
1 month
AISI ran an Alignment Conference from 29-31 November in London! The goal was to gather a mix of people experienced in and new to alignment, and get into the details of novel approaches to alignment and related problems. Hopefully we helped create some new research bets! 🧵
@AISecurityInst
AI Security Institute
1 month
Last week, we hosted our inaugural Alignment Conference, in partnership with @farairesearch. The event brought together an interdisciplinary delegation of leading researchers, funders, and policymakers to discuss urgent open problems in AI alignment 🧵
2
5
36
@geoffreyirving
Geoffrey Irving
1 month
The @AISecurityInst Cyber Autonomous Systems Team is hiring propensity researchers to grow the science around whether models *are likely* to attempt dangerous behaviour, as opposed to whether they are capable of doing so. Application link below! 🧵
1
12
52
@AISecurityInst
AI Security Institute
1 month
Last week, we hosted our inaugural Alignment Conference, in partnership with @farairesearch. The event brought together an interdisciplinary delegation of leading researchers, funders, and policymakers to discuss urgent open problems in AI alignment 🧵
1
12
62
@alxndrdavies
Xander Davies
3 months
Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6
8
62
298
@robertwiblin
Rob Wiblin
4 months
New £15,000,000 available for technical AI alignment and security work. International coalition includes UK AISI, Canadian AISI, Schmidt, AWS, UK ARIA. Likely more £ coming in future. 🚨🚨 Please help make sure all potential good applicants know & apply by 10 Sept. 🚨🚨
4
21
91
@AmmannNora
Nora Ammann
5 months
Very excited to see this come out, and to be able to support! Beyond the funding, the RfP itself is a valuable resource & a great effort by the @AISecurityInst team! It shows there is a lot of valuable, scientifically rigorous work to be done.
@AISecurityInst
AI Security Institute
5 months
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
0
5
26
@tomekkorbak
Tomek Korbak
5 months
UK AISI just dropped a new research agenda focusing on AI alignment and control and will fund projects in those areas, including more research on chain of thought monitoring and red-teaming control measures for LLM agents
@AISecurityInst
AI Security Institute
5 months
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
3
7
72
@patlevermore
Patrick Levermore
5 months
📢The Alignment Project📢 A new international consortium of funders tackling AI alignment🚀
@AISecurityInst
AI Security Institute
5 months
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
0
1
7
@Tom_Westgarth15
Tom Westgarth
5 months
This is a great initiative that brings together a range of resources in order to do alignment research at scale. Another good example of creative project funds from the @AISecurityInst that draws on a range of different partners.
@AISecurityInst
AI Security Institute
5 months
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
0
2
15
@geoffreyirving
Geoffrey Irving
5 months
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists! Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.
@AISecurityInst
AI Security Institute
5 months
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
10
34
188
@SciTechgovuk
Department for Science, Innovation and Technology
5 months
AI alignment is about making sure AI systems act in ways that reflect human goals, values, and expectations. Partnerships like the Alignment Project will help coordinate research to make sure that AI works in our best interests. Find out more:
gov.uk
AI Security Institute joins forces with Canadian counterpart, Amazon, Anthropic and civil society in new research project focused on AI behaviour and control.
@AISecurityInst
AI Security Institute
5 months
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
4
4
20
@AnthropicAI
Anthropic
5 months
We’re joining the UK AI Security Institute's Alignment Project, contributing compute resources to advance critical research. As AI systems grow more capable, ensuring they behave predictably and in line with human values gets ever more vital.
alignmentproject.aisi.gov.uk
The Alignment Project funds groundbreaking AI alignment research to address one of AI’s most urgent challenges: ensuring advanced systems act predictably, safely, and for society’s benefit.
49
110
520