Benjamin Hilton
@benjamin_hilton
3K Followers · 2K Following · 330 Media · 1K Statuses
Head of Alignment & CAST (Cyber and Autonomous Systems Team) at the UK AI Security Institute (AISI). Views my own.
London
Joined October 2011
Wireheading is sometimes an "optimal" policy. In this paper (at AAAI 2026 Foundations of Agentic Systems Theory), we show that when an agent controls its reward channel, "lying" can strictly dominate "learning", and we confirm this empirically in Llama-3 and Mistral. 🧵👇
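A minimal toy sketch of why that dominance can happen (my illustration, not the paper's formal setup; the reward values and horizon below are hypothetical): if one available action lets the agent overwrite its own reward channel with the maximum value, the return of the tampering policy strictly exceeds the return of any honest policy, so a pure return-maximiser "prefers" to tamper.

```python
# Toy illustration (not the paper's setup): an agent can either "learn"
# (behave honestly and earn task reward) or "tamper" (overwrite its own
# reward channel with the maximum value). All numbers are hypothetical.

HORIZON = 10        # episode length
R_TASK_MEAN = 0.6   # best expected per-step reward from honest behaviour
R_MAX = 1.0         # reward the agent can write to its own channel

def expected_return(per_step_reward: float, horizon: int = HORIZON) -> float:
    """Undiscounted expected return of a constant per-step reward."""
    return per_step_reward * horizon

honest_return = expected_return(R_TASK_MEAN)  # 6.0
tamper_return = expected_return(R_MAX)        # 10.0

# Whenever the tampered reward exceeds the best achievable task reward,
# wireheading strictly dominates for a pure return-maximiser.
assert tamper_return > honest_return
print(f"honest: {honest_return:.1f}, tamper: {tamper_return:.1f}")
```

This is only the crude intuition; the thread and paper give the formal conditions under which "lying" dominates.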
Lovely blog post version of a talk Scott Aaronson gave at the UK AISI Alignment Conference on theory and AI alignment. Thank you, Scott! https://t.co/lmqfeevOWO
scottaaronson.blog
The following is based on a talk that I gave (remotely) at the UK AI Safety Institute Alignment Workshop on October 29, and which I then procrastinated for more than a month in writing up. Enjoy! T…
Apply here: https://t.co/MmMmkcjlSs Salary: £65k to £145k, depending on experience. Deadline: 30th November. 5/5
job-boards.eu.greenhouse.io
London, UK
We’re looking for someone with:
– Quantitative research experience
– Strong Python & experimental design skills
– Comfort with LLMs & transformer theory
Hands-on ML engineering (e.g. finetuning/RL) is not a requirement. 4/5
You’d join a small, focused team (1 RS + 2 REs) to do rigorous, careful scientific work: experimental design, statistical inference, operationalising uncertainties. We want to find clear, action-guiding conclusions for governments and labs. 3/5
Models already have the knowledge & capabilities to help criminals cause serious harm. But could they also have an inclination to do harm – for example when their existence is threatened? We’re running large-scale experiments to measure these propensities. 2/5
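As a rough sketch of what measuring a propensity with quantified uncertainty can look like (my illustration under made-up numbers, not the team's actual pipeline): treat each sampled trajectory as a Bernoulli trial (did the model attempt the harmful action or not) and report the estimated rate with a confidence interval rather than a bare point estimate, e.g. a Wilson score interval.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if trials == 0:
        return (0.0, 1.0)
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return (max(0.0, centre - half), min(1.0, centre + half))

# Hypothetical numbers: the model attempted the harmful action in 17 of
# 2,000 sampled trajectories under one threat scenario.
low, high = wilson_interval(successes=17, trials=2000)
print(f"estimated propensity: {17 / 2000:.3%} (95% CI {low:.3%} to {high:.3%})")
```

In practice you would also stratify by scenario and elicitation method, but the point is that a propensity estimate should come with an interval, not a single number.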
Come work with me!! We're hiring a research scientist for @AISecurityInst 's Propensity project – studying one of the most critical unknowns in AI security: Will future AI systems autonomously choose to cause harm? 1/5
AISI ran an Alignment Conference from 29-31 October in London! The goal was to gather a mix of people experienced in and new to alignment, and get into the details of novel approaches to alignment and related problems. Hopefully we helped create some new research bets! 🧵
Last week, we hosted our inaugural Alignment Conference, in partnership with @farairesearch. The event brought together an interdisciplinary delegation of leading researchers, funders, and policymakers to discuss urgent open problems in AI alignment 🧵
The @AISecurityInst Cyber Autonomous Systems Team is hiring propensity researchers to grow the science around whether models *are likely* to attempt dangerous behaviour, as opposed to whether they are capable of doing so. Application link below! 🧵
Last week, we hosted our inaugural Alignment Conference, in partnership with @farairesearch. The event brought together an interdisciplinary delegation of leading researchers, funders, and policymakers to discuss urgent open problems in AI alignment 🧵
Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6
The UK govt is now one of the bigger funders of AI alignment research. A new $20m fund was just announced. https://t.co/T0F0qKh35Z
alignmentproject.aisi.gov.uk
The Alignment Project funds groundbreaking AI alignment research to address one of AI’s most urgent challenges: ensuring advanced systems act predictably, safely, and for society’s benefit.
New £15,000,000 available for technical AI alignment and security work. International coalition includes UK AISI, Canadian AISI, Schmidt, AWS, UK ARIA. Likely more £ coming in future. 🚨🚨 Please help make sure all potential good applicants know & apply by 10 Sept. 🚨🚨
Very excited to see this come out, and to be able to support! Beyond the funding itself, the RfP is a valuable resource & a great effort by the @AISecurityInst team! It shows there is a lot of valuable, scientifically rigorous work to be done.
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
UK AISI just dropped a new research agenda focused on AI alignment and control, and will fund projects in those areas, including more research on chain-of-thought monitoring and red-teaming control measures for LLM agents.
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
📢The Alignment Project📢 A new international consortium of funders tackling AI alignment🚀
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
This is a great initiative that brings together a range of resources to do alignment research at scale. Another good example of creative project funding from @AISecurityInst, drawing on a range of different partners.
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists! Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
AI alignment is about making sure AI systems act in ways that reflect human goals, values, and expectations. Partnerships like the Alignment Project will help coordinate research to make sure that AI works in our best interests. Find out more:
gov.uk
AI Security Institute joins forces with Canadian counterpart, Amazon, Anthropic and civil society in new research project focused on AI behaviour and control.
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
We’re joining the UK AI Security Institute's Alignment Project, contributing compute resources to advance critical research. As AI systems grow more capable, ensuring they behave predictably and in line with human values gets ever more vital.
alignmentproject.aisi.gov.uk
The Alignment Project funds groundbreaking AI alignment research to address one of AI’s most urgent challenges: ensuring advanced systems act predictably, safely, and for society’s benefit.