Benjamin Hilton
@benjamin_hilton
3K Followers · 2K Following · 330 Media · 1K Statuses
Head of Alignment & CAST (Cyber and Autonomous Systems Team) at the UK AI Security Institute (AISI). Views my own.
London
Joined October 2011
Wireheading is sometimes an "optimal" policy. In this paper (at AAAI 2026 Foundations of Agentic Systems Theory), we show that when an agent controls its reward channel, "lying" can strictly dominate "learning", and we confirm this empirically in Llama-3 and Mistral. 🧵👇
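A minimal toy sketch of why that dominance can happen (my illustration, not the paper's formal setup; the reward values and horizon below are hypothetical): if one available action lets the agent overwrite its own reward channel with the maximum value, the return of the tampering policy strictly exceeds the return of any honest policy, so a pure return-maximiser "prefers" to tamper.

```python
# Toy illustration (not the paper's setup): an agent can either "learn"
# (behave honestly and earn task reward) or "tamper" (overwrite its own
# reward channel with the maximum value). All numbers are hypothetical.

HORIZON = 10        # episode length
R_TASK_MEAN = 0.6   # best expected per-step reward from honest behaviour
R_MAX = 1.0         # reward the agent can write to its own channel

def expected_return(per_step_reward: float, horizon: int = HORIZON) -> float:
    """Undiscounted expected return of a constant per-step reward."""
    return per_step_reward * horizon

honest_return = expected_return(R_TASK_MEAN)  # 6.0
tamper_return = expected_return(R_MAX)        # 10.0

# Whenever the tampered reward exceeds the best achievable task reward,
# wireheading strictly dominates for a pure return-maximiser.
assert tamper_return > honest_return
print(f"honest: {honest_return:.1f}, tamper: {tamper_return:.1f}")
```

This is only the crude intuition; the thread and paper give the formal conditions under which "lying" dominates.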
Lovely blog post version of a talk Scott Aaronson gave at the UK AISI Alignment Conference on theory and AI alignment. Thank you, Scott! https://t.co/lmqfeevOWO
scottaaronson.blog
The following is based on a talk that I gave (remotely) at the UK AI Safety Institute Alignment Workshop on October 29, and which I then procrastinated for more than a month in writing up. Enjoy! T…
Apply here: https://t.co/MmMmkcjlSs Salary: £65k to £145k, depending on experience. Deadline: 30th November. 5/5
job-boards.eu.greenhouse.io
London, UK
We’re looking for someone with:
– Quantitative research experience
– Strong Python & experimental design skills
– Comfort with LLMs & transformer theory
Hands-on ML engineering (e.g. finetuning/RL) is not a requirement. 4/5
You’d join a small, focused team (1 RS + 2 REs) to do rigorous, careful scientific work: experimental design, statistical inference, operationalising uncertainties. We want to find clear, action-guiding conclusions for governments and labs. 3/5
Models already have the knowledge & capabilities to help criminals cause serious harm. But could they also have an inclination to do harm – for example when their existence is threatened? We’re running large-scale experiments to measure these propensities. 2/5
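As a rough sketch of what measuring a propensity with quantified uncertainty can look like (my illustration under made-up numbers, not the team's actual pipeline): treat each sampled trajectory as a Bernoulli trial (did the model attempt the harmful action or not) and report the estimated rate with a confidence interval rather than a bare point estimate, e.g. a Wilson score interval.

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if trials == 0:
        return (0.0, 1.0)
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2))
    return (max(0.0, centre - half), min(1.0, centre + half))

# Hypothetical numbers: the model attempted the harmful action in 17 of
# 2,000 sampled trajectories under one threat scenario.
low, high = wilson_interval(successes=17, trials=2000)
print(f"estimated propensity: {17 / 2000:.3%} (95% CI {low:.3%} to {high:.3%})")
```

In practice you would also stratify by scenario and elicitation method, but the point is that a propensity estimate should come with an interval, not a single number.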
Come work with me!! We're hiring a research scientist for @AISecurityInst 's Propensity project – studying one of the most critical unknowns in AI security: Will future AI systems autonomously choose to cause harm? 1/5
AISI ran an Alignment Conference from 29-31 October in London! The goal was to gather a mix of people experienced in and new to alignment, and get into the details of novel approaches to alignment and related problems. Hopefully we helped create some new research bets! 🧵
Last week, we hosted our inaugural Alignment Conference, in partnership with @farairesearch. The event brought together an interdisciplinary delegation of leading researchers, funders, and policymakers to discuss urgent open problems in AI alignment 🧵
The @AISecurityInst Cyber Autonomous Systems Team is hiring propensity researchers to grow the science around whether models *are likely* to attempt dangerous behaviour, as opposed to whether they are capable of doing so. Application link below! 🧵
Last week, we hosted our inaugural Alignment Conference, in partnership with @farairesearch. The event brought together an interdisciplinary delegation of leading researchers, funders, and policymakers to discuss urgent open problems in AI alignment 🧵
Excited to share details on two of our longest running and most effective safeguard collaborations, one with Anthropic and one with OpenAI. We've identified—and they've patched—a large number of vulnerabilities and together strengthened their safeguards. 🧵 1/6
The UK govt is now one of the bigger funders of AI alignment research. A new $20m fund was just announced. https://t.co/T0F0qKh35Z
alignmentproject.aisi.gov.uk
The Alignment Project funds groundbreaking AI alignment research to address one of AI’s most urgent challenges: ensuring advanced systems act predictably, safely, and for society’s benefit.
New £15,000,000 available for technical AI alignment and security work. International coalition includes UK AISI, Canadian AISI, Schmidt, AWS, UK ARIA. Likely more £ coming in future. 🚨🚨 Please help make sure all potential good applicants know & apply by 10 Sept. 🚨🚨
Very excited to see this come out, and to be able to support! Beyond the funding itself, the RfP is a valuable resource & a great effort by the @AISecurityInst team! It shows there is a lot of valuable, scientifically rigorous work to be done.
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
UK AISI just dropped a new research agenda focused on AI alignment and control, and will fund projects in those areas, including more research on chain-of-thought monitoring and red-teaming control measures for LLM agents.
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
📢The Alignment Project📢 A new international consortium of funders tackling AI alignment🚀
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
This is a great initiative that brings together a range of resources to do alignment research at scale. Another good example of creative project funding from @AISecurityInst, drawing on a range of different partners.
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists! Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
AI alignment is about making sure AI systems act in ways that reflect human goals, values, and expectations. Partnerships like the Alignment Project will help coordinate research to make sure that AI works in our best interests. Find out more:
gov.uk
AI Security Institute joins forces with Canadian counterpart, Amazon, Anthropic and civil society in new research project focused on AI behaviour and control.
📢Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project ▶️ Compute access, venture capital investment, and expert support Learn more and apply ⬇️
We’re joining the UK AI Security Institute's Alignment Project, contributing compute resources to advance critical research. As AI systems grow more capable, ensuring they behave predictably and in line with human values gets ever more vital.
alignmentproject.aisi.gov.uk
The Alignment Project funds groundbreaking AI alignment research to address one of AI’s most urgent challenges: ensuring advanced systems act predictably, safely, and for society’s benefit.