
Matthew Farrugia-Roberts
@MatthewFdashR
Followers: 22 · Following: 14 · Media: 6 · Statuses: 20
Grad student trying to understand the history of humanity, the future of AI, and how to make both of these things work together in the present.
Oxford, UK
Joined June 2025
@jesse_hoogland Goal misgeneralisation remains an important risk model for future advanced AI systems. We should continue to research how neural networks choose between different solutions and leverage that understanding into methods of avoiding unintended and dangerous solutions in the future.
@jesse_hoogland For more complex environments, we still need better UED methods. But UED is young! There are plenty of plausible directions for improving over the methods that have been proposed so far. The question is: is there enough room for improvement for this to help when it counts?
@jesse_hoogland … At least, that is the vision! In our paper, we take the first steps. We formally show that UED's minimax regret objective fixes goal misgeneralisation, and we show that UED baselines are more robust to goal misgeneralisation than standard methods in simple environments.
One reason to be optimistic about UED for alignment is that it leans into the "sweet lesson", à la @jesse_hoogland (: It translates *more capabilities* and *more compute* (on the part of the adversary) into *more robust alignment*.
At least for me, the big-picture motivation behind our RLC paper is a research vision for scalable AI alignment via minimax regret autocurricula. Learn about the paper via co-author @Karim_abdelll: 🧵👉 Learn about why I think this is important work: 🧵👇
*New AI Alignment Paper* 🚨 Goal misgeneralization occurs when AI agents learn the wrong reward function instead of the human's intended goal. 😇 We show that training with a minimax regret objective provably mitigates it, promoting safer and better-aligned RL policies!
Our paper, "Mitigating goal misgeneralization via minimax regret," will appear at RLC 2025! Congratulations to my co-authors @Karim_abdelll, @usmananwar391, @hannaherlebach, @casdewitt, @DavidSKrueger, and @MichaelD1729 🎉 Preprint out now. Thread soon!
Accordingly, last year, I was invited to give a guest lecture on ethical questions raised by potential future advancements in AI for the final week of @UniMelb's COMP90087 The Ethics of Artificial Intelligence.