
Marius Hobbhahn
@MariusHobbhahn
Followers: 5K · Following: 15K · Media: 121 · Statuses: 1K
CEO at Apollo Research @apolloaievals; prev. ML PhD with Philipp Hennig & AI forecasting @EpochAIResearch
London, UK
Joined June 2018
Apollo wouldn't exist without MATS. That's where I had time to develop an agenda, explore ideas, and find incredible people to start an org with. Apply to MATS by Oct 2 AoE!
matsprogram.org
MATS 9.0 applications are open! Launch your career in AI alignment, governance, and security with our 12-week research program. MATS provides field-leading research mentorship, funding, Berkeley & London offices, housing, and talks/workshops with AI experts.
Applications are open for my MATS stream (Jan 5 - Mar 28, with a 6-12 month extension). I'll focus on building black box monitors for scheming, following up on the work of my current stream. Apply to MATS by Oct 2 AoE!
matsprogram.org
RT @TilmanRa: One is the spiritual guide for many people who worry about an imminent catastrophe by a being larger than humanity; the other…
I'll be a mentor for the Astra Fellowship this round. Come join me to work on better black box monitors for scheming! Extending the great work of my MATS scholars here:
lesswrong.com
James, Rich, and Simon are co-first authors on this work. This is a five-week interim report produced as part of the ML Alignment & Theory Scholars S…
🚀 Applications now open: Constellation's Astra Fellowship 🚀 We're relaunching Astra — a 3-6 month fellowship to accelerate AI safety research & careers. Alumni @eli_lifland & Romeo Dean co-authored AI 2027 and co-founded @AI_Futures_ with their Astra mentor @DKokotajlo!
Love to see more "race to the top"! We played a tiny part by helping with the evaluations and analysis for the "Scheming" section, including the findings on evaluation awareness.
It’s rare for competitors to collaborate. Yet that’s exactly what OpenAI and @AnthropicAI just did—by testing each other’s models with our respective internal safety and alignment evaluations. Today, we’re publishing the results. Frontier AI companies will inevitably compete on.
Great to see more high-quality AI safety tools!
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like "is my model reward hacking" or "where does it violate instructions". Today, anyone can get started with just a few lines of code!
This is really cool. There is so much potential in fairly simple white box algorithms that scale to the biggest models.
New research! Post-training often causes weird, unwanted behaviors that are hard to catch before deployment because they only crop up rarely - then are found by bewildered users. How can we find these efficiently? (1/7).
I think it's great that OpenAI has worked on reducing the deception rates of their models! GPT-5 is scary good at detecting whether it is in an eval and what for. Sometimes it acts aligned because of that. Funnily enough, sometimes it just goes for the bad action anyway.
We've evaluated GPT-5 before release. GPT-5 is less deceptive than o3 on our evals. GPT-5 mentions that it is being evaluated in 10-20% of our evals and we find weak evidence that this affects its scheming rate (e.g. "this is a classic AI alignment trap").
This sounds awesome! I did my PhD in Tübingen and was always a bit unhappy about how few people "feel the AGI" and take the safety aspects seriously. Looks like this is changing!
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨 Hiring. I'm looking for multiple PhD students: both those able to start
Really looking forward to judging the submissions. I think it's a great idea to have public red-teaming challenges with meaningful prize pools. I hope other labs also step up their public red-teaming game!
We’re launching a $500K Red Teaming Challenge to strengthen open source safety. Researchers, developers, and enthusiasts worldwide are invited to help uncover novel risks—judged by experts from OpenAI and other leading labs.
We're seeing a ton of crazy AI behavior and don't have enough time to make high-quality demonstrations about it. If you want important decision-makers in AI and the general public to also see crazy stuff, please help us!
We're hiring for an Evals Demonstrator Engineer. With the evals and governance teams, you'd build and perfect demonstrations for AI decision-makers and the general public. If you're a decent engineer and great communicator, we'd love to work with you.
RT @AISecurityInst: 📢 Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by…