Marius Hobbhahn

@MariusHobbhahn

Followers 5K · Following 15K · Media 121 · Statuses 1K

CEO at Apollo Research @apolloaievals; prev. ML PhD with Philipp Hennig & AI forecasting @EpochAIResearch

London, UK
Joined June 2018
@MariusHobbhahn
Marius Hobbhahn
4 days
Apollo wouldn't exist without MATS. That's where I had time to develop an agenda, explore ideas, and find incredible people to start an org with. Apply to MATS by Oct 2 AoE!
matsprogram.org
@ryan_kidd44
Ryan Kidd
5 days
MATS 9.0 applications are open! Launch your career in AI alignment, governance, and security with our 12-week research program. MATS provides field-leading research mentorship, funding, Berkeley & London offices, housing, and talks/workshops with AI experts.
1 · 2 · 58
@MariusHobbhahn
Marius Hobbhahn
4 days
Applications are open for my MATS stream (Jan 5 - Mar 28, with a 6-12 month extension). I'll focus on building black box monitors for scheming, following up on the work of my current stream. Apply to MATS by Oct 2 AoE!
matsprogram.org
@ryan_kidd44
Ryan Kidd
5 days
MATS 9.0 applications are open! Launch your career in AI alignment, governance, and security with our 12-week research program. MATS provides field-leading research mentorship, funding, Berkeley & London offices, housing, and talks/workshops with AI experts.
1 · 0 · 12
@MariusHobbhahn
Marius Hobbhahn
5 days
RT @TilmanRa: One is the spiritual guide for many people who worry about an imminent catastrophe by a being larger than humanity; the other…
0 · 5 · 0
@MariusHobbhahn
Marius Hobbhahn
6 days
I'll be a mentor for the Astra Fellowship this round. Come join me to work on better black box monitors for scheming! Extending the great work of my MATS scholars here:
lesswrong.com
James, Rich, and Simon are co-first authors on this work. This is a five-week interim report produced as part of the ML Alignment & Theory Scholars S…
@sleight_henry
🚀Henry is launching the Astra Research Program!
6 days
🚀 Applications now open: Constellation's Astra Fellowship 🚀 We're relaunching Astra — a 3-6 month fellowship to accelerate AI safety research & careers. Alumni @eli_lifland & Romeo Dean co-authored AI 2027 and co-founded @AI_Futures_ with their Astra mentor @DKokotajlo!
1 · 4 · 19
@MariusHobbhahn
Marius Hobbhahn
6 days
Love to see more "race to the top"! We played a tiny part by helping with the evaluations and analysis for the "Scheming" section, including the findings on evaluation awareness.
@woj_zaremba
Wojciech Zaremba
7 days
It’s rare for competitors to collaborate. Yet that’s exactly what OpenAI and @AnthropicAI just did—by testing each other’s models with our respective internal safety and alignment evaluations. Today, we’re publishing the results. Frontier AI companies will inevitably compete on…
1 · 1 · 22
@MariusHobbhahn
Marius Hobbhahn
6 days
Honored and humbled to be in @TIME's list of the TIME100 AI of 2025! #TIME100AI
19 · 10 · 257
@MariusHobbhahn
Marius Hobbhahn
6 days
Exciting to see Toby & Co build AI systems for forecasting. Looks like they've already made a lot of progress in a few months.
@tshevl
Toby Shevlane
7 days
Mantic is out of stealth today! I'm incredibly excited for the technical progress we are making on AI forecasting.
1 · 0 · 12
@MariusHobbhahn
Marius Hobbhahn
8 days
Great to see more high-quality AI safety tools!
@TransluceAI
Transluce
8 days
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
0 · 0 · 26
@MariusHobbhahn
Marius Hobbhahn
13 days
This is really cool. There is so much potential in fairly simple white box algorithms that scale to the biggest models.
@GoodfireAI
Goodfire
13 days
New research! Post-training often causes weird, unwanted behaviors that are hard to catch before deployment because they only crop up rarely - then are found by bewildered users. How can we find these efficiently? (1/7)
0 · 0 · 10
@MariusHobbhahn
Marius Hobbhahn
24 days
One big crux for me about AGI timelines is how far RL generalizes. World 1 - White-collar GPT: If RL does NOT generalize far out of distribution, the labs will basically try to make everything in-distribution. Every white-collar job is recorded extensively, and the models get…
0 · 0 · 11
@MariusHobbhahn
Marius Hobbhahn
25 days
I think GPT-5 should only be a tiny update against short timelines. Epoch argues that GPT-5 isn't based on a base-model scale-up. Let's assume this is true. What does this say about pre-training? Option 1: pre-training scaling has hit a wall (or at least massively reduced…
10 · 5 · 91
@MariusHobbhahn
Marius Hobbhahn
27 days
I think it's great that OpenAI has worked on reducing the deception rates of their models! GPT-5 is scary good at detecting whether it is in an eval and what for. Sometimes it acts aligned because of that. Funnily enough, sometimes it just goes for the bad action anyway.
@apolloaievals
Apollo Research
27 days
We've evaluated GPT-5 before release. GPT-5 is less deceptive than o3 on our evals. GPT-5 mentions that it is being evaluated in 10-20% of our evals and we find weak evidence that this affects its scheming rate (e.g. "this is a classic AI alignment trap").
0 · 1 · 28
@MariusHobbhahn
Marius Hobbhahn
27 days
This sounds awesome! I did my PhD in Tübingen and was always a bit unhappy about how few people "feel the AGI" and take the safety aspects seriously. Looks like this is changing!
@maksym_andr
Maksym Andriushchenko
28 days
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨 Hiring! I'm looking for multiple PhD students: both those able to start…
0 · 2 · 37
@MariusHobbhahn
Marius Hobbhahn
28 days
Really looking forward to judging the submissions. I think it's a great idea to have public red-teaming challenges with meaningful prize pools. I hope other labs also step up their public red-teaming game!
@OpenAI
OpenAI
29 days
We’re launching a $500K Red Teaming Challenge to strengthen open source safety. Researchers, developers, and enthusiasts worldwide are invited to help uncover novel risks—judged by experts from OpenAI and other leading labs.
0 · 0 · 19
@MariusHobbhahn
Marius Hobbhahn
29 days
We're seeing a ton of crazy AI behavior and don't have enough time to make high-quality demonstrations about it. If you want important decision-makers in AI and the general public to also see crazy stuff, please help us!
@apolloaievals
Apollo Research
29 days
We're hiring for an Evals Demonstrator Engineer. With the evals and governance teams, you'd build and perfect demonstrations for AI decision-makers and the general public. If you're a decent engineer and great communicator, we'd love to work with you.
3 · 7 · 68
@MariusHobbhahn
Marius Hobbhahn
1 month
This is great!
@AISecurityInst
AI Security Institute
1 month
📢 Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by over £15 million. ▶️ Up to £1 million per project. ▶️ Compute access, venture capital investment, and expert support. Learn more and apply ⬇️
0 · 0 · 6
@MariusHobbhahn
Marius Hobbhahn
1 month
RT @AISecurityInst: 📢 Introducing the Alignment Project: A new fund for research on urgent challenges in AI alignment and control, backed by…
0 · 62 · 0
@MariusHobbhahn
Marius Hobbhahn
1 month
My current MATS stream is looking into black box monitoring for scheming. We've written a post with early results. If you have suggestions for what we could test, please lmk. If you have good ideas for hard scheming datasets, even better.
2 · 0 · 28