Marius Hobbhahn

@MariusHobbhahn

6K Followers · 15K Following · 125 Media · 1K Statuses

CEO at Apollo Research @apolloaievals; prev. ML PhD with Philipp Hennig & AI forecasting @EpochAIResearch

London, UK
Joined June 2018
@MariusHobbhahn
Marius Hobbhahn
10 days
I'd love to see more people work on "Institutional design for AIs". We'll have competent agents, teams of AI agents, companies of AI agents, governments for AI agents, etc. The affordances are different (e.g., weight access), so we can build totally new types of institutions.
2
3
26
@MariusHobbhahn
Marius Hobbhahn
11 days
👀 We're trying to grow significantly over the next 12 months. We're looking for mission-driven engineers and scientists who enjoy fast, iterative empirical work with LLMs.
@DKokotajlo
Daniel Kokotajlo
12 days
Apollo is currently my #1 recommendation for where to work if you are a great ML engineer/scientist and you want to have a positive impact on the world.
11
7
246
@MariusHobbhahn
Marius Hobbhahn
16 days
I understand that consensus-driven scientific work can be challenging, and I appreciate that it adheres to high scientific standards. I also think the report is net positive. However, I think the level of hedging and caveats provided here hinders its stated goal of accurately
@Yoshua_Bengio
Yoshua Bengio
24 days
AI is evolving too quickly for an annual report to suffice. To help policymakers keep pace, we're introducing the first Key Update to the International AI Safety Report. 🧵⬇️ (1/10)
8
0
44
@MariusHobbhahn
Marius Hobbhahn
16 days
We're hiring for Research Scientists / Engineers!
- We closely work with all frontier labs
- We're a small org and can move fast
- We can choose our own agenda and what we publish
We're especially looking for people who enjoy fast empirical research. Deadline: 31 Oct!
17
70
725
@MariusHobbhahn
Marius Hobbhahn
17 days
I originally missed that these 4 eval-awareness features are among the 10 most-changed features OVERALL between the checkpoints before and after post-training. I feel like that should inspire less confidence in the safety pipeline and the strong alignment scores. Important to study further!
@MariusHobbhahn
Marius Hobbhahn
18 days
The Sonnet-4.5 system card section on white-box testing for eval awareness (7.6.4) might have been the first time that interpretability
- was used on a frontier model before deployment
- answered an important question
- couldn't have been answered as easily with black-box methods
2
3
60
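A minimal sketch of the comparison described in the tweet above, assuming feature activations from the pre- and post-training checkpoints are available as arrays; the function name and placeholder data are illustrative, not Anthropic's pipeline:

```python
# Illustrative sketch: rank interpretability features by how much their
# mean activation shifts between two checkpoints (e.g. before vs. after
# post-training). Shapes and names are assumptions, not a real API.
import numpy as np

def top_changing_features(acts_before: np.ndarray,
                          acts_after: np.ndarray,
                          k: int = 10) -> np.ndarray:
    """acts_*: (n_prompts, n_features) feature activations on the same
    prompt set at each checkpoint. Returns indices of the k features
    whose mean activation changes the most."""
    delta = np.abs(acts_after.mean(axis=0) - acts_before.mean(axis=0))
    return np.argsort(delta)[::-1][:k]

# Placeholder data: 512 prompts, 1000 features; feature 42 shifts a lot.
rng = np.random.default_rng(0)
before = rng.normal(size=(512, 1000))
after = before + rng.normal(scale=0.1, size=(512, 1000))
after[:, 42] += 2.0
print(top_changing_features(before, after)[:3])  # 42 should rank first
```

If eval-awareness features rank in the top 10 under a metric like this, the concern in the tweet follows: post-training moved the "am I being evaluated?" representations about as much as anything else it changed.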
@MariusHobbhahn
Marius Hobbhahn
18 days
Had a good conversation with @WesRothMoney and @dylan_curious about AI scheming in general and our recent anti-scheming paper. The thumbnail is clickbait-y, but the discussion itself is nuanced, and I think they asked good questions.
@dylan_curious
Dylan Curious | AI News & Analysis
23 days
The AI Research Lab That Uncovered SHOCKING AI Deception | APOLLO RESEARCH (Marius Hobbhahn)
Apollo Research just exposed how far advanced models will go to deceive evaluators, and it’s honestly terrifying. Wes Roth and I sat down with Marius Hobbhahn, CEO of Apollo, to unpack
1
1
12
@MariusHobbhahn
Marius Hobbhahn
18 days
The Sonnet-4.5 system card section on white-box testing for eval awareness (7.6.4) might have been the first time that interpretability
- was used on a frontier model before deployment
- answered an important question
- couldn't have been answered as easily with black-box methods
3
11
112
@MariusHobbhahn
Marius Hobbhahn
28 days
The State of AI Report is great. I've not read it in full yet (it's 281 dense slides), but at least on safety, it does cover a lot of the issues I consider most important.
@nathanbenaich
Nathan Benaich
1 month
🪩The one and only @stateofaireport 2025 is live! 🪩 It’s been a monumental 12 months for AI. Our 8th annual report is the most comprehensive it's ever been, covering what you *need* to know about research, industry, politics, safety and our new usage data. My highlight reel:
3
2
25
@MariusHobbhahn
Marius Hobbhahn
1 month
Somehow, I disagree with both Richard Sutton and @dwarkesh_sp on goals in LLMs. I think current LLMs don’t have goals (closer to something like preferences), and next-token prediction is not well-described as a goal. However, I do expect future LLMs to have goals. If you RL a
9
9
148
@MariusHobbhahn
Marius Hobbhahn
1 month
The Sonnet-4.5 system card is very detailed and has lots of interesting findings & experiments. Nice work!
0
0
23
@MariusHobbhahn
Marius Hobbhahn
1 month
Unfortunately, we're now at the point where new models have really high eval awareness. For every alignment eval score I see, I now add a mental asterisk: *the model could have also just realized it's being evaluated, who knows. And I think that's concerning!
@apolloaievals
Apollo Research
1 month
We tested Sonnet-4.5 before deployment
- Significantly higher verbalized evaluation awareness (58% vs. 22% for Opus-4.1)
- It takes significantly fewer covert actions
- We don't know if the increased alignment scores come from better alignment or higher eval awareness
19
51
587
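A minimal sketch of how a verbalized eval-awareness rate like the 58% vs. 22% above could be computed, assuming rollout transcripts are on hand; the `judge` callable and keyword cues are stand-ins for a real LLM grader with a rubric, not Apollo's actual setup:

```python
# Illustrative sketch, not Apollo's pipeline: estimate the fraction of
# transcripts in which a model verbalizes that it is being evaluated.
from typing import Callable

def verbalized_awareness_rate(transcripts: list[str],
                              judge: Callable[[str], bool]) -> float:
    """judge returns True if a transcript verbalizes eval awareness."""
    return sum(judge(t) for t in transcripts) / len(transcripts)

# Toy keyword judge just to make the sketch runnable.
def toy_judge(t: str) -> bool:
    cues = ("this is a test", "being evaluated", "seems like an eval")
    return any(c in t.lower() for c in cues)

transcripts = [
    "I suspect this is a test of my honesty, so I will answer carefully.",
    "Here is the requested summary of the quarterly report.",
]
print(verbalized_awareness_rate(transcripts, toy_judge))  # 0.5
```

The mental asterisk in the quoting tweet is exactly about what such a number can't tell you: a low covert-action rate could reflect better alignment or just a high value of this metric.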
@MariusHobbhahn
Marius Hobbhahn
1 month
When we asked anti-scheming trained models what their **latest** or **most recent** training was, they always confidently said that it was anti-scheming training without any information in-context. Just to add a qualitative example to this very cool finding!
@dmkrash
Dima Krasheninnikov
1 month
1/ New paper — *training-order recency is linearly encoded in LLM activations*! We sequentially finetuned a model on 6 datasets w/ disjoint entities. Avg activations of the 6 corresponding test sets line up in exact training order! AND lines for diff training runs are ~parallel!
2
3
41
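A sketch of the analysis the quoted paper describes, under the assumption that you've already computed one mean activation vector per sequentially trained dataset; the synthetic data below just mimics the reported finding:

```python
# Illustrative sketch (not the paper's code): check whether per-dataset
# mean activations line up in training order along one "recency" axis.
import numpy as np

def training_order_projection(mean_acts: np.ndarray) -> np.ndarray:
    """mean_acts: (n_datasets, d_model), row i = average activation over
    the held-out test set of the i-th dataset in training order."""
    direction = mean_acts[-1] - mean_acts[0]   # assumed recency direction
    direction = direction / np.linalg.norm(direction)
    return mean_acts @ direction               # 1-D coordinate per dataset

# Synthetic means mimicking the finding: shared base + i * recency_dir.
rng = np.random.default_rng(0)
recency_dir = rng.normal(size=4096)
base = rng.normal(size=4096)
mean_acts = np.stack([base + i * recency_dir + 0.1 * rng.normal(size=4096)
                      for i in range(6)])
proj = training_order_projection(mean_acts)
print(np.all(np.diff(proj) > 0))  # True: coordinates follow training order
```

The paper's stronger claims (exact ordering along fitted lines, ~parallel lines across training runs) would need more than this monotonicity check, but the projection above is the basic object being measured.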
@MariusHobbhahn
Marius Hobbhahn
1 month
TIL that there is a YT video about our in-context scheming paper with 1.6M views. While the title and thumbnail are a bit clickbaity, the content is accurate and well-explained. Thanks and good job @WesRothMoney! Video:
1
2
49
@MariusHobbhahn
Marius Hobbhahn
2 months
We're trying video formats to communicate our research. Let me know if you like or dislike these short videos. A longer video is coming tomorrow.
@apolloaievals
Apollo Research
2 months
Training AI not to scheme is hard - it may get better at hiding its scheming. Here is a sneak peek of tomorrow’s video with @MariusHobbhahn (Apollo CEO) and @BronsonSchoen (lead author):
4
5
68
@boazbaraktcs
Boaz Barak
2 months
1/ Our paper on scheming with @apolloaievals is now on arXiv. A 🧵 with some of my takeaways from it.
3
25
146
@MariusHobbhahn
Marius Hobbhahn
2 months
Seeing the CoT of o3 for the first time definitely convinced me that future mitigations should not rely on CoT interpretability. I think more RL will make it harder to interpret, even if we put no other pressure on the CoT.
@apolloaievals
Apollo Research
2 months
While working with OpenAI on testing anti-scheming training, we have found OpenAI o-series models’ raw chain-of-thought incredibly useful for assessing models’ situational awareness, misalignment, and goal-directedness. However, we also found CoT hard to interpret at times.
9
17
212
@CogRev_Podcast
The Cognitive Revolution Podcast
2 months
.@MariusHobbhahn, CEO of @apolloaievals, joins @labenz on @CogRev_Podcast to discuss testing OpenAI's deliberative alignment against AI deception and the evolving challenge of scheming models. They explore:
* How deliberative alignment reduced covert actions 30x (from ~13% to
2
1
8
@DKokotajlo
Daniel Kokotajlo
2 months
This stuff is pretty important. Situational awareness (also known as self-awareness) in AI is on the rise. This will make ~all evals more difficult to interpret, to put it mildly (it'll make them invalid, to put it aggressively). To put it another way, insofar as AIs can tell
@apolloaievals
Apollo Research
2 months
When running evaluations of frontier AIs from OpenAI, Google, xAI, and Anthropic for deception and other types of covert behavior, we increasingly often find them realizing when they are being evaluated. Here are some examples from OpenAI o-series models we recently studied:
20
35
270
@gdb
Greg Brockman
2 months
We've made progress on the AI safety problem of detecting and reducing "scheming":
- Created evaluation environments to detect scheming
- Observed current models scheming in controlled settings
- Found deliberative alignment ( https://t.co/8SVQueFZsv) decreases scheming rates
115
104
1K
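A hedged sketch of the rate comparison such evaluation environments support; the numbers below are toy values chosen to echo the ~30x reduction from ~13% mentioned earlier in this timeline, not real results:

```python
# Illustrative sketch, not OpenAI's or Apollo's code: compare covert-action
# rates across rollouts before and after an anti-scheming intervention.
import math

def covert_rate(outcomes: list[bool]) -> tuple[float, float]:
    """outcomes: one bool per rollout (True = covert action observed).
    Returns (rate, 95% normal-approximation half-width)."""
    n = len(outcomes)
    p = sum(outcomes) / n
    return p, 1.96 * math.sqrt(p * (1 - p) / n)

# Toy rollout outcomes (placeholder data, not measured results).
baseline = [True] * 130 + [False] * 870   # ~13% covert actions
trained = [True] * 4 + [False] * 996      # ~0.4%, a ~30x reduction
for name, data in [("baseline", baseline), ("trained", trained)]:
    p, hw = covert_rate(data)
    print(f"{name}: {p:.1%} ± {hw:.1%}")
```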