Evan Hubinger @EvanHub X Profile

Evan Hubinger

@EvanHub

Followers

8K

Following

14K

Media

15

Statuses

557

Head of Alignment Stress-Testing @AnthropicAI. Opinions my own. Previously: MIRI, OpenAI, Google, Yelp, Ripple. (he/him/his)

https://t.co/aHDJJIQq3j

California

Joined May 2010

Don't wanna be here? Send us removal request.

AI Impacts

@AIImpacts

7 days

Our surveys’ findings that AI researchers assign a median 5-10% to extinction or similar made a splash (NYT, NBC News, TIME..) But people sometimes underestimate our survey’s methodological quality due to various circulating misconceptions. Today, an FAQ correcting key errors:

2

12

70

Joe Carlsmith

@jkcarlsmith

4 days

Last Friday was my last day at @open_phil. I’ll be joining @AnthropicAI in mid-November, helping with the design of Claude’s character/constitution/spec. I wrote a blog post about this move, link in thread.

40

8

490

Anthropic

@AnthropicAI

3 days

Even when new AI models bring clear improvements in capabilities, deprecating the older generations comes with downsides. An update on how we’re thinking about these costs, and some of the early steps we’re taking to mitigate them:

anthropic.com

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

154

173

1K

Daniel Ziegler

@d_m_ziegler

10 days

Part of my job on Anthropic’s Alignment Stress-Testing Team is to write internal reviews of our RSP activities, acting as a “second line of defense” for safety. Today, we’re publishing one of our reviews for the first time alongside the pilot Sabotage Risk Report.

Sam Bowman

@sleepinyourhat

10 days

🌱⚠️ weeds-ey but important milestone ⚠️🌱 This is a first concrete example of the kind of analysis, reporting, and accountability that we’re aiming for as part of our Responsible Scaling Policy commitments on misalignment.

2

5

34

HUVE

@HUVE_official

3 days

Still, Sparkling or HUVE? The HUVE Perform hydrogen water bottle uses advanced electrolysis to infuse your water with molecular hydrogen - a powerful antioxidant, shown to help neutralize free radicals, reduce inflammation, and support recovery at the cellular level.

3

4

15

Senator Scott Wiener

@Scott_Wiener

16 days

It’s official: I’m running for Congress to represent San Francisco! I’ll fight Trump’s takeover, for our values, & for real progress. I’ve delivered on housing, healthcare, clean energy, and civil rights – and I’ll do it again. Let’s build the future our city & country deserve.

2K

195

2K

Samuel Marks

@saprmarks

1 month

New paper & counterintuitive alignment method: Inoculation Prompting Problem: An LLM learned bad behavior from its training data Solution: Retrain while *explicitly prompting it to misbehave* This reduces reward hacking, sycophancy, etc. without harming learning of capabilities

15

71

536

Sam Bowman

@sleepinyourhat

23 days

🧵 Haiku 4.5 🧵 Looking at the alignment evidence, Haiku is similar to Sonnet: Very safe, though often eval-aware. I think the most interesting alignment content in the system card is about reasoning faithfulness…

2

11

70

Openτensor Foundaτion

@opentensor

2 days

1/ Major changes were pushed to Subtensor mainnet. Let’s dig in.

12

103

405

Sam Bowman

@sleepinyourhat

1 month

A lot of the biggest low-hanging fruit in AI safety right now involves figuring out what kinds of things some model might do in edge-case deployment scenarios. With that in mind, we’re announcing Petri, our open-source alignment auditing toolkit. (🧵)

11

30

226

Senator Scott Wiener

@Scott_Wiener

1 month

BREAKING: Governor @GavinNewsom just signed our groundbreaking AI bill, SB 53, to promote AI innovation (creating a public cloud called CalCompute), require transparency around AI lab safety practices & protect whistleblowers at AI labs who report risk of catastrophic harm. 🧵

75

51

304

Jack Lindsey

@Jack_W_Lindsey

1 month

Prior to the release of Claude Sonnet 4.5, we conducted a white-box audit of the model, applying interpretability techniques to “read the model’s mind” in order to validate its reliability and alignment. This was the first such audit on a frontier LLM, to our knowledge. (1/15)

43

174

1K

Jack Clark

@jackclarkSF

2 months

Anthropic is endorsing SB 53, California Sen. @Scott_Wiener ‘s bill requiring transparency of frontier AI companies. We have long said we would prefer a federal standard. But in the absence of that this creates a solid blueprint for AI governance that cannot be ignored.

20

35

334

Ethan Perez

@EthanJPerez

2 months

We’re hiring someone to run the Anthropic Fellows Program! Our research collaborations have led to some of our best safety research and hires. We’re looking for an exceptional ops generalist, TPM, or research/eng manager to help us significantly scale and improve our collabs 🧵

10

42

257

Harry Booth

@HarryBooth59643

2 months

EXCLUSIVE: 60 U.K. Parliamentarians Accuse Google of Violating International AI Safety Pledge. The letter, released on August 29 by activist group @PauseAI UK, says that Google’s March release of Gemini 2.5 Pro without details on safety testing “sets a dangerous precedent.”

13

53

176

Benzinga

@Benzinga

2 days

reAlpha CEO Mike Logozzo (@mike_logozzo) joins @marketopolis_ to share how he’s rethinking the homebuying journey with AI, vertical integration, and a retail-first strategy. He breaks down what it takes to pivot, adapt, and lead through change. 🎧 Listen now! @reAlpha

16

8

15

Ethan Perez

@EthanJPerez

2 months

Anthropic safety teams will be supervising (and hiring) collaborators from this program. We’ll be taken on collaborators to start on safety research projects with us starting in January. Also a great opportunity to work with safety researchers at many other great orgs too!

🚀Henry is launching the Astra Research Program!

@sleight_henry

2 months

🚀 Applications now open: Constellation's Astra Fellowship 🚀 We're relaunching Astra — a 3-6 month fellowship to accelerate AI safety research & careers. Alumni @eli_lifland & Romeo Dean co-authored AI 2027 and co-founded @AI_Futures_ with their Astra mentor @DKokotajlo!

1

2

54

Wojciech Zaremba

@woj_zaremba

2 months

It’s rare for competitors to collaborate. Yet that’s exactly what OpenAI and @AnthropicAI just did—by testing each other’s models with our respective internal safety and alignment evaluations. Today, we’re publishing the results. Frontier AI companies will inevitably compete on

openai.com

OpenAI and Anthropic share findings from a first-of-its-kind joint safety evaluation, testing each other’s models for misalignment, instruction following, hallucinations, jailbreaking, and more—hig...

112

387

2K

Jan Leike

@janleike

3 months

If you want to get into alignment research, imo this is one of the best ways to do it. Some previous fellows did some of the most interesting research I've seen this year and >20% ended up joining Anthropic full-time. Application deadline is this Sunday!

Anthropic

@AnthropicAI

3 months

We’re running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places.

17

18

345

Logan Graham

@logangraham

3 months

Launching now — a new blog for research from @AnthropicAI’s Frontier Red Team and others. > https://t.co/lRNZmquFBi We’ll be covering our internal research on cyber, bio, autonomy, national security and more.

26

122

946

Anthropic

@AnthropicAI

3 months

We’re running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places.

68

237

3K

Anthropic

@AnthropicAI

4 months

New Anthropic research: Building and evaluating alignment auditing agents. We developed three AI agents to autonomously complete alignment auditing tasks. In testing, our agents successfully uncovered hidden goals, built safety evaluations, and surfaced concerning behaviors.

62

198

1K

Samuel Marks

@saprmarks

4 months

xAI launched Grok 4 without any documentation of their safety testing. This is reckless and breaks with industry best practices followed by other major AI labs. If xAI is going to be a frontier AI developer, they should act like one. 🧵

261

241

3K