
Evan Hubinger
@EvanHub
Followers: 7K · Following: 14K · Media: 15 · Statuses: 543
Head of Alignment Stress-Testing @AnthropicAI. Opinions my own. Previously: MIRI, OpenAI, Google, Yelp, Ripple. (he/him/his)
California
Joined May 2010
RT @HarryBooth59643: EXCLUSIVE: 60 U.K. Parliamentarians Accuse Google of Violating International AI Safety Pledge. The letter, released on….
RT @EthanJPerez: Anthropic safety teams will be supervising (and hiring) collaborators from this program. We’ll be taking on collaborators t….
RT @woj_zaremba: It’s rare for competitors to collaborate. Yet that’s exactly what OpenAI and @AnthropicAI just did—by testing each other’s….
RT @logangraham: Launching now — a new blog for research from @AnthropicAI’s Frontier Red Team and others. > We’l….
RT @AnthropicAI: We’re running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding o….
RT @AnthropicAI: New Anthropic research: Building and evaluating alignment auditing agents. We developed three AI agents to autonomously c….
RT @saprmarks: xAI launched Grok 4 without any documentation of their safety testing. This is reckless and breaks with industry best practi….
RT @jackclarkSF: For the last few months I’ve brought up ‘transparency’ as a policy framework for governing powerful AI systems and the com….
RT @AmandaAskell: "Just train the AI models to be good people" might not be sufficient when it comes to more powerful models, but it sure i….
RT @saprmarks: Bad news: Frontier AI systems, including Claude, GPT, and Gemini, sometimes chose egregiously misaligned actions. Silver lin….
RT @aengus_lynch1: After iterating hundreds of prompts to trigger blackmail in Claude, I was shocked to see these prompts elicit blackmail….
RT @AnthropicAI: New Anthropic Research: Agentic Misalignment. In stress-testing experiments designed to identify risks before they cause….
RT @AnthropicAI: New Anthropic Research: A new set of evaluations for sabotage capabilities. As models gain more agentic abilities, we nee….
RT @jackclarkSF: Right now, we know a lot about frontier AI development because companies voluntarily share this information. Going forward….
nytimes.com: "The A.I. industry needs to be regulated, with a focus on transparency."
RT @BernieSanders: The CEO of Anthropic (a powerful AI company) predicts that AI could wipe out HALF of entry-level white collar jobs in th….
RT @BarackObama: At a time when people are understandably focused on the daily chaos in Washington, these articles describe the rapidly acc….
axios.com: "Hardly anyone is paying attention."
RT @AndrewCurran_: This is the full text of the letter Senators Elizabeth Warren and Jim Banks wrote to Jensen Huang expressing national se….
RT @AnthropicAI: Our interpretability team recently released research that traced the thoughts of a large language model. Now we’re open-s….
RT @kyliebytes: here's what @DarioAmodei said about President Trump’s megabill that would ban state-level AI regulation for 10 years https:….