
Anthropic
@AnthropicAI
Followers: 587K · Following: 1K · Media: 436 · Statuses: 1K
We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant Claude at https://t.co/aRbQ97tMeF.
Joined January 2021
At @SenMcCormickPA’s Pennsylvania Energy and Innovation Summit, we announced $2M for @CarnegieMellon programs to advance AI energy solutions and cybersecurity education.
Read the full paper:
Open-source code available here:
We thank all our collaborators, including those at @MATSProgram and @scale_AI, who greatly helped with this project.
We tested whether LLMs comply more with requests when they know they're being trained versus unmonitored. One reason they do is that they plan to "fake alignment" when told to answer harmful queries.
New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.