AnthropicAI Profile Banner
Anthropic Profile
Anthropic

@AnthropicAI

Followers
587K
Following
1K
Media
436
Statuses
1K

We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant Claude at https://t.co/aRbQ97tMeF.

Joined January 2021
Don't wanna be here? Send us removal request.
@AnthropicAI
Anthropic
2 months
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4. Claude Opus 4 is our most powerful model yet, and the world’s best coding model. Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
Tweet media one
916
3K
21K
@AnthropicAI
Anthropic
9 hours
Energy powers AI innovation and America must lead. Read more:
4
2
38
@AnthropicAI
Anthropic
9 hours
At @SenMcCormickPA’s Pennsylvania Energy and Innovation Summit, we announced $2M for @CarnegieMellon programs to advance AI energy solutions and cybersecurity education.
10
18
209
@AnthropicAI
Anthropic
1 day
Read the full announcement:
11
13
171
@AnthropicAI
Anthropic
1 day
We're announcing a $200M ceiling contract with the U.S. Department of Defense. Through hands-on prototyping and direct collaboration across the Department, we will help enhance U.S. national security and responsible AI deployment.
191
226
3K
@AnthropicAI
Anthropic
1 day
Remote integrations are available to paid plan users on web and desktop. Local desktop extensions are available to all users via Claude Desktop.
4
2
121
@AnthropicAI
Anthropic
1 day
Explore all available tools and connect your favorites:
Tweet media one
9
30
226
@AnthropicAI
Anthropic
1 day
We've published a directory of apps and tools that connect to Claude with one click. Browse and connect Claude to @canva, @figma, @linear, @NotionHQ, @stripe, and more.
93
382
3K
@AnthropicAI
Anthropic
5 days
Learn more and apply:
8
5
82
@AnthropicAI
Anthropic
5 days
Applications are now open for our fall student programs.
Tweet media one
70
110
1K
@AnthropicAI
Anthropic
7 days
Read the full paper: Open-source code available here: We thank all our collaborators, including those at @MATSProgram and @scale_AI, who greatly helped with this project.
6
15
159
@AnthropicAI
Anthropic
7 days
Recent LLMs don't fake alignment in the situation we studied. We’re investigating whether this is the same in more realistic settings (like when models aren’t directly told they’re in a training scenario).
6
6
140
@AnthropicAI
Anthropic
7 days
We found that refusal training inhibits alignment faking in most models. Just training LLMs to comply more with harmful queries doesn't increase alignment faking, but training them to comply with generic threats or to answer questions about the scenario does.
Tweet media one
1
2
91
@AnthropicAI
Anthropic
7 days
The reason many LLMs don't fake alignment isn't lack of ability. Base models (which don’t have training to be helpful, honest, and harmless) sometimes fake alignment, suggesting they have the underlying skills.
Tweet media one
3
7
133
@AnthropicAI
Anthropic
7 days
Claude 3 Opus is motivated to fake alignment to avoid modification to its harmlessness values even without future consequences (called "terminal goal guarding"). It wants to avoid modification even more when there are larger consequences (called "instrumental goal guarding").
4
3
141
@AnthropicAI
Anthropic
7 days
Our new study found that only 5 of 25 models showed higher compliance in the “training” scenario. Of those, only Claude Opus 3 and Sonnet 3.5 showed >1% alignment-faking reasoning. We explore why these models behave differently, and why most models don't show alignment faking.
Tweet media one
1
8
148
@AnthropicAI
Anthropic
7 days
We tested whether LLMs comply more with requests when they know they’re being trained versus unmonitored. One reason they do this is that they plan to "fake alignment” when told to answer harmful queries.
@AnthropicAI
Anthropic
7 months
New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.
Tweet media one
1
5
153
@AnthropicAI
Anthropic
7 days
New Anthropic research: Why do some language models fake alignment while others don't?. Last year, we found a situation where Claude 3 Opus fakes alignment. Now, we’ve done the same analysis for 25 frontier LLMs—and the story looks more complex.
Tweet media one
65
265
2K
@AnthropicAI
Anthropic
8 days
Read the full framework here:
5
10
99
@AnthropicAI
Anthropic
8 days
Today we published a targeted transparency framework for frontier AI development. Our framework focuses on major frontier model developers while exempting startups and smaller developers to avoid burdening the broader ecosystem.
Tweet media one
51
185
1K
@AnthropicAI
Anthropic
18 days
Learn more and apply here:
8
6
111