
Anthropic
@AnthropicAI
Followers: 587K · Following: 1K · Media: 436 · Statuses: 1K
We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant Claude at https://t.co/aRbQ97tMeF.
Joined January 2021
At @SenMcCormickPA’s Pennsylvania Energy and Innovation Summit, we announced $2M for @CarnegieMellon programs to advance AI energy solutions and cybersecurity education.
Read the full paper:
Open-source code available here:
We thank all our collaborators, including those at @MATSProgram and @scale_AI, who greatly helped with this project.
We tested whether LLMs comply more with requests when they know they're being trained versus unmonitored. One reason they do is that they plan to "fake alignment" when told to answer harmful queries.
New Anthropic research: Alignment faking in large language models. In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.