Anthropic
@AnthropicAI
We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.
Joined January 2021
This research was led by @_igorshilov as part of the Anthropic Fellows Program. https://t.co/O83ndSIXcz
New Anthropic research! We study how to train models so that high-risk capabilities live in a small, separate set of parameters, allowing clean capability removal when needed – for example in CBRN or cybersecurity domains.
Read the full paper on SGTM here: https://t.co/Zfg2tjX7hD For reproducibility, we’ve also made the relevant code available on GitHub: https://t.co/zRmJYy6bDE.
github.com
Training Transformers with knowledge localization (SGTM) - safety-research/selective-gradient-masking
The study had limitations: it was performed in a simplified setup with small models and proxy evaluations rather than standard benchmarks. Also, as with data filtering, SGTM doesn’t stop in-context attacks where an adversary supplies the information themselves.
Unlike unlearning methods applied after training is complete, SGTM is hard to undo: it takes 7× more fine-tuning steps to recover forgotten knowledge with SGTM than with RMU, a previous unlearning method.
Controlling for general capabilities, models trained with SGTM perform worse on the undesired “forget” subset of knowledge than those trained with data filtering.
In our study, we tested whether SGTM could remove biology knowledge from models trained on Wikipedia. Data filtering alone could let relevant information leak through, since non-biology Wikipedia pages might still contain biology content.
SGTM splits the model’s weights into “retain” and “forget” subsets, and guides specific knowledge into the “forget” subset during pretraining. The forget subset can then be removed before deployment in high-risk settings. Read more: https://t.co/BfR4Kd86b0
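For intuition, here is a minimal PyTorch-style sketch of what a selective-gradient-masking step could look like. The function names, the fixed per-tensor boolean masks, and ablation-by-zeroing are illustrative assumptions, not the paper’s exact method (see the linked paper and repo for that):

```python
import torch

def sgtm_step(model, batch, loss_fn, optimizer, forget_masks, batch_is_forget):
    # Hypothetical SGTM-style step. forget_masks maps parameter names to
    # boolean tensors that are True where an entry belongs to the "forget"
    # subset (the partition is assumed fixed before training begins).
    optimizer.zero_grad()
    loss = loss_fn(model(batch["inputs"]), batch["targets"])
    loss.backward()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if param.grad is None:
                continue
            mask = forget_masks[name]
            # Forget-domain batches update only forget parameters;
            # all other batches update only retain parameters.
            param.grad.mul_(mask if batch_is_forget else ~mask)
    optimizer.step()

def ablate_forget_subset(model, forget_masks):
    # Assumed removal step: zero out the forget subset before deployment.
    with torch.no_grad():
        for name, param in model.named_parameters():
            param.mul_(~forget_masks[name])
```

Under these assumptions, retain-domain gradients never touch the forget parameters, so zeroing that subset before deployment should remove the targeted knowledge while leaving general capabilities largely intact.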
New research from the Anthropic Fellows Program: Selective GradienT Masking (SGTM). We study how to train models so that high-risk knowledge (e.g. about dangerous weapons) is isolated in a small, separate set of parameters that can be removed without broadly affecting the model.
Anthropic is donating the Model Context Protocol to the Agentic AI Foundation, a directed fund under the Linux Foundation. In one year, MCP has become a foundational protocol for agentic AI. Joining AAIF ensures MCP remains open and community-driven.
We’re expanding our partnership with @Accenture to help enterprises move from AI pilots to production. The Accenture Anthropic Business Group will include 30,000 professionals trained on Claude, and a product to help CIOs scale Claude Code. Read more:
In her first Ask Me Anything, @amandaaskell answers your philosophical questions about AI, discussing morality, identity, consciousness, and more. Timestamps:
0:00 Introduction
0:29 Why is there a philosopher at an AI company?
1:24 Are philosophers taking AI seriously?
3:00
Anthropic Interviewer will allow us to conduct more research, more often, on a variety of topics. We plan to use it for regular studies on the evolving relationship between humans and AI. Full results here:
anthropic.com
What 1,250 professionals told us about working with AI
We also looked at the intensity of the most common emotions expressed in interviews. Across the general workforce, we found extremely consistent patterns of high satisfaction, but also frustration with implementing AI.
We visualized patterns across topics. Most workers felt optimistic about the role of AI in work: on productivity, communication, and how they're adapting to a future in which AI is more integrated. But some topics, like reliability, gave them pause.
Creatives are anxious about job security. Many use AI for productivity, but face stigma for doing so, and sometimes hide it. Scientists want AI research partners, but currently confine their use to tasks like writing manuscripts or debugging code.
The general workforce wants to delegate routine work to AI, but preserve the tasks central to their professional identity. As one pastor told us: “If I use AI and up my skills with it, it can save me so much time on the admin side, which will free me up to be with the people.”
We tested it by asking 1,250 professionals about their views on work and AI. Our largest sample was from the general workforce. We also recruited subgroups of creatives and scientists—domains where AI's role is contested and rapidly evolving. https://t.co/q2OG268GH9
Give Anthropic Interviewer a research goal, and it drafts research questions, conducts interviews, and analyzes responses in collaboration with a human researcher.
We’re launching Anthropic Interviewer, a new tool to help us understand people’s perspectives on AI. It’s now available at https://t.co/W8P36sPQBy for a week-long pilot.
claude.ai
Talk with Claude, an AI assistant from Anthropic
Anthropic CEO Dario Amodei spoke today at the New York Times DealBook Summit. "We're building a growing and singular capability that has singular national security implications, and democracies need to get there first."