Showing top 26 results for "AI safety & policy"

People also ask

What safety risks?

If you’re willing to entertain the views outlined above, then it’s not very hard to argue that AI could be a risk to our safety and security. There are two common sense reasons to be concerned. First, it may be tricky to build safe, reliable, and steerable systems when those systems are starting to become as intelligent and as aware of their surroundings as their designers. To use an analogy, it is easy for a chess grandmaster to detect bad moves in a novice but very hard for a novice to detect bad moves in a grandmaster. If we build an AI system that’s significantly more competent than human

Core views on AI safety: When, why, what, and how

Introducing Claude Opus 4.5

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Introducing Sonnet 4.6

…Our safety researchers concluded that Sonnet 4.6 has “a broadly warm, honest, prosocial, and at times funny character, very strong safety behaviors, and no signs of major concerns around high-stakes…

Focus areas for The Anthropic Institute

Policy, May 7, 2026: At The Anthropic Institute (TAI), we’ll be using the information we can access from within a frontier lab to investigate AI…

More details on Fable 5’s cyber safeguards and our jailbreak framework

…First, we provide more information on the cybersecurity safeguards, specifically the safety classifiers, that we launched with the model. These are the AI systems that accompany the model that detect and block…

LLMs and biorisk

…In this post, we want to expand on our perspective on AI and biological risk (biorisk). It is striking—but not necessarily intuitive—that every safety framework released by frontier AI labs…

Trustworthy agents in practice

Policy, Apr 9, 2026: AI “agents” represent the latest major shift in how people and organizations are using AI. A couple of years ago, AI models were only…

An update on our election safeguards

Announcing the Anthropic Economic Index Survey

Claude for Financial Services

…Claude's advanced capabilities, combined with Anthropic's commitment to safety, are central to our purpose of harnessing AI responsibly, as we drive for transformation in critical areas like fraud prevention & customer…

Claude Fable 5 and Claude Mythos 5

…them to be motivated to try to circumvent our safety measures. Fable 5 comes with a new set of classifiers: separate AI systems that detect potential misuse, including jailbreak attempts, and prevent…