Search

Showing top 17 results for "AI safety safeguards"

People also ask

What does AI have to do with dangerous weapons at all?

We worry about how AI might assist malicious actors with weapon acquisition and development both because of how it is similar to historical information and communication technologies and how it is different. In recent years, terrorist groups have rapidly adopted technologies like encrypted communications, cryptocurrency, and social media. We should expect nothing different from AI. Just as those seeking information about how to build weapons shifted from needing to acquire physical pamphlets or manuals to searching the internet, we can expect that they will query AI. What is different, though,

LLMs and biorisk

What’s next?

As noted above, we have deployed the classifier as an experimental addition to our Safeguards framework, monitoring a percentage of Claude traffic. Its real-world performance has confirmed that the classifier works effectively beyond our testing environment. Whereas our synthetic test data provided clear examples of harmful and benign exchanges, the distribution of actual user traffic proved more complex and surprising, yet the classifier still performed well. One example of how real-world deployment differs from testing is that the classifier flagged certain conversations about nuclear weapon

Developing Nuclear Safeguards for AI

… This precision matters because nuclear conversations in AI systems are rare but high-stakes: they bear directly on national security. Sharing with industry: We’re making these resources available so that other leading AI companies can implement similar safeguards if they choose. …

Aug 21, 2025

Claude Fable 5 and Claude Mythos 5

… Availability: Claude Fable 5 is available everywhere today. Claude Mythos 5 is restricted to Glasswing partners with cyber safeguards lifted and soon to select biology researchers with biology and chemistry safeguards lifted only, until our broader trusted access program is available. …

Jun 9, 2026

LLMs and biorisk

… In this post, we want to expand on our perspective on AI and biological risk (biorisk). It is striking, but not necessarily intuitive, that every safety framework released by frontier AI labs includes some reference to biorisk. …

Sep 5, 2025

Expanding Project Glasswing

… In the meantime, we plan to expand Project Glasswing even further—prioritizing additional essential infrastructure providers, maintainers of critical open-source software, and safety testers. …

Jun 2, 2026

An update on our election safeguards

… With safeguards and training in place, our latest models refused nearly every task. Without our safeguards in place (which we do to measure a model's raw capabilities), only Mythos Preview and Opus 4.7 completed more than half the tasks. …

Apr 24, 2026

Claude Opus 4.6

… A detailed description of all capability and safety evaluations is available in the Claude Opus 4.6 system card. We’ve also applied new safeguards in areas where Opus 4.6 shows particular strengths that might be put to dangerous as well as beneficial uses. …

Feb 5, 2026

2028: Two scenarios for global AI leadership

… Opportunities for engagement on AI safety: Anthropic supports international AI safety dialogue with AI experts in China, when possible. The world has a vested interest in safe AI, regardless of where it is developed and deployed. …

May 14, 2026

Trustworthy agents in practice

… A well-trained model can still be exploited through a poorly configured harness, an overly permissive tool, or an exposed environment. This is why the safeguards we and others build need to account for them all. …

Apr 9, 2026

Introducing Claude Opus 4.7

… Note that Mythos Preview remains the best-aligned model we’ve trained according to our evaluations. Our safety evaluations are discussed in full in the Claude Opus 4.7 System Card. …

Apr 16, 2026

Mapping AI-enabled cyber threats: Insights from the LLM ATT&CK Navigator

… It is calculated based on the actor's activity across Claude.ai, Claude Code, and our API, drawing on our safety classifiers alongside open-source and internal threat-intelligence indicators. The higher the score, the higher-risk the AI-enabled actor is. …

Jun 3, 2026
