2028: Two scenarios for global AI leadership
…Opportunities for engagement on AI safety: Anthropic supports international AI safety dialogue with AI experts in China, when possible. The world has a vested interest in safe AI, regardless of where it…
As noted above, we have deployed the classifier as an experimental addition to our Safeguards framework, monitoring a percentage of Claude traffic. Its real-world performance has confirmed that the classifier works effectively beyond our testing environment. Whereas our synthetic test data provided clear examples of harmful and benign exchanges, the distribution of actual user traffic proved more complex and surprising, yet the classifier still performed well. One example of how real-world deployment differs from testing is that the classifier flagged certain conversations about nuclear weapon
Developing Nuclear Safeguards for AI
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
…the risks—and benefits—of AI models for cybersecurity. We stated that we would keep Claude Mythos Preview’s release limited and test new cyber safeguards on less capable models first. Opus…
…It is calculated based on the actor's activity across Claude.ai, Claude Code, and our API, drawing on our safety classifiers alongside open-source and internal threat-intelligence indicators. The higher…
…May 25, 2026, Pope Leo XIV released an encyclical on the topic of AI: "Magnifica humanitas: On safeguarding the human person in the time of artificial Intelligence." Anthropic co-founder Chris Olah…
…Training models to recognize and act on their own uncertainty is an important safety property that complements external safeguards like permission systems and human oversight. At Anthropic, we train Claude to ask…
…around Fable 5’s safeguards. Anthropic disputes the “jailbreak” label, calling it a narrow, already-patched issue rather than a wholesale defeat of the model’s safety measures. The result was the…
…Related Anthropic just dropped its core AI safety promise, and that should worry you History doesn't repeat itself, but AI companies sure do. Why is Anthropic keeping Mythos under wraps? For…
…As we build fiduciary-grade AI systems for legal and tax professionals, advances like these help raise the standard for trusted AI performance in real-world workflows. Claude Opus 4.8 sets…
…Why we’re doing this: Building safe, beneficial AI models requires deep technical work on alignment, interpretability, safeguards, evaluations, and more. But that work isn’t conducted—nor is AI deployed—in…