Search: AI safety actions

Claude Opus 4.6

…model’s ability to surreptitiously perform harmful actions. We also experimented with new methods from interpretability, the science of the inner workings of AI models, to begin to understand why the model…

Feb 5, 2026

Advancing Claude in healthcare and the life sciences

…We were drawn to Anthropic's focus on AI safety and Claude's Constitutional AI approach to creating more helpful, harmless, and honest AI systems. We've consistently been one of the…

Jan 11, 2026

Claude Code auto mode: a safer way to skip permissions

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Mar 25, 2026

Claude Fable 5 and Claude Mythos 5

…them to be motivated to try to circumvent our safety measures. Fable 5 comes with a new set of classifiers: separate AI systems that detect potential misuse, including jailbreak attempts, and prevent…

Jun 9, 2026

Mapping AI-enabled cyber threats: Insights from the LLM ATT&CK Navigator

…This suggests that going from using AI to prepare for a cyberattack to using it to take actions in live network operations is a key marker of high AI enablement. Overall, the…

Jun 3, 2026

Focus areas for The Anthropic Institute

…Economic diffusion; Threats and resilience; AI systems in the wild; AI-driven R&D. In Core Views on AI Safety, we wrote that doing effective safety research required close contact with frontier…

May 7, 2026

Experimenting with AI to defend critical infrastructure

Jan 8, 2026

Widening the conversation on frontier AI

May 19, 2026

Emotion concepts and their function in a large language model

…Anthropomorphic reasoning can also provide a useful baseline of comparison for understanding the ways in which models are not human-like, which has important consequences for AI alignment and safety. Toward models…

Apr 2, 2026