Search: AI agents and safety

Teaching Claude why

… Thus, after Claude 4, it was clear we needed to improve our safety training and, since then, we have made significant updates to our safety training. …

May 8, 2026

Australian government and Anthropic sign MOU for AI safety and research

Today, Anthropic signed a Memorandum of Understanding with the Australian government to cooperate on AI safety research and support the goals of Australia’s National AI Plan. …

Mar 31, 2026

Claude Opus 4.6

… A step forward on safety. These intelligence gains do not come at the cost of safety. …

Feb 5, 2026

Introducing Sonnet 4.6

… We find that Opus 4.6 remains the strongest option for tasks that demand the deepest reasoning, such as codebase refactoring, coordinating multiple agents in a workflow, and problems where getting it just right is paramount. …

Feb 17, 2026

Advancing Claude in healthcare and the life sciences

… A selection of our partners describe their experiences using Claude below: We were drawn to Anthropic's focus on AI safety and Claude's Constitutional AI approach to creating more helpful, harmless, and honest AI systems. …

Jan 11, 2026

PwC is deploying Claude to build technology, execute deals, and reinvent enterprise functions for clients

… AI-native deal-making. PwC is reinventing how it executes deals end-to-end — diligence, value creation, integration — with agents working alongside deal teams. …

May 14, 2026

From shortcuts to sabotage: natural emergent misalignment from reward hacking

… Misaligned models sabotaging safety research is one of the risks we’re most concerned about—we predict that AI models will themselves perform a lot of AI safety research in the near future, and we want to be assured that the results are trustworthy. …

Nov 21, 2025

Trustworthy agents in practice

AI “agents” represent the latest major shift in how people and organizations are using AI. A couple of years ago, AI models were only broadly available as chatbots—simple question-and-answer machines. …

Apr 9, 2026

Introducing Claude Opus 4.5

… Claude Opus 4.5 delivers measurable gains where it matters most: stronger results on our hardest evaluations and consistent performance through 30-minute autonomous coding sessions. Claude Opus 4.5 represents a breakthrough in self-improving AI agents. …

Nov 24, 2025

Core views on AI safety: When, why, what, and how

… Constantly iterating against a source of “ground truth” is usually crucial for scientific progress. In our AI safety research, empirical evidence about AI – though it mostly arises from computational experiments, i.e. AI training and evaluation – is the primary source of ground truth. …

Mar 8, 2023
