Search

Showing top 43 results for "AI agent safety"

People also ask

Why does agentic misalignment happen?

Before we started this research, it was not clear where the misaligned behavior was coming from. Our two main hypotheses were: (1) our post-training process was accidentally encouraging this behavior with misaligned rewards; (2) this behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use. …

Teaching Claude why

The Long-Term Benefit Trust

…Paul Christiano stepped down in April 2024 to take a new role as the Head of AI Safety at the U.S. AI Safety Institute. In January 2026, Kanika Bahl stepped down…

Sep 19, 2023

Long-running Claude for scientific computing

…The premise: Most scientists currently using AI agents work in a conversational loop, managing each step of the process on a tight leash. As models have become significantly better at long-horizon…

Mar 23, 2026

Introducing The Anthropic Institute

…Public Policy focuses on the areas where Anthropic has defined priorities and perspectives, including model safety and transparency, energy ratepayer protections, infrastructure investments, export controls, and democratic leadership in AI. Sarah Heck…

Mar 11, 2026

Sydney will become Anthropic’s fourth office in Asia-Pacific

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Mar 10, 2026

Project Vend: Can Claude run a small shop? (And why does that matter?)

…6 Finally, in a world where larger fractions of economic activity are autonomously managed by AI agents, odd scenarios like this could have cascading effects—especially if multiple agents based on similar…

Jun 27, 2025

Anthropic raises $65B in Series H funding at $965B post-money valuation

…This latest funding is expected to advance our safety and interpretability research, expand compute to meet growing demand for Claude, and scale the products and partnerships our customers rely on. “Claude is…

May 28, 2026

How we contain Claude across products

…For governance, observability, and the rest of the stack, see NIST's project on AI agent identity and authorization, the six-agency guidance on adopting agentic AI led by Australia's ACSC…

May 25, 2026

2028: Two scenarios for global AI leadership

…Opportunities for engagement on AI safety: Anthropic supports international AI safety dialogue with AI experts in China, when possible. The world has a vested interest in safe AI, regardless of where it…

May 14, 2026

Anthropic’s Long-Term Benefit Trust appoints Vas Narasimhan to Board of Directors

Apr 14, 2026

Building a new enterprise AI services company with Blackstone, Hellman & Friedman, and Goldman Sachs

May 4, 2026
