Search

Showing top 27 results for "agent safety focus"

People also ask

Why does agentic misalignment happen?

Before we started this research, it was not clear where the misaligned behavior was coming from. Our main two hypotheses were: Our post-training process was accidentally encouraging this behavior with misaligned rewards.This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback RLHF data that did not include any agentic tool use. T

Teaching Claude why

To show you the most relevant results, we’ve omitted some entries very similar to those already shown. Repeat the search with the omitted results included.

Followed topics

Search

People also ask

2028: Two scenarios for global AI leadership

Project Vend: Can Claude run a small shop? (And why does that matter?)

How people ask Claude for personal guidance

Harness design for long-running application development

Anthropic forms $200 million partnership with the Gates Foundation

An update on our election safeguards

Evaluating Claude’s bioinformatics research capabilities with BioMysteryBench