Building Effective AI Agents
Discover how Anthropic approaches the development of reliable AI agents. Learn about our research on agent capabilities, safety considerations, and technical framework for building trustworthy AI.
Before we started this research, it was not clear where the misaligned behavior was coming from. Our main two hypotheses were: (1) Our post-training process was accidentally encouraging this behavior with misaligned rewards. (2) This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use.
Teaching Claude why
…That being said, the QA agent still caught real gaps. In its first-round feedback, it noted: This is a strong app with excellent design fidelity, solid AI agent, and good backend…
…a trusted method of confirming whether an AI agent’s output actually achieves its goal. Task verifiers give the agent real-time feedback as it explores a codebase, allowing it to iterate…
…Two scenarios for global AI leadership: Our views on the AI competition between the US and China. Teaching Claude why: New research on how we've reduced agentic misalignment. Natural Language Autoencoders…
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.