
People also ask

What is a natural language autoencoder?

The core idea is to train Claude to explain its own activations. But how do we know whether an explanation is good? Since we don't know what thoughts an activation actually encodes, we can't directly check whether an explanation is accurate. So we train a second copy of Claude to work backwards—reconstruct the original activation from the text explanation. We consider an explanation to be good if it leads to an accurate reconstruction. We then train Claude to produce better explanations according to this definition using standard AI training techniques. In more detail, suppose we have a langua

Natural Language Autoencoders
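The scoring rule described above (an explanation is good if a second model can reconstruct the original activation from it) can be illustrated with a toy sketch. This is not Anthropic's implementation. The concept vocabulary, the 4-dimensional activations, the hypothetical reconstruct() function, and the use of cosine similarity as the reconstruction metric are all assumptions made for illustration. Picking the best of several candidate explanations stands in for the actual training step, which the snippet says uses standard AI training techniques.

```python
import math

# Hypothetical toy setup: each concept word maps to a fixed direction in a
# 4-dimensional activation space. In the real method, both the explainer and
# the reconstructor are copies of a language model; these are stand-ins.
CONCEPT_DIRECTIONS = {
    "rhyme": [1.0, 0.0, 0.0, 0.0],
    "planning": [0.0, 1.0, 0.0, 0.0],
    "french": [0.0, 0.0, 1.0, 0.0],
    "code": [0.0, 0.0, 0.0, 1.0],
}


def reconstruct(explanation: str) -> list[float]:
    """Toy reconstructor: sum the directions of concept words found in the text."""
    vec = [0.0] * 4
    for word in explanation.lower().split():
        direction = CONCEPT_DIRECTIONS.get(word.strip(".,"))
        if direction is not None:
            vec = [v + d for v, d in zip(vec, direction)]
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def explanation_score(activation: list[float], explanation: str) -> float:
    """Score an explanation by how well it lets us reconstruct the activation."""
    return cosine_similarity(activation, reconstruct(explanation))


def best_explanation(activation: list[float], candidates: list[str]) -> str:
    """Stand-in for training: keep the highest-scoring candidate explanation."""
    return max(candidates, key=lambda text: explanation_score(activation, text))


activation = [0.9, 0.8, 0.0, 0.1]
candidates = [
    "The model is writing code.",
    "The model is planning a rhyme.",
    "The model is speaking French.",
]
print(best_explanation(activation, candidates))  # prints: The model is planning a rhyme.
```

The point of the design is that nobody needs to know the "true" meaning of the activation: the reconstruction score alone decides which explanation is better, so it can serve as a training signal.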

Why does agentic misalignment happen?

Before we started this research, it was not clear where the misaligned behavior was coming from. Our two main hypotheses were: (1) our post-training process was accidentally encouraging this behavior with misaligned rewards; (2) this behavior was coming from the pre-trained model, and our post-training was failing to sufficiently discourage it. We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use. T

Teaching Claude why


What 81,000 people told us about the economics of AI

Introducing Claude Opus 4.5

Introducing Claude Opus 4.7

Anthropic Economic Index report: Economic primitives

Partnering with Mozilla to improve Firefox’s security

Anthropic Economic Index report: Learning curves

Labor market impacts of AI: A new measure and early evidence

Project Glasswing: An initial update

How AI Is Transforming Work at Anthropic