What 81,000 people told us about the economics of AI
…Turning Claude’s thoughts into text AI models like Claude talk in words but think in numbers. In this study we train Claude to translate its thoughts into human-readable text. Donating…
The core idea is to train Claude to explain its own activations. But how do we know whether an explanation is good? Since we don't know what thoughts an activation actually encodes, we can't directly check whether an explanation is accurate. So we train a second copy of Claude to work backwards—reconstruct the original activation from the text explanation. We consider an explanation to be good if it leads to an accurate reconstruction. We then train Claude to produce better explanations according to this definition using standard AI training techniques. In more detail, suppose we have a langua
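The scoring idea in the snippet above can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the method used in the study: mean squared error is assumed as the measure of reconstruction accuracy, and `toy_reconstruct`, `explanation_score`, and the three-keyword "activation" are hypothetical names and data introduced only for illustration.

```python
from typing import Callable, List, Sequence


def reconstruction_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean squared error between two activation vectors of equal length."""
    if len(original) != len(reconstructed):
        raise ValueError("activation vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def explanation_score(
    activation: Sequence[float],
    explanation: str,
    reconstruct: Callable[[str], List[float]],
) -> float:
    """Score an explanation by how well the activation can be rebuilt from it.

    Higher is better; a perfect reconstruction scores 0.
    (Assumption: negative MSE stands in for "accurate reconstruction".)
    """
    return -reconstruction_error(activation, reconstruct(explanation))


def toy_reconstruct(explanation: str) -> List[float]:
    """Hypothetical stand-in for the second model: one dimension per keyword."""
    keywords = ("cat", "dog", "car")
    words = explanation.lower().split()
    return [1.0 if k in words else 0.0 for k in keywords]


# A made-up activation that "encodes" the concept cat.
activation = [1.0, 0.0, 0.0]
good = explanation_score(activation, "a cat sitting", toy_reconstruct)
bad = explanation_score(activation, "a car driving", toy_reconstruct)
```

In this toy setup the explanation that mentions the right concept reconstructs the activation exactly and therefore scores higher than the one that does not. A training loop would then reward the explainer for producing higher-scoring text.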
Natural Language Autoencoders
Before we started this research, it was not clear where the misaligned behavior was coming from. Our main two hypotheses were:
(1) Our post-training process was accidentally encouraging this behavior with misaligned rewards.
(2) This behavior was coming from the pre-trained model and our post-training was failing to sufficiently discourage it.
We now believe that (2) is largely responsible. Specifically, at the time of Claude 4’s training, the vast majority of our alignment training was standard chat-based Reinforcement Learning from Human Feedback (RLHF) data that did not include any agentic tool use. T
Teaching Claude why…
…But this result—where an AI model outperforms strong candidates on important technical skills—raises questions about how AI will change engineering as a profession. Our Societal Impacts and Economic Futures research…
…Last week we announced Project Glasswing, highlighting the risks—and benefits—of AI models for cybersecurity. We stated that we would keep Claude Mythos Preview’s release limited and test new cyber…
…4 Augmentation is again dominant on Claude.ai How AI will affect the economy depends not just on the tasks Claude is used for but the way that users access and engage…
Policy Frontier Red Team
Partnering with Mozilla to improve Firefox’s security
Mar 6, 2026
AI models can now independently identify high-severity vulnerabilities in complex software. As we recently documented, Claude…
…the tasks performed by Claude were judged to be slightly less possible for a human without access to AI. Emergent automation patterns As tasks migrate to the API, they may become more…
…We recognize that studying AI’s impact at a company building AI means representing a privileged position—our engineers have early access to cutting-edge tools, work in a relatively stable field…