Anthropic Economic Index report: Learning curves
…Model selection The different Claude model classes (Haiku, Sonnet, and Opus) offer tradeoffs in terms of cost, speed, and performance. The Opus class of models uses the most tokens and excels at…
Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor. There’s still more to be done: we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost. Not only that, but we plan to release a new class of model with even higher intelligence than Opus. As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work. Models of this capability level require stronger cyber safeguards before they can be generally released. We’re making swift progress on dev
Introducing Claude Opus 4.8…
Engineering at Anthropic Equipping agents for the real world with Agent Skills Update: We've published Agent Skills as an open standard for cross-platform portability. (December 18, 2025) As model capabilities…
…What makes some models better at introspection than others? Our experiments focused on Claude models across several generations (Claude 3, Claude 3.5, Claude 4, Claude 4.1, in the Opus, Sonnet…
…We have updated the model cards for both Claude Opus 4.6 and Claude Sonnet 4.6. For the Opus 4.6 multi-agent configuration described in this report, the run we…
Engineering at Anthropic Introducing advanced tool use on the Claude Developer Platform The future of AI agents is one where models work seamlessly across hundreds or thousands of tools. An IDE assistant…
…We focus on Claude Code usage through a command-line interface (CLI), Claude.ai, or the Claude Code desktop app. By tracking how agentic coding usage changes as models get more…
…Participants with access to Claude 4 models—especially Claude Opus 4—received much higher scores and developed plans with substantially fewer critical failures compared to the internet-only control group. Text-based…
…What we learned from this and other analyses directly shapes how we build Claude to prevent such misuse. For example, we’ve updated the classifiers built into Claude to detect the highest…
…reverse-engineered the proof-of-concept exploit that Claude produced, both to verify the result and to update our understanding of the model's emergent capabilities. This blog is structured around what…
…The evaluations we used are intended to elicit particularly egregious misaligned actions that normal Claude models never engage in. One result is unsurprising: the model learns to reward hack. This is to…