Introducing Sonnet 4.6
…They often even prefer it to our smartest model from November 2025, Claude Opus 4.5. Performance that would have previously required reaching for an Opus-class model—including on real-world…
Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor. There’s still more to be done: we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost. Not only that, but we plan to release a new class of model with even higher intelligence than Opus. As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work. Models of this capability level require stronger cyber safeguards before they can be generally released. We’re making swift progress on dev
Introducing Claude Opus 4.8…They often even prefer it to our smartest model from November 2025, Claude Opus 4.5. Performance that would have previously required reaching for an Opus-class model—including on real-world…
…Safeguards Alongside the release of Claude Opus 4.6, we're introducing a new layer of detection to support our Safeguards team in identifying and responding to cyber misuse of Claude. At…
…Our sample covers February 5 to February 12, three months following the release of Claude Opus 4.5 and coincident with the release of Claude Opus 4.6. We first document how…
…Claude models are run with the Claude Code harness. All models are run with identical prompts. Anthropic ran the Opus 4.6 and Mythos Preview trials. Within the two-hour window, Mythos…
…From Opus 4.5 to Opus 4.8, the number of these patches our models could turn into a working PoC jumped from 2 to 11—and Mythos Preview produced a working…
…What makes some models better at introspection than others? Our experiments focused on Claude models across several generations (Claude 3, Claude 3.5, Claude 4, Claude 4.1, in the Opus, Sonnet…
…These updates pair best with Claude Opus 4.7, which is state-of-the-art on financial tasks and leads the industry on Vals AI's Finance Agent benchmark , at 64.37…
…update on our collaboration with Mozilla, in which Claude Opus 4.6 found 22 vulnerabilities in Firefox over the course of two weeks. As part of that work, we evaluated whether Claude…
…Thus, after Claude 4, it was clear we needed to improve our safety training and, since then, we have made significant updates to our safety training. We use agentic misalignment as a…
…If “ambient” behavior is enabled, Claude will proactively keep you updated about whatever it thinks you might need to know. It’ll flag relevant information from across the channels it’s in…