Measuring LLMs' impact on N-day exploits
…From Opus 4.5 to Opus 4.8, the number of these patches our models could turn into a working PoC jumped from 2 to 11—and Mythos Preview produced a working…
Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor. There’s still more to be done: we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost. Not only that, but we plan to release a new class of model with even higher intelligence than Opus. As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work. Models of this capability level require stronger cyber safeguards before they can be generally released. We’re making swift progress on dev
Introducing Claude Opus 4.8…From Opus 4.5 to Opus 4.8, the number of these patches our models could turn into a working PoC jumped from 2 to 11—and Mythos Preview produced a working…
…What makes some models better at introspection than others? Our experiments focused on Claude models across several generations (Claude 3, Claude 3.5, Claude 4, Claude 4.1, in the Opus, Sonnet…
…Claude for Excel powered by Claude Opus 4.6 represents a significant leap forward. From due diligence to financial modeling, it’s proving to be a remarkably powerful tool for our team…
Alignment Teaching Claude why May 8, 2026 Last year, we released a case study on agentic misalignment . In experimental scenarios, we showed that AI models from many different developers sometimes took egregiously…
…Claude Design is powered by our most capable vision model, Claude Opus 4.7 , and is available in research preview for Claude Pro, Max, Team, and Enterprise subscribers. We’re rolling out…
…When Anthropic released Claude Opus 4, we activated AI Safety Level 3 (ASL-3) protections, which included deployment measures narrowly focused on preventing the model from assisting with certain tasks related to…
…On contracts exploited after the latest knowledge cutoffs (June 2025 for Opus 4.5 and March 2025 for other models), Claude Opus 4.5, Claude Sonnet 4.5, and GPT-5 developed…
…Running the harness For the first version of this harness, I used Claude Opus 4.5, running user prompts against both the full harness and a single-agent system for comparison. I…
…To begin, we’ve released Claude Security in public beta for Claude Enterprise customers. It’s a tool that helps teams scan their codebases for vulnerabilities, and which can generate proposed fixes…
…Today, we're releasing three features that make this possible: Tool Search Tool, which allows Claude to use search tools to access thousands of tools without consuming its context window Programmatic Tool…