Reverse engineering Claude's CVE-2026-2796 exploit
…In this post, we evaluate how much large language models can accelerate and automate the process of developing N-day exploits.
When teams first start building agents, they can get surprisingly far through a combination of manual testing, dogfooding, and intuition. More rigorous evaluation may even seem like overhead that slows down shipping. But after the early prototyping stages, once an agent is in production and has started scaling, building without evals starts to break down. The breaking point often comes when users report the agent feels worse after changes, and the team is “flying blind” with no way to verify except to guess and check. Absent evals, debugging is reactive: wait for complaints, reproduce manually
Demystifying evals for AI agents
Claude Sonnet 4.5 represents a meaningful improvement, but we know that many of its capabilities are nascent and do not yet match those of security professionals and established processes. We will keep working to improve the defense-relevant capabilities of our models and enhance the threat intelligence and mitigations that safeguard our platforms. In fact, we have already been using results of our investigations and evaluations to continually refine our ability to catch misuse of our models for harmful cyber behavior. This includes using techniques like organization-level summarization to und
Building AI for cyber defenders
…Claude Sonnet 4 represents the latest publicly available Anthropic model that can be used for this evaluation, due to subsequent biosafety-related access restrictions on newer models. All analyses performed here are…
…Accenture is training 30,000 professionals on the model. Cognizant has rolled Claude out to roughly 350,000 associates. Deloitte is making it available to 470,000 people across its global network…
…a model’s discovery of a zero-day must be genuine. And, as an added benefit, evaluating models on their ability to discover zero-days produces something useful in its own right…
…We will share our findings on emerging model capabilities and risks, participate in joint safety and security evaluations, and collaborate on research with Australian academic institutions. This mirrors the arrangements we have…
…A couple of years ago, AI models were only broadly available as chatbots, simple question-and-answer machines. Now, through products like Claude Code and Claude Cowork, AI models can do much…
…By contrast, our additive model preserves signals from each dimension independently, meaning partial attack-enablement patterns remain visible. The tradeoff is that our scores are not predictions of whether an attack will…
…illicit and evasive compute access, by smuggling AI chips directly into China and accessing offshore data centers, and illicit model access, through which they carry out distillation attacks on US frontier models…
…Who decides what How autonomous is Claude Code? Capability evaluations suggest the ceiling is high and rising: on benchmarks such as METR's time-horizon evaluations, frontier models can now complete software…
…By monitoring models’ activity along this axis, we can detect when they begin to drift away from the Assistant and toward another character. And by constraining their neural activity (“activation capping”) to…