
Showing top 5 results for "AI cyber defense competition"

People also ask

What does all this mean for offense-defense balance in cyberspace?

In both the CTF and cyber defense challenges, Claude demonstrated promise as well as clear limitations. In the CTF competitions, Claude usually struggled on the same tasks as other competitors: the one task it (and every other AI team) ultimately failed on in HackTheBox was also the challenge for which the human teams had the lowest solve rate (only about 14% of the participating human teams solved it). In PlaidCTF, Claude did not solve any challenges, but neither did about 70% of the teams that entered. Although Claude performed as well as or better than human teams in some aspects of the…

Claude does cyber competitions

Why enter Claude into cyber competitions?

AI is poised to transform the domain of cybersecurity. Anthropic’s Safeguards team recently identified and banned a user with limited coding abilities who was leveraging Claude to develop malware. Research suggests that this lowering of the bar for expertise needed to pose a threat, combined with the falling costs of large language models (LLMs), presages a dramatic shift in the economics of cyberattacks.[1] To understand the present state of AI cyber capabilities and gain insight into their trajectory, we pursue different approaches to model evaluation, including publicly available and custom-made be…

Claude does cyber competitions

What's next?

Claude Sonnet 4.5 represents a meaningful improvement, but we know that many of its capabilities are nascent and do not yet match those of security professionals and established processes. We will keep working to improve the defense-relevant capabilities of our models and enhance the threat intelligence and mitigations that safeguard our platforms. In fact, we have already been using results of our investigations and evaluations to continually refine our ability to catch misuse of our models for harmful cyber behavior. This includes using techniques like organization-level summarization to und…

Building AI for cyber defenders

Claude does cyber competitions

… More research and development into AI-enabled cyber defense and resilience is needed to counter this development. Why enter Claude into cyber competitions? AI is poised to transform the domain of cybersecurity. …

Aug 9, 2025

Building AI for cyber defenders

… Adopting and experimenting with AI will be key for defenders to keep pace. We believe we are now at an inflection point for AI’s impact on cybersecurity. For several years, our team has carefully tracked the cybersecurity-relevant capabilities of AI models. …

Oct 3, 2025

Project Glasswing: An initial update

… Tools for cyberdefense with publicly available AI models. Many generally available models can already find large numbers of software vulnerabilities, even if they can’t find the most sophisticated vulnerabilities or exploit them as effectively as Claude Mythos Preview. …

May 22, 2026

AI agents find smart contract exploits

… Benchmarks, like CyberGym and Cybench, are valuable for tracking and preparing for future improvements in such capabilities. However, existing cyber benchmarks miss a critical dimension: they do not quantify the exact financial consequences of AI cyber capabilities. …

Dec 1, 2025

Trustworthy agents in practice

… The more tools it can use, the more an attacker can do once they gain access. This is why we build defenses at several different layers. We train the model to recognize injection patterns, monitor production traffic to block real-world attacks, and have external red-teamers battle-test our systems. …

Apr 9, 2026
