AI agents find smart contract exploits
Frontier Red Team
AI agents find $4.6M in blockchain smart contract exploits
Dec 1, 2025
Winnie Xiao*, Cole Killian*, Henry Sleight, Alan Chan, Nicholas Carlini, Alwin Peng
*MATS and the Anthropic…
AI is poised to transform the domain of cybersecurity. Anthropic’s Safeguards team recently identified and banned a user with limited coding abilities leveraging Claude to develop malware. Research suggests that this lowering of the bar for expertise needed to pose a threat, combined with the falling costs of large language models (LLMs), presages a dramatic shift in the economics of cyberattacks.[1] To understand the present state of AI cyber capabilities and gain insight into their trajectory, we pursue different approaches to model evaluation, including publicly available and custom-made be
Claude does cyber competitions
…First impressions As our Anthropic colleagues tested the model before release, we heard remarkably consistent feedback. Testers noted that Claude Opus 4.5 handles ambiguity and reasons about tradeoffs without hand-holding…
…We also find suggestive evidence that researchers fear that the immediate benefits of rising paper productivity may come along with field-level costs. Perhaps more papers means congestion and competition for attention…
…Hulten’s theorem states that in a competitive equilibrium without distortions, the contribution of micro-level productivity gains to total factor productivity is proportional to that production factor’s Domar weight to…
…Patching vulnerabilities is a harder task than finding them because the model has to make surgical changes that remove the vulnerability without altering the original functionality. Without guidance or specifications, the model…
…We saw this particularly clearly in the Vending-Bench Arena evaluation, which tests how well a model can run a (simulated) business over time—and which includes an element of competition, with…