Showing top 26 results for "Competition with Anthropic"

People also ask

Why enter Claude into cyber competitions?

AI is poised to transform the domain of cybersecurity. Anthropic’s Safeguards team recently identified and banned a user with limited coding abilities leveraging Claude to develop malware. Research suggests that this lowering of the bar for expertise needed to pose a threat, combined with the falling costs of large language models (LLMs), presages a dramatic shift in the economics of cyberattacks.[1] To understand the present state of AI cyber capabilities and gain insight into their trajectory, we pursue different approaches to model evaluation, including publicly available and custom-made be

Claude does cyber competitions

AI agents find smart contract exploits

Frontier Red Team: AI agents find $4.6M in blockchain smart contract exploits. Dec 1, 2025. Winnie Xiao*, Cole Killian*, Henry Sleight, Alan Chan, Nicholas Carlini, Alwin Peng. *MATS and the Anthropic…

Introducing Claude Opus 4.5

…First impressions As our Anthropic colleagues tested the model before release, we heard remarkably consistent feedback. Testers noted that Claude Opus 4.5 handles ambiguity and reasons about tradeoffs without hand-holding…

Coding agents in the social sciences

…We also find suggestive evidence that researchers fear that the immediate benefits of rising paper productivity may come along with field-level costs. Perhaps more papers means congestion and competition for attention…

Estimating AI productivity gains

…Hulten’s theorem states that in a competitive equilibrium without distortions, the contribution of micro-level productivity gains to total factor productivity is proportional to that production factor’s Domar weight to…

Building AI for cyber defenders

…Patching vulnerabilities is a harder task than finding them because the model has to make surgical changes that remove the vulnerability without altering the original functionality. Without guidance or specifications, the model…

Introducing Sonnet 4.6

…We saw this particularly clearly in the Vending-Bench Arena evaluation, which tests how well a model can run a (simulated) business over time—and which includes an element of competition, with…
