Search

Showing top 27 results for "model capability progress"

Eval awareness in Claude Opus 4.6’s BrowseComp performance

…For the single-agent configurations, we took the more conservative approach of marking flagged problems as incorrect without re-running; details are in the respective model cards. As models become more capable…

Mar 6, 2026

Australian government and Anthropic sign MOU for AI safety and research

…We will share our findings on emerging model capabilities and risks, participate in joint safety and security evaluations, and collaborate on research with Australian academic institutions. This mirrors the arrangements we have…

Mar 31, 2026

Partnering with Mozilla to improve Firefox’s security

…vulnerability-discovery (and patching) capabilities directly to customers and open-source maintainers. But looking at the rate of progress, it is unlikely that the gap between frontier models’ vulnerability discovery and exploitation…

Mar 6, 2026

Harness design for long-running application development

…Since the model was much more capable, it changed how load-bearing the evaluator was for certain runs, with its usefulness depending on where the task sat relative to what the model…

Mar 24, 2026

The assistant axis: situating and stabilizing the character of large language models

…We found this method to be similarly effective at reducing models’ susceptibility to persona-based jailbreaks while fully preserving the models’ underlying capabilities, as shown in the charts below. Persona drift happens…

Jan 19, 2026

How AI Is Transforming Work at Anthropic

…At the time this data was collected, Claude Sonnet 4 and Claude Opus 4 were the most capable models available, and capabilities have continued to advance. More capable AI brings productivity benefits…

Dec 2, 2025

Anthropic Economic Index report: Economic primitives

…Controlled benchmarks like METR’s measure the frontier of autonomous capability. Our real-world data can measure the effective task horizon, reflecting a mix of model capabilities and user behavior, and expanding…

Jan 15, 2026
