Partnering with Mozilla to improve Firefox’s security
… From model evaluations to a security partnership

In late 2025, we noticed that Opus 4.5 was close to solving all tasks in CyberGym, a benchmark that tests whether LLMs can reproduce known security vulnerabilities. …