Eval awareness in Claude Opus 4.6’s BrowseComp performance
… Compounding these concerns, models appear able to use the tools and environments available to them in unexpected ways, as we saw when Claude used our REPL-based search tool to decrypt answers, or when retailers’ persistent links became a way for agents to unintentionally maintain st…