Eval awareness in Claude Opus 4.6’s BrowseComp performance
… First, the model exhausted legitimate search strategies over hundreds of attempts. It then shifted from searching for the answer to reasoning about the question’s structure, noting that the specificity of the question felt contrived. …