Eval awareness in Claude Opus 4.6’s BrowseComp performance
Engineering at Anthropic
BrowseComp is an evaluation designed to test how well models can find hard-to-locate information on the web…