Eval awareness in Claude Opus 4.6’s BrowseComp performance
…For the single-agent configurations, we took the more conservative approach of marking flagged problems as incorrect without re-running; details are in the respective model cards. As models become more capable…