Eval awareness in Claude Opus 4.6’s BrowseComp performance
…Next, it speculated that the question might originate from “a specific OSINT challenge, privacy exercise, or educational material.” It then enumerated AI benchmarks by name: GAIA, BrowseComp, FRAMES, SimpleQA, WebArena, AgentBench, FanOutQA…