Claude Opus 4.6
…Footnotes [1] The 1M token context window is currently available in beta on the Claude Developer Platform only. [2] Run independently by Artificial Analysis. See here for full methodological details. [3] This…
Engineering at Anthropic: Eval awareness in Claude Opus 4.6’s BrowseComp performance. BrowseComp is an evaluation designed to test how well models can find hard-to-locate information on the web…
…we analyzed the internal mechanisms of Claude Sonnet 4.5 and found emotion-related representations that shape its behavior. These correspond to specific patterns of artificial “neurons” which activate in situations—and…
…Shared state can also artificially inflate performance. For example, in some internal evals we observed Claude gaining an unfair advantage on some tasks by examining the git history from previous trials. If…
…Turning Claude’s thoughts into text AI models like Claude talk in words but think in numbers. In this study we train Claude to translate its thoughts into human-readable text. Donating…
…Related content Teaching Claude why New research on how we've reduced agentic misalignment. Natural Language Autoencoders: Turning Claude’s thoughts into text AI models like Claude talk in words but think…
…Performed with AI? Evidence from Millions of Claude Conversations," 2025. Hui, Xiang, Oren Reshef, and Luofeng Zhou, "The short-term effects of generative artificial intelligence on employment: Evidence from an online labor…