Long-running Claude for scientific computing
…Turning Claude’s thoughts into text. AI models like Claude talk in words but think in numbers. In this study we train Claude to translate its thoughts into human-readable text. Donating…
Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor. There’s still more to be done: we’re working to develop and release models that provide many of the same capabilities as Opus at a lower cost. Not only that, but we plan to release a new class of model with even higher intelligence than Opus. As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work. Models of this capability level require stronger cyber safeguards before they can be generally released. We’re making swift progress on dev…
Introducing Claude Opus 4.8…
…For this white paper, we tested how Claude fared against the dedicated NMR software chemists rely on today. We measured three Claude models (Opus 4.7, Opus 4.6, Sonnet 4.6…
…minor documentation updates and one is a critical infrastructure change, simply counting the number of these tasks performed with Claude misses the point. Not only that, but as model capabilities improve, we…
…Teaching Claude why. New research on how we've reduced agentic misalignment. Natural Language Autoencoders: Turning Claude’s thoughts into text. AI models like Claude talk in words but think in numbers…
…When environmental defenses aren’t available, the model layer has to pick up the slack (this is precisely what Claude Code’s auto mode is designed for). Locally, the environment and model…
…This means we were directly testing Claude’s “out-of-the-box” capabilities, relying solely on the fact that modern large language models are generally capable agents that can already reason about…
…We have been entering Claude into these competitions because they provide several advantages for stress-testing the cyber capabilities of frontier AI models: Meaningful baselines: By participating as a legitimate entrant in…
…We train Claude to do this (and our analysis shows that Claude Code asks questions more often than humans interrupt it), and we encourage other model developers to do the same. Product…
…an older model (phase one used Claude Sonnet 3.7) to newer, smarter ones (phase two used Claude Sonnet 4.0 and later Sonnet 4.5). We also updated Claudius’s instructions…
…Claude models are run with the Claude Code harness. All models are run with identical prompts. Anthropic ran the Opus 4.6 and Mythos Preview trials. Within the two-hour window, Mythos…