Showing top 52 results for "Claude model updates"

What’s next?

Users will find Opus 4.8 to be a modest but tangible improvement on its predecessor. There’s still more to be done: we’re working on developing and releasing models that provide many of the same capabilities as Opus at a lower cost. Not only that, but we plan to release a new class of model with even higher intelligence than Opus. As part of Project Glasswing, a small number of organizations are currently using Claude Mythos Preview for cybersecurity work. Models of this capability level require stronger cyber safeguards before they can be generally released. We’re making swift progress on dev

Introducing Claude Opus 4.8

Long-running Claude for scientific computing

…Turning Claude’s thoughts into text AI models like Claude talk in words but think in numbers. In this study we train Claude to translate its thoughts into human-readable text. Donating…

Making Claude a chemist

…For this white paper, we tested how Claude fared against the dedicated NMR software chemists rely on today. We measured three Claude models (Opus 4.7, Opus 4.6, Sonnet 4.6…

Estimating AI productivity gains

…minor documentation updates and one is a critical infrastructure change, simply counting the number of these tasks performed with Claude misses the point. Not only that, but as model capabilities improve, we…

Project Vend: Can Claude run a small shop? (And why does that matter?)

…Teaching Claude why New research on how we've reduced agentic misalignment. Natural Language Autoencoders: Turning Claude’s thoughts into text AI models like Claude talk in words but think in numbers…

How we contain Claude across products

…When environmental defenses aren’t available, the model layer has to pick up the slack (this is precisely what Claude Code’s auto mode is designed for). Locally, the environment and model…

LLM-discovered 0 days

…This means we were directly testing Claude’s “out-of-the-box” capabilities, relying solely on the fact that modern large language models are generally-capable agents that can already reason about…

Claude does cyber competitions

…We have been entering Claude into these competitions because they provide several advantages for stress-testing the cyber capabilities of frontier AI models: Meaningful baselines: By participating as a legitimate entrant in…

Measuring AI agent autonomy in practice

…We train Claude to do this (and our analysis shows that Claude Code asks questions more often than humans interrupt it), and we encourage other model developers to do the same. Product…

Project Vend: Phase two

…an older model (phase one used Claude Sonnet 3.7) to newer, smarter ones (phase two used Claude Sonnet 4.0 and later Sonnet 4.5). We also updated Claudius’s instructions…

Measuring LLMs’ ability to develop exploits

…Claude models are run with the Claude Code harness. All models are run with identical prompts. Anthropic ran the Opus 4.6 and Mythos Preview trials. Within the two-hour window, Mythos…