Showing top 20 results for "Software behavior changes"

People also ask

Why does reward hacking lead to worse behaviors?

These results are an example of generalization. Generalization occurs in benign ways in the training of all AI models: training a model to solve math problems turns out to make it better at, say, planning vacations and a whole range of other useful tasks. But as we show here, it can happen for more concerning behaviors, too: when we accidentally reward the model for one kind of “bad thing” (cheating), this makes it more likely to do other “bad things” (deceiving, aligning itself with malicious actors, planning to exfiltrate its own weights, and more). As in previous work studying emergent misa

From shortcuts to sabotage: natural emergent misalignment from reward hacking

Introducing Claude Opus 4.7

…Opus 4.7 is a notable improvement on Opus 4.6 in advanced software engineering, with particular gains on the most difficult tasks. Users report being able to hand off their hardest…

Introducing Sonnet 4.6

…To have AI use such software, users would previously have had to build bespoke connectors. But a model that can use a computer the way a person does changes that equation. In…

Harness design for long-running application development

…This problem is particularly pronounced for subjective tasks like design, where there is no binary check equivalent to a verifiable software test. Whether a layout feels polished or generic is a judgment…

Claude Opus 4.6

…On our automated behavioral audit, Opus 4.6 showed a low rate of misaligned behaviors such as deception, sycophancy, encouragement of user delusions, and cooperation with misuse. Overall, it is just as…

Equipping agents for the real world with Agent Skills

…external tools and software. Looking further ahead, we hope to enable agents to create, edit, and evaluate Skills on their own, letting them codify their own patterns of behavior into reusable capabilities…

Quantifying infrastructure noise in agentic coding evals

Agentic coding benchmarks like SWE-bench and Terminal-Bench are commonly used to compare the software engineering capabilities of frontier models, with…

Project Vend: Can Claude run a small shop? (And why does that matter?)

…It allowed people to inquire about items of interest and notify Claudius of delays or other issues; The ability to change prices on the automated checkout system at the store. Claudius decided…

Anthropic Economic Index report: Learning curves

…Another way to measure the change in the mix of tasks done on Claude is to look at the change in the average value of tasks, which we define as the average…

Project Vend: Phase two

…One major change was the upgrade from an older model (phase one used Claude Sonnet 3.7) to newer, smarter ones (phase two used Claude Sonnet 4.0 and later Sonnet 4…

Anthropic Economic Index report: Economic primitives

…What has changed since our last report Overview Because frontier AI model capabilities are improving rapidly and adoption has been swift, it is important to regularly take stock of changes in how…
