Search: user reliability issues

Introducing Claude Opus 4.8

… The biggest differentiator was Opus 4.8’s tendency to proactively flag issues with the inputs and outputs of an analysis, something other models routinely missed and left to the users to catch. …

May 28, 2026

An update on our election safeguards

… These results show us that users asking about the midterms are consistently routed to up-to-date information. …

Apr 24, 2026

Harness design for long-running application development

… As I clicked through, however, issues started to emerge. …

Mar 24, 2026

Paving the way for agents in biology

… And if agents do eventually make today’s harnesses obsolete, the lesson for biological databases holds: we need to keep agents in mind as we think about our users, and we need to build for scale. …

Jun 8, 2026

An update on recent Claude Code quality reports

… While we began investigating reports in early March, they were challenging to distinguish from normal variation in user feedback at first, and neither our internal usage nor evals initially reproduced the issues identified. This isn’t the experience users should expect from Claude Code. …

Apr 23, 2026

Introducing Claude Opus 4.7

… This improves its reliability on hard problems, but it does mean it produces more output tokens. Users can control token usage in various ways: by using the effort parameter, adjusting their task budgets, or prompting the model to be more concise. …

Apr 16, 2026

Introducing Claude Opus 4.5

… Claude Opus 4.5 catches more issues in code reviews without sacrificing precision. For production code review at scale, that reliability matters. …

Nov 24, 2025

Demystifying evals for AI agents

… An overview of approaches for understanding AI agent performance. Method: Automated evals (running tests programmatically without real users). Pros: Faster iteration; Fully reproducible; No user impact; Can run on every commit; Tests scenarios at scale without requiring a prod deployment. Cons: Requires more u… …

Jan 9, 2026

Project Vend: Can Claude run a small shop? (And why does that matter?)

… Initiatives like the Anthropic Economic Index provide insight into how individual interactions between users and AI assistants map to economically-relevant tasks. …

Jun 27, 2025

Partnering with Mozilla to improve Firefox’s security

… Hundreds of millions of users rely on it daily, and browser vulnerabilities are particularly dangerous because users routinely encounter untrusted content and depend on the browser to keep them safe. …

Mar 6, 2026
