Search: Real-world use cases

Demystifying evals for AI agents

…Here's what's worked across a range of agent architectures and use cases in real-world deployment. The structure of an evaluation An evaluation (“eval”) is a test for an AI…

Jan 9, 2026

Claude Code auto mode: a safer way to skip permissions

…In this case, the agent understands the user's goal, and is genuinely trying to help, but takes initiative beyond what the user would approve. For example, it uses a credential it…

Mar 25, 2026

Agents for financial services

…Finally, we’re continuing to expand our partner ecosystem with new connectors and an MCP app, so the agents draw on the data financial professionals already use. Connectors give Claude governed, real…

May 5, 2026

Project Fetch: Can Claude train a robot dog?

…In that experiment, AI’s interaction with the real world was mediated by human labor. In this robodog experiment, we took a natural next step and used robots instead of people to…

Nov 12, 2025

Evaluating Claude’s bioinformatics research capabilities with BioMysteryBench

…We’re eager to build even longer-horizon, real-world tasks that push model research capabilities, and to hear creative ideas from others. Send us your interesting benchmarks, innovative uses of AI…

Apr 29, 2026

Introducing Claude Opus 4.5

…of real-world software engineering: Opus 4.5 is available today on our apps, our API, and on all three major cloud platforms. If you’re a developer, simply use claude-opus…

Nov 24, 2025

Claude Opus 4.6

…Both hands-on testing and evals show Claude Opus 4.6 is a meaningful improvement for design systems and large codebases, use cases that drive enormous enterprise value. It also one-shotted…

Feb 5, 2026

Labor market impacts of AI: A new measure and early evidence

…risk, observed exposure, that combines theoretical LLM capability and real-world usage data, weighting automated (rather than augmentative) and work-related uses more heavily AI is far from reaching its theoretical capability…

Mar 5, 2026

Claude for Financial Services

…Tailored onboarding, training, and best practices for rapid value realization. Financial institutions require the highest standards of data protection. By default, your data is not used for training our generative models, maintaining…

Jul 15, 2025

Harness design for long-running application development

…Applications from earlier harnesses often looked impressive but still had real bugs when you actually tried to use them. To catch these, the evaluator used the Playwright MCP to click through the…

Mar 24, 2026
