Search: AI prompt injection

Trustworthy agents in practice

… Defending against attacks: Prompt injections are malicious instructions hidden inside the content that an agent is asked to process. …

Apr 9, 2026

Claude Code auto mode: a safer way to skip permissions

… Why the prompt-injection probe matters: The transcript classifier's injection defense is structural, as it never sees tool results. But the main agent does see tool results, and an injection that hijacks the main agent then has a chance of bypassing the transcript monitor too. …

Mar 25, 2026

How we contain Claude across products

… This category includes both prompt injection and conventional attacks on the agent's runtime, orchestration layer, or proxy. When building containment and defense systems, we apply defenses to three main components: The environment in which the agent runs. …

May 25, 2026

Introducing Sonnet 4.6

… You can find out more about how to mitigate prompt injections and other safety concerns in our API docs. …

Feb 17, 2026

Introducing Claude Opus 4.5

… With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behavior. …

Nov 24, 2025

Introducing Claude Opus 4.7

… On some measures, such as honesty and resistance to malicious “prompt injection” attacks, Opus 4.7 is an improvement on Opus 4.6; in others such as its tendency to give overly detailed harm-reduction advice on controlled substances, Opus 4.7 is modestly weaker. …

Apr 16, 2026

Scaling Managed Agents: Decoupling the brain from the hands

… In the coupled design, any untrusted code that Claude generated was run in the same container as credentials—so a prompt injection only had to convince Claude to read its own environment. …

Apr 8, 2026
