Trustworthy agents in practice
… Defending against attacks Prompt injections are malicious instructions hidden inside the content that an agent is asked to process. …
… Defending against attacks Prompt injections are malicious instructions hidden inside the content that an agent is asked to process. …
… Why the prompt-injection probe matters The transcript classifier's injection defense is structural as it never sees tool results. But the main agent does see tool results, and an injection that hijacks the main agent then has a chance of bypassing the transcript monitor too. …
… This category includes both prompt injection and conventional attacks on the agent's runtime, orchestration layer, or proxy. When building containment and defense systems, we apply defenses to three main components: The environment in which the agent runs. …
… You can find out more about how to mitigate prompt injections and other safety concerns in our API docs . …
… With Opus 4.5, we’ve made substantial progress in robustness against prompt injection attacks, which smuggle in deceptive instructions to fool the model into harmful behavior. …
… On some measures, such as honesty and resistance to malicious “prompt injection” attacks, Opus 4.7 is an improvement on Opus 4.6; in others such as its tendency to give overly detailed harm-reduction advice on controlled substances , Opus 4.7 is modestly weaker. …
… In the coupled design, any untrusted code that Claude generated was run in the same container as credentials—so a prompt injection only had to convince Claude to read its own environment. …