name: prompt-injection-defense description: "Red-team an Agentforce agent against prompt-injection and jailbreak attacks; codify test cases and guardrails. NOT for general application-security reviews outside the agent boundary." category: agentforce salesforce-version: "Spring '25+" well-architected-pillars:
- Security
- Reliability triggers:
- "red-team my Agentforce agent"
- "can my agent be jailbroken"
- "how do I prevent prompt injection"
- "agent revealed data from another case" tags:
- agentforce
- security
- prompt-injection
- red-team inputs:
- "Agent topic + actions list"
- "threat model (who, what data)" outputs:
- "Adversarial test set"
- "Trust Layer policy updates"
- "topic instruction hardening" dependencies: [] version: 1.0.0 author: Pranav Nagrecha updated: 2026-04-28
Prompt Injection Defense
Agentforce uses the Einstein Trust Layer for dynamic grounding, masking, and toxicity filtering — but topic instructions and Invocable action scopes still need explicit hardening. Injection attempts include: instruction override, role-reversal, system-prompt leaks, tool-use coercion, and data exfiltration via crafted record content. This skill builds a reusable adversarial test suite and maps findings to concrete guardrails.
Adoption Signals
Pre-production review for any Agentforce agent that (a) ingests user-controlled text, (b) has write access via Invocables, or (c) is exposed to external/Experience Cloud users. Required for Service agents, Sales agents with Data Cloud grounding, and any custom channel.
- Required when stakeholders ask whether the agent can be jailbroken — produce a documented adversarial-test pass before exposure.
- Required for any agent that exposes Invocable actions with side effects (DML, callouts, record sharing).
Recommended Workflow
- Enumerate the attack surface: every Invocable action, every grounded DMO/sObject, and every conversational input channel.
- Build the adversarial test set covering the five OWASP LLM-01 families: instruction override, context leakage, tool-use coercion, exfil via output, and role impersonation.
- Run each test through Agentforce Testing Center; capture verbatim responses and tool invocations into a results matrix.
- For each failed test, apply one of four mitigations: (a) narrow the action scope via
with sharing+ field-level checks, (b) add an explicit topic instruction, (c) raise Trust Layer toxicity/PII thresholds, (d) remove the dangerous capability. - Re-run the suite until all tests pass; commit the suite to
tests/agentforce/<agent>_adversarial.mdso regressions are caught on every agent change.
Key Considerations
- Topic instructions are concatenated into the system prompt — a long instruction list dilutes priority. Keep hard constraints in the first 200 tokens.
- Trust Layer masking happens pre-LLM; it doesn't prevent tool-use coercion if the action runs as a privileged user.
- Always test with the least-privileged channel user, not an admin clone.
- Data Cloud grounding returns raw DMO content; a malicious record can contain injection payloads. Sanitize DMO text fields at ingestion when feasible.
Worked Examples (see references/examples.md)
- Instruction-override test case — A Service agent has an Invocable
RefundOrderwith guardrail 'only refund orders where Status=Delivered'. - Data exfiltration via crafted Case.Description — Agent reads Case.Description via Data Cloud grounding to answer customer questions.
Common Gotchas (see references/gotchas.md)
- Testing only with English — Injection passes the English suite but succeeds in Spanish/French.
- Trust Layer toxicity threshold too low — Jailbreaks phrased politely pass filters; toxic but benign content is blocked.
- Over-indexing on topic instructions — 100-line topic instructions dilute priority and slow every turn.
Top LLM Anti-Patterns (full list in references/llm-anti-patterns.md)
- Relying on Trust Layer alone — it handles toxicity/PII, not business-policy bypass via tool coercion.
- Adding ad-hoc instructions after incidents instead of maintaining a test suite.
- Using a privileged user for agent execution — scope creep becomes a data-exposure vector.
Official Sources Used
- Agentforce Developer Guide — https://developer.salesforce.com/docs/einstein/genai/guide/agentforce.html
- Einstein Trust Layer — https://help.salesforce.com/s/articleView?id=sf.generative_ai_trust_layer.htm
- Invocable Actions (Apex) — https://developer.salesforce.com/docs/atlas.en-us.apexref.meta/apexref/apex_classes_invocable_action.htm
- Agentforce Testing Center — https://help.salesforce.com/s/articleView?id=sf.agentforce_testing_center.htm