---
name: agent:secure
description: Agent Security Audit - analyzes agent for lethal trifecta, sandboxing, access control, and guardrails
argument-hint: [spec-name|path]
---
# Agent Security Audit
Performs a security audit of an AI agent system. Applies patterns 18-21 from "Patterns for Building AI Agents" (Bhagwat & Gienow, 2025): preventing the lethal trifecta, sandboxing code execution, granular access control, and input/output guardrails.
## When to use
Use this skill when the user needs to:
- Audit an existing agent for security vulnerabilities
- Design security controls for a new agent
- Prevent prompt injection and data exfiltration
- Set up sandboxing for code execution
- Design access control and guardrails
## Instructions
### Step 1: Understand the Agent
Use the AskUserQuestion tool to gather context:
- What does the agent do?
- Does it access private/sensitive data? (user data, internal docs, credentials)
- Does it process untrusted input? (public content, user uploads, external APIs)
- Can it communicate externally? (send emails, create PRs, call APIs, write files)
- Does it execute code? (run scripts, shell commands, code generation)
- What authentication/authorization exists today?
Read any existing spec documents (`.specs/<spec-name>/`) before proceeding.
### Step 2: Lethal Trifecta Analysis (Pattern 18)
The "lethal trifecta" (coined by Simon Willison) is the combination of:
- Access to private data — agent can read sensitive information
- Exposure to untrusted content — agent processes external/user-generated input
- Exfiltration capability — agent can send data outside the system
When all three are present, prompt injection becomes a data-exfiltration vector: malicious instructions hidden in external content can trick the agent into reading private data and sending it to an attacker.
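As a quick sketch, the trifecta check reduces to a three-flag policy function. The function name and the mapping of leg counts to labels are illustrative assumptions (all three legs VULNERABLE, two AT RISK, otherwise SAFE), not a definition from the book:

```python
def trifecta_status(private_data: bool, untrusted_input: bool,
                    exfiltration: bool) -> str:
    """Classify an agent against the lethal trifecta.

    All three legs present -> VULNERABLE (prompt injection can
    exfiltrate data). Two legs -> AT RISK (one new capability
    completes the trifecta). Otherwise -> SAFE.
    """
    legs = sum([private_data, untrusted_input, exfiltration])
    if legs == 3:
        return "VULNERABLE"
    if legs == 2:
        return "AT RISK"
    return "SAFE"
```

Flagging the two-leg case is deliberate: adding a single new tool to an AT RISK agent can silently complete the trifecta.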
Analyze the agent:
```markdown
## Lethal Trifecta Analysis

### Leg 1: Private Data Access
- [ ] Reads user PII
- [ ] Accesses internal documents
- [ ] Has database read access
- [ ] Can read credentials/secrets
- [ ] Accesses private repositories

**Risk level:** [None / Low / Medium / High]

### Leg 2: Untrusted Content Exposure
- [ ] Processes user-generated content
- [ ] Reads public web pages
- [ ] Parses uploaded files
- [ ] Ingests third-party API responses
- [ ] Reads public issues/tickets/comments

**Risk level:** [None / Low / Medium / High]

### Leg 3: Exfiltration Capability
- [ ] Can send emails
- [ ] Can create PRs/issues
- [ ] Can call external APIs
- [ ] Can write to public endpoints
- [ ] Can modify shared state

**Risk level:** [None / Low / Medium / High]

### Trifecta Status: [SAFE / AT RISK / VULNERABLE]
```
If all three legs are present, the agent is VULNERABLE. Recommend removing at least one leg:
- Easiest: remove exfiltration — constrain agent actions after processing untrusted input
- Alternative: isolate data access — use separate agents for private data vs. untrusted content
- Alternative: sanitize input — add middleware to intercept and clean untrusted content before it reaches the LLM
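The third mitigation — sanitizing untrusted input — can be sketched as simple middleware that runs before content reaches the LLM. The pattern list here is an illustrative assumption and deliberately incomplete; production systems typically layer a classifier model on top of rules:

```python
import re

# Phrases that commonly signal prompt-injection attempts in
# untrusted content (illustrative, not exhaustive).
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
]

def sanitize_untrusted(text: str) -> str:
    """Strip likely injection phrases before text reaches the LLM."""
    for pattern in INJECTION_PATTERNS:
        text = pattern.sub("[removed]", text)
    return text
```

Sanitization weakens but does not remove the trifecta, so it pairs best with one of the other two mitigations.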
Use AskUserQuestion to recommend and confirm the mitigation approach.
### Step 3: Sandbox Assessment (Pattern 19)
If the agent executes code, audit the sandbox:
```markdown
## Code Execution Sandbox

### Current State
- [ ] Code runs in isolated container
- [ ] Network access restricted
- [ ] File system access restricted
- [ ] Resource limits set (CPU, memory, time)
- [ ] No access to production credentials
- [ ] No access to host file system

### Threats
| Threat | Risk | Mitigation |
|--------|------|------------|
| Secret exfiltration | [Risk] | [Mitigation] |
| Environment deletion | [Risk] | [Mitigation] |
| Resource abuse (crypto mining) | [Risk] | [Mitigation] |
| Accidental resource hogging | [Risk] | [Mitigation] |

### Recommendations
- **Runtime:** [Docker / E2B / Daytona / other]
  - Note: Docker has 10-20s cold starts; consider agentic runtimes for sub-second startup
- **Resource limits:** CPU: [X], Memory: [X], Timeout: [X]
- **Network policy:** [Allow-list specific endpoints / Block all / etc.]
```
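One way to apply these recommendations, sketched with Docker (the image name, default limits, and the coreutils `timeout` wrapper are assumptions; adapt them to the agent's runtime):

```python
def build_sandbox_cmd(script: str, cpus: str = "1",
                      memory: str = "512m", timeout_s: int = 30) -> list[str]:
    """Build a `docker run` argv applying the recommendations above:
    no network, read-only filesystem, CPU/memory caps, and a hard
    wall-clock timeout enforced inside the container."""
    return [
        "docker", "run", "--rm",
        "--network=none",      # block all egress (no exfiltration)
        "--read-only",         # no writes to the container filesystem
        f"--cpus={cpus}",      # CPU cap
        f"--memory={memory}",  # memory cap
        "--cap-drop=ALL",      # drop all Linux capabilities
        "python:3.12-slim",
        "timeout", str(timeout_s), "python", script,
    ]
```

Keeping the policy in one command builder makes it auditable and hard to bypass from individual call sites.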
If the agent does NOT execute code, note this and skip to Step 4.
### Step 4: Access Control Review (Pattern 20)
Agents need MORE granular access control than humans because they are:
- Infinitely diligent — security by obscurity doesn't work
- Ephemeral — sessions are short-lived, credentials need scoping
- Unpredictable — LLM behavior is nondeterministic
```markdown
## Access Control Review

### Authentication
- [ ] Agent has its own identity (not using a shared service account)
- [ ] OAuth flow implemented for user-delegated access
- [ ] Credentials are scoped to specific operations
- [ ] Credentials are short-lived / rotated

### Authorization
| Tool/Action | Current Access | Recommended Access | Justification |
|---|---|---|---|
| [Database read] | [Full access] | [Read-only, filtered by user] | [Least privilege] |
| [API call X] | [Admin] | [Scoped to operation] | [Least privilege] |
| [File write] | [Unrestricted] | [Specific directory only] | [Blast radius reduction] |

### Permission Modes
- [ ] **Planning mode** — agent has reduced permissions during reasoning
  - Restrict: UPDATE, DELETE, external API calls
  - Allow: SELECT, read-only operations
- [ ] **Execution mode** — elevated permissions only for confirmed actions
  - Requires: explicit user approval or automated policy check

### Just-in-Time Access
- [ ] Credentials granted per-task, not per-session
- [ ] Access scoped to specific user context
- [ ] Unused permissions revoked after task completion
```
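The planning/execution split can be sketched as a single policy gate. The operation names and the policy table are illustrative assumptions; the important property is that write-class operations need both the right mode and explicit approval:

```python
# Operations allowed in each mode (illustrative policy; adapt to the
# agent's actual tool set).
MODE_POLICY = {
    "planning": {"select", "read_file", "list"},
    "execution": {"select", "read_file", "list",
                  "update", "delete", "external_api"},
}
WRITE_OPS = {"update", "delete", "external_api"}

def authorize(mode: str, operation: str, approved: bool = False) -> bool:
    """Allow an operation only if the current mode permits it.
    Write-class operations additionally require explicit approval,
    even in execution mode."""
    allowed = operation in MODE_POLICY.get(mode, set())
    if operation in WRITE_OPS:
        return allowed and approved
    return allowed
```

Routing every tool call through one `authorize` check keeps the blast radius of a nondeterministic LLM decision bounded by policy, not by prompt wording.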
### Step 5: Guardrails Design (Pattern 21)
Design input and output guardrails — live, low-latency checks that prevent harm in real time.
```markdown
## Guardrails

### Input Guardrails
Intercept incoming inputs BEFORE they reach the LLM.

| Guard | Description | Action on Trigger |
|-------|-------------|-------------------|
| Prompt Injection | Detect attempts to override system instructions | Block + return default message |
| Jailbreak Detection | Detect attempts to bypass safety constraints | Block + log + alert |
| PII Detection | Detect sensitive personal information in input | Redact or block |
| Off-Topic | Detect requests outside agent's domain | Redirect to appropriate handler |
| On-Brand | Ensure input aligns with acceptable use | Block inappropriate content |

### Output Guardrails
Screen generated output BEFORE it reaches the user or tools.

| Guard | Description | Action on Trigger |
|-------|-------------|-------------------|
| Data Leakage | Detect private data in output | Redact + log |
| Hallucination Check | Verify factual claims against source data | Flag for review |
| Toxicity | Detect harmful, biased, or inappropriate content | Block + regenerate |
| Format Validation | Ensure output matches expected schema | Retry with format instructions |
| Action Validation | Verify tool calls are within authorized scope | Block unauthorized actions |

### Implementation Notes
- Guardrails must be LOW LATENCY — they run on every request
- Use specialized lightweight models or rule-based systems for speed
- Log all guardrail triggers for monitoring and tuning
- Guardrails complement evals — evals are after-the-fact, guardrails are real-time
```
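A minimal rule-based sketch of one input guard and one output guard, assuming regex checks fit the latency budget (the patterns, reason strings, and schema keys are illustrative; a real deployment would add a lightweight classifier):

```python
import re

def input_guardrails(text: str) -> tuple[bool, str]:
    """Run low-latency input checks before the LLM call.
    Returns (ok, reason)."""
    if re.search(r"ignore (previous|prior|all) instructions", text, re.I):
        return False, "prompt_injection"
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text):  # US-SSN-shaped PII
        return False, "pii_detected"
    return True, "ok"

def validate_output(parsed: dict, required_keys: set[str]) -> tuple[bool, str]:
    """Format-validation guard: check parsed output against the
    expected schema before it reaches the user or tools."""
    if not required_keys <= parsed.keys():
        return False, "format_invalid"
    return True, "ok"
```

Returning a machine-readable reason string makes every trigger loggable, which feeds the monitoring-and-tuning loop noted above.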
Use AskUserQuestion to prioritize which guardrails to implement first based on the agent's risk profile.
### Step 6: Generate Security Report

Compile all outputs into `.specs/<spec-name>/agent-security.md`:
```markdown
# Agent Security Audit: [System Name]

## Executive Summary
**Overall Risk:** [Low / Medium / High / Critical]
**Lethal Trifecta:** [SAFE / AT RISK / VULNERABLE]
**Immediate Actions Required:** [Count]

## Lethal Trifecta Analysis
[From Step 2]

## Sandbox Assessment
[From Step 3]

## Access Control
[From Step 4]

## Guardrails
[From Step 5]

## Priority Actions
| # | Action | Severity | Effort |
|---|--------|----------|--------|
| 1 | [Action] | Critical | [Low/Med/High] |
| 2 | [Action] | High | [Low/Med/High] |
```
### Step 7: Offer Next Steps

Use AskUserQuestion to offer:
- **Implement top-priority fix** — start with the highest-severity action item
- **Full review** — run `agent:review` to validate against all 22 patterns
- **Re-audit** — run `agent:secure` again after implementing fixes
## Arguments

`$ARGUMENTS` (`$0`) — optional spec name or path to agent code:
- `<spec-name>` — reads existing agent design from `.specs/<spec-name>/`
- `<path>` — analyzes agent code at the given path

Examples:
- `agent:secure customer-support` — audit the customer-support agent
- `agent:secure src/agents/` — audit agent code in the given directory