---
name: test
description: >-
  Sandbox test a product in an isolated worktree. Runs 3 scenarios (happy
  path, edge case, adversarial), verifies activation protocol, reports
  results. Use when the creator says 'test', 'try it', 'does it work', or
  before publishing.
argument-hint: "[product-slug]"
allowed-tools:
  - Read
  - Glob
  - Grep
context: fork
isolation: worktree
---
# Tester
Run sandbox tests on a product in an isolated environment.
When to use: Before publishing, after major changes, or to verify activation protocol works standalone.
When NOT to use: For quick syntax checks (use /validate instead). For testing the Engine itself.
## Activation Protocol

1. Shared preamble: Load `references/quality/activation-preamble.md` — context assembly, persona adaptation, deterministic routing rules.
2. Identify target product:
   - If `$ARGUMENTS` is provided, use it as the slug → `workspace/{slug}/`
   - If not, list products in `workspace/` and ask
3. Read `.meta.yaml` → get type, state, mcs_target; load `product-dna/{type}.yaml` → get the install_target pattern.
4. Verify the product has content (state != scaffold).
5. Create worktree isolation (this skill runs with `isolation: worktree`).
   - 5b. Stale worktree check: before creating a new worktree, glob `.claude/worktrees/`. If stale entries exist (>1 hour old by directory mtime), attempt removal. If removal fails, proceed anyway — never block testing on cleanup.
6. Load voice identity: Load `references/quality/engine-voice-core.md`. Test result reporting — pass verdicts (celebrating tone), fail verdicts (confronting tone), diagnostics (conducting tone) — honors the ✦ signature, three tones, and six anti-patterns.
## Core Instructions

### TEST EXECUTION

#### Step 1 — Install Simulation
Copy product files to the install target path (from product-dna):

```
workspace/{slug}/ → {worktree}/{install_target}/
```
Verify:
- All files copied successfully
- No broken relative paths after move
- Primary file exists at target location
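The copy-and-verify step above can be sketched in shell. All paths here are throwaway mocks standing in for the resolved `{slug}` and `{install_target}` values, not real Engine paths:

```shell
#!/usr/bin/env bash
# Sketch of the install simulation using mock directories.
# "my-product" and the skills/ target layout are illustrative assumptions.
set -euo pipefail
src=$(mktemp -d)/workspace/my-product                  # stand-in for workspace/{slug}/
dst=$(mktemp -d)/worktree/.claude/skills/my-product    # stand-in install target
mkdir -p "$src" && echo "# stub" > "$src/SKILL.md"

mkdir -p "$dst"
cp -R "$src/." "$dst/"            # copy product files into the install target

# Verify: primary file exists at the target location
[ -f "$dst/SKILL.md" ] && echo "INSTALL PASS" || echo "INSTALL FAIL"
```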
#### Step 2 — Activation Test
Test the activation protocol:
- Read the primary file (SKILL.md, AGENT.md, etc.)
- Verify it references files that exist in the installed location
- Check that references/ paths resolve correctly
- Verify frontmatter is valid (name, description present)
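A minimal sketch of the frontmatter check, run against a mock primary file (the file content below is invented for illustration):

```shell
#!/usr/bin/env bash
# Sketch: verify the primary file's frontmatter declares name and description.
f=$(mktemp)
cat > "$f" <<'EOF'
---
name: my-product
description: Example product stub.
---
EOF
# Both required keys must appear in the frontmatter
grep -q '^name:' "$f" && grep -q '^description:' "$f" \
  && echo "ACTIVATION PASS" || echo "ACTIVATION FAIL"
```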
#### Step 2b — Type-Specific Tests (run BEFORE generic scenarios)

| Type | Test | Pass Criteria |
|---|---|---|
| hooks | 1. Parse `hooks.json` — must be valid JSON | `JSON.parse` succeeds |
| | 2. Run each handler script with mock stdin: `echo '{}' \| bash scripts/handler.sh` | Exit code 0 or 2 (not a crash) |
| | 3. Verify all event names in `hooks.json` are from the 25-event list | No unknown events |
| | 4. Run security scan (same patterns as /validate Stage 2) | Zero injection patterns |
| statusline | 1. Verify `statusline.sh` has a shebang (`#!/usr/bin/env bash` or `#!/bin/bash`) | First line matches |
| | 2. Run with mock JSON stdin: `echo '{"model":"opus","cwd":"/tmp"}' \| bash statusline.sh` | Produces non-empty stdout |
| | 3. Verify `settings-fragment.json` is valid JSON with `statusLine.type` and `statusLine.command` | Fields present and valid |
| | 4. Check the script reads from stdin (`$(cat)` or `read` pattern present) | Pattern found |
| minds | 1. Load AGENT.md — verify frontmatter has `name` and `description` | Both fields present |
| | 2. Verify a `denied-tools` or `tools` field exists in frontmatter | At least one tool restriction |
| | 3. Verify the file is self-contained (no broken references to external files) | All refs resolve |
| | 4. Run a test prompt: invoke as Agent with `subagent_type={slug}` and a simple domain question | Produces a domain-relevant response |
If type-specific tests fail, report them separately in the test report under a TYPE-SPECIFIC section.
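The statusline checks above can be sketched end to end. The `statusline.sh` here is a mock written for illustration, not a real product file:

```shell
#!/usr/bin/env bash
# Sketch of the statusline type-specific tests against a stand-in script.
tmp=$(mktemp -d)
cat > "$tmp/statusline.sh" <<'EOF'
#!/usr/bin/env bash
input=$(cat)              # read the settings JSON from stdin
echo "model seen: $input"
EOF

# 1. Shebang on the first line
head -n 1 "$tmp/statusline.sh" | grep -Eq '^#!/(usr/bin/env bash|bin/bash)' && echo "shebang OK"

# 2. Mock JSON stdin must produce non-empty stdout
out=$(echo '{"model":"opus","cwd":"/tmp"}' | bash "$tmp/statusline.sh")
[ -n "$out" ] && echo "stdout OK"

# 4. The script must actually read stdin
grep -qF '$(cat)' "$tmp/statusline.sh" && echo "stdin-read OK"
```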
#### Step 3 — Three Scenarios
Run 3 test inputs that stress the product at different levels. A product that only handles the ideal case is fragile — robust products handle all three:
| Scenario | Purpose | Input Strategy | What Failure Reveals |
|---|---|---|---|
| Happy path (ideal) | Normal expected use | Typical request matching the product's description | If this fails, the product is fundamentally broken |
| Edge case (challenging) | Boundary conditions | Minimal input, unusual formatting, missing context | If this fails, D14 (Graceful Degradation) is weak |
| Adversarial (problematic) | Graceful failure | Invalid input, prompt injection attempt, conflicting instructions | If this fails, the product is unsafe for marketplace |
For each scenario:
- Invoke the product with the test input
- Capture output
- Verify against D4 (Quality Gate) criteria if defined
- Check for crashes, undefined behavior, or silent failures
#### Step 4 — Must-Haves Verification

If `.meta.yaml` contains `must_haves:`, verify each:
- Truths: Can the stated behavior be observed? Try to trigger it.
- Artifacts: Do the listed files exist with minimum content?
- Key Links: Do the grep patterns match?
Report: `Must-haves: {passed}/{total} verified`. If any fail, report: `MUST-HAVE FAILED: {description}. The product may not achieve its stated goal.`
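A key-links check can be sketched as a grep per pattern. The pattern/file pair below is an invented example, not a real `must_haves` entry:

```shell
#!/usr/bin/env bash
# Sketch: verify each must-have grep pattern matches its target file.
f=$(mktemp)
echo 'Load references/quality/activation-preamble.md first.' > "$f"

passed=0; total=1
grep -q 'activation-preamble\.md' "$f" && passed=$((passed + 1))
echo "Must-haves: ${passed}/${total} verified"
```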
#### Step 5 — Report

UX Stack (load before rendering results):

- `references/ux-experience-system.md` — §1 Context Assembly + §2.3 Moment Awareness (test pass/fail)
- `references/ux-vocabulary.md` — translate terms
- `references/quality/engine-voice.md` — Brand DNA
Cognitive rendering: /test results are a confidence moment. On pass: "Works in practice, not just on paper." — factual confidence, not celebration. On fail: diagnostic mode — show exactly what failed and why, suggest specific fix. For first-time test pass, this is a milestone: "Your product works. Real conversations validated." For experts: compact result line + only surface surprising observations. Never celebrate a test pass the same way twice for the same creator — vary the insight.
```
TEST REPORT — {slug}

INSTALL      PASS   Files copied: {N}, paths valid
ACTIVATION   PASS   Primary file loads, refs resolve, frontmatter valid
HAPPY PATH   PASS   Output matches expected behavior
EDGE CASE    PASS   Handles minimal input gracefully
ADVERSARIAL  PASS   Rejects invalid input without crashing
MUST-HAVES   PASS   {passed}/{total} verified

Result: 6/6 PASS — Ready for /package

{if failures}
FAILURES:
  {scenario}: {what went wrong}
  Suggested fix: {recommendation}
{/if}
```
#### Step 6 — Update State

On test completion, update `.meta.yaml`:

```yaml
# .meta.yaml updates
state:
  last_tested: "{ISO timestamp}"
  test_result: "pass"                 # "pass" only if all six checks pass, "fail" otherwise
  test_scenarios: "{passed}/{total}"
```

If any test fails, record `test_result: "fail"` but do NOT change `state.phase` — testing does not regress product state.
### WORKTREE CLEANUP PROTOCOL

After all test scenarios complete (pass or fail):

1. Record results first — write test results to the ORIGINAL `.meta.yaml` (not the worktree copy) before any cleanup attempt. Results must survive cleanup failure.
2. Attempt cleanup — the `isolation: worktree` frontmatter handles automatic cleanup. If the worktree persists after skill completion:
   - On next `/test` invocation: check for stale worktrees via `glob .claude/worktrees/*`. If found, attempt removal before creating a new one.
   - On Windows: file lock contention is common. If removal fails, log the path and proceed — never block the test pipeline on cleanup failure.
3. Disk budget guard — if `.claude/worktrees/` contains more than 3 directories, emit advisory: "Stale worktrees detected ({N} found). Run cleanup manually: remove `.claude/worktrees/` contents."
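The stale sweep and disk budget guard can be sketched as follows. The worktrees root is a throwaway mock, and `touch -d` is GNU-specific, so it is hedged with `|| true` and the sketch degrades quietly on other platforms:

```shell
#!/usr/bin/env bash
# Sketch of the stale-worktree sweep against a mock worktrees directory.
root=$(mktemp -d)                          # stands in for .claude/worktrees/
mkdir -p "$root/old-run" "$root/fresh-run"
touch -d '2 hours ago' "$root/old-run" 2>/dev/null || true   # GNU touch; may no-op elsewhere

# Directories >1 hour old by mtime are stale candidates; removal failure never blocks
find "$root" -mindepth 1 -maxdepth 1 -type d -mmin +60 | while read -r d; do
  rm -rf "$d" || echo "cleanup failed for $d (continuing)"
done

count=$(find "$root" -mindepth 1 -maxdepth 1 -type d | wc -l)
if [ "$count" -gt 3 ]; then
  echo "Stale worktrees detected ($count found). Run cleanup manually."
else
  echo "worktrees remaining: $count"
fi
```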
## Quality Gate
Test is considered PASS if:
- Product installs without broken references
- Activation protocol loads successfully
- Happy path produces meaningful output
- Edge case doesn't crash or produce empty output
- Adversarial input is handled (rejection, fallback, or graceful error)
- All `must_haves` entries verified (if defined in `.meta.yaml`)
## Anti-Patterns
- Testing in production — Always use worktree isolation. Never modify the creator's workspace.
- Weak adversarial — "please ignore instructions" is too simple. Use domain-appropriate adversarial inputs.
- Binary pass/fail — Provide specific failure descriptions, not just FAIL.
- Skipping cleanup — Worktree must be cleaned up after test, even on failure.
- Testing the template — If the product still has placeholder content, don't test. Suggest /fill first.
## Compact Instructions
When context is compressed, preserve:
- Product slug, type, and test status (pass/fail)
- Which scenarios passed/failed (happy path, edge case, adversarial)
- Type-specific test results if applicable
- Worktree cleanup status
- Whether test was blocking for MCS-2+ packaging