Harness Autopilot
Lightweight orchestrator — dispatches isolated phase-agents, tracks state, chains artifacts between phases. Delegates all planning/execution/verification/review to dedicated persona agents.
When to Use
- After a multi-phase spec is approved and you want automated phase execution
- When a project has 2+ implementation phases requiring repeated skill invocations
- NOT for single-phase work (use harness-execution directly)
- NOT when the spec is not yet approved (use harness-brainstorming first)
- NOT for CI/headless execution (conversational skill)
Persona Agents
| Skill | subagent_type | State(s) |
|---|---|---|
| harness-planning | harness-planner | PLAN |
| harness-execution | harness-task-executor | EXECUTE |
| harness-verification | harness-verifier | VERIFY |
| harness-integration | harness-verifier | INTEGRATE |
| harness-code-review | harness-code-reviewer | REVIEW, FINAL_REVIEW |
Iron Law: Autopilot delegates, never reimplements. If writing plan/execute/verify/review logic, STOP — delegate via subagent_type. Always use dedicated persona agents, never general-purpose agents.
Rigor Levels
Set at INIT (--fast / --thorough); persists for session. Default: standard.
| State | fast | standard | thorough |
|---|---|---|---|
| PLAN | Skip skeleton pass | Default | Always skeleton with approval |
| APPROVE_PLAN | Auto-approve, skip signals | Signal-based | Force human review |
| EXECUTE | Skip scratchpad | Scratchpad >500 words | Verbose scratchpad |
| VERIFY | harness validate only | Full pipeline | Expanded checks |
| INTEGRATE | WIRE only, auto-approve | Full tier-appropriate | Full + human ADR review |
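A minimal sketch of how the rigor level could be resolved from the INIT flags, assuming flags arrive as a plain list of strings. The function name `resolve_rigor` is hypothetical; the default of "standard" and the rejection of `--fast` combined with `--thorough` come from this document.

```python
def resolve_rigor(flags: list[str]) -> str:
    """Map INIT flags to a rigor level; the default is "standard"."""
    fast = "--fast" in flags
    thorough = "--thorough" in flags
    if fast and thorough:
        # The INIT section rejects this combination with an error.
        raise ValueError("--fast and --thorough cannot be combined")
    if fast:
        return "fast"
    if thorough:
        return "thorough"
    return "standard"
```

Other flags such as `--review-plans` do not affect the result, so they can be passed in the same list.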
State Machine
INIT → ASSESS → PLAN → APPROVE_PLAN → EXECUTE → VERIFY → INTEGRATE → REVIEW → PHASE_COMPLETE
PHASE_COMPLETE → [next phase?]
  yes → ASSESS
  no → FINAL_REVIEW → DONE
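The diagram can be read as a transition table. The sketch below is one possible encoding, not the orchestrator's actual data structure: `TRANSITIONS` and `can_transition` are hypothetical names, and the table adds the "fix" loops back to EXECUTE and the ASSESS shortcut to APPROVE_PLAN that the per-state sections describe.

```python
# Allowed next states per state, taken from the diagram plus the documented
# "fix" loops. Pauses and "stop" exits are not modeled here.
TRANSITIONS: dict[str, set[str]] = {
    "INIT": {"ASSESS"},
    "ASSESS": {"PLAN", "APPROVE_PLAN"},  # APPROVE_PLAN when a plan file already exists
    "PLAN": {"APPROVE_PLAN"},
    "APPROVE_PLAN": {"EXECUTE"},
    "EXECUTE": {"VERIFY"},
    "VERIFY": {"INTEGRATE", "EXECUTE"},
    "INTEGRATE": {"REVIEW", "EXECUTE"},
    "REVIEW": {"PHASE_COMPLETE", "EXECUTE"},
    "PHASE_COMPLETE": {"ASSESS", "FINAL_REVIEW"},
    "FINAL_REVIEW": {"DONE"},
    "DONE": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the table allows moving from current to target."""
    return target in TRANSITIONS.get(current, set())
```

A check like `can_transition("PLAN", "EXECUTE")` returning `False` mirrors the gate that every plan passes APPROVE_PLAN first.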
INIT
- Resolve spec path (argument or prompt).
- Derive session slug: strip `docs/`, drop `.md`, replace `/` and `.` with `--`, lowercase. Set `sessionDir = .harness/sessions/<slug>/`.
- Check for existing state: read `{sessionDir}/autopilot-state.json`. If present and not DONE: report "Resuming from {currentState}, phase {N}: {name}." Apply schema migration if `schemaVersion < 5` (backfill missing fields). Jump to recorded state.
- Fresh start: read spec, parse `## Implementation Order` for phases (`### Phase N: Name` + `<!-- complexity: low|medium|high -->`, default: `medium`). Capture `startingCommit` via `git rev-parse HEAD`. Write `autopilot-state.json` (`schemaVersion: 5, currentState: "ASSESS", currentPhase: 0`).
- Flags: `--fast` → `rigorLevel: "fast"`. `--thorough` → `rigorLevel: "thorough"`. `--review-plans` → `reviewPlans: true`. `--fast` and `--thorough` together → reject with error.
- Call `gather_context({ path, skill: "harness-autopilot", session: slug, include: ["state", "learnings", "handoff", "graph", "businessKnowledge", "sessions", "validation"] })`.
- → ASSESS.
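The slug derivation and phase parsing steps above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, `docs/` is treated as a leading prefix, and the complexity comment is accepted either on the heading line or on a later line before the next phase heading, since the document does not say where it sits.

```python
import re

PHASE_RE = re.compile(r"^### Phase (\d+): (.+?)\s*$")
COMPLEXITY_RE = re.compile(r"<!--\s*complexity:\s*(low|medium|high)\s*-->")


def derive_session_slug(spec_path: str) -> str:
    """Strip docs/, drop .md, replace / and . with --, lowercase."""
    slug = spec_path
    if slug.startswith("docs/"):
        slug = slug[len("docs/"):]
    if slug.endswith(".md"):
        slug = slug[: -len(".md")]
    return slug.replace("/", "--").replace(".", "--").lower()


def parse_phases(spec_text: str) -> list[dict]:
    """Collect phases listed under the '## Implementation Order' heading."""
    phases: list[dict] = []
    in_section = False
    for line in spec_text.splitlines():
        if line.startswith("## "):
            in_section = line.strip() == "## Implementation Order"
            continue
        if not in_section:
            continue
        comment = COMPLEXITY_RE.search(line)
        heading = PHASE_RE.match(COMPLEXITY_RE.sub("", line))
        if heading:
            phases.append({
                "index": int(heading.group(1)),
                "name": heading.group(2),
                # Default documented above: medium when no annotation is present.
                "complexity": comment.group(1) if comment else "medium",
            })
        elif comment and phases:
            phases[-1]["complexity"] = comment.group(1)
    return phases
```

For the spec path used in the Examples section, `derive_session_slug("docs/changes/security-scanner/proposal.md")` yields `changes--security-scanner--proposal`.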
ASSESS
- Read current phase at `currentPhase`.
- If `planPath` set and file exists: → APPROVE_PLAN.
- Intelligence-enhanced complexity assessment. Before routing by complexity, refine the annotation with signals from available tools:
  - Run `predict_failures` on the phase domain to check if constraints are trending toward violation. High failure probability suggests upgrading complexity.
  - Run `compute_blast_radius` on files the phase is likely to touch. A large blast radius (>15 affected modules) suggests upgrading to `high`.
  - If the orchestrator is running, request intelligence analysis via `POST /api/analyze` with the phase title/description to get CML complexity scores and PESL simulation results. Use CML `structuralComplexity > 0.7` or PESL `riskScore > 0.6` as triggers to upgrade complexity routing.
  - If no orchestrator, the MCP tool signals above are sufficient.
- Complexity routing:
  - `low`/`medium`: auto-plan via harness-planner → PLAN.
  - `high`: pause. Instruct: "Run `/harness:planning` interactively, then re-invoke `/harness:autopilot`." Wait for re-invocation.
- Update `currentState: "PLAN"`.
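The signal thresholds above can be combined into a single refinement step. The sketch below is hedged: `AssessSignals` and `refine_complexity` are hypothetical names, the `failure_threshold` of 0.5 is an assumption (the document only says "high failure probability"), and "upgrade" is interpreted as moving up one level except for blast radius, which goes straight to `high` as stated.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("low", "medium", "high")


@dataclass
class AssessSignals:
    # All fields are optional because a tool or the orchestrator may be unavailable.
    failure_probability: Optional[float] = None    # from predict_failures
    blast_radius_modules: Optional[int] = None     # from compute_blast_radius
    structural_complexity: Optional[float] = None  # CML score
    risk_score: Optional[float] = None             # PESL score


def upgrade_one_level(level: str) -> str:
    return LEVELS[min(LEVELS.index(level) + 1, len(LEVELS) - 1)]


def refine_complexity(annotated: str, signals: AssessSignals,
                      failure_threshold: float = 0.5) -> str:
    """Return the complexity used for routing after applying the signals."""
    if signals.blast_radius_modules is not None and signals.blast_radius_modules > 15:
        return "high"
    upgrade = (
        (signals.failure_probability is not None
         and signals.failure_probability > failure_threshold)  # assumed threshold
        or (signals.structural_complexity is not None
            and signals.structural_complexity > 0.7)
        or (signals.risk_score is not None and signals.risk_score > 0.6)
    )
    return upgrade_one_level(annotated) if upgrade else annotated
```

With no signals available, the annotation from the spec is returned unchanged.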
PLAN
Auto-plan (low/medium): Dispatch harness-planner:
subagent_type: "harness-planner"
prompt: "Phase {N}: {name}. Spec: {specPath}. Session: {sessionSlug}. Rigor: {rigorLevel}. Follow harness-planning. Write plan to docs/changes/<topic>/plans/ (topic from specPath; legacy docs/plans/ if spec is outside docs/changes/). Write {sessionDir}/handoff.json when done."
On return: read planPath from {sessionDir}/handoff.json. Complexity override check: low + tasks>10 or checkpoints>3 → "medium"; tasks>20 or checkpoints>6 → "high". Update state planPath. → APPROVE_PLAN.
Interactive plan (high): Check for plan file at docs/changes/<topic>/plans/*{phase-name}* (or legacy docs/plans/*{phase-name}*) or planPath in handoff. If found: update planPath → APPROVE_PLAN. If not: remind and wait.
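The complexity override check after an auto-plan returns can be sketched as below. The reading is an assumption: the "medium" rule applies only to phases annotated `low`, the "high" rule applies to any phase, and `None` means no override (which lines up with the `phase.complexityOverride !== null` signal in APPROVE_PLAN). The function name is hypothetical.

```python
from typing import Optional


def complexity_override(annotated: str, tasks: int, checkpoints: int) -> Optional[str]:
    """Return the overriding complexity, or None when the annotation stands."""
    if tasks > 20 or checkpoints > 6:
        proposed = "high"
    elif annotated == "low" and (tasks > 10 or checkpoints > 3):
        proposed = "medium"
    else:
        return None
    # An "override" equal to the annotation is not an override.
    return proposed if proposed != annotated else None
```

For the 8-task plan in the Examples section, `complexity_override("low", 8, 2)` returns `None`, so the override signal stays false.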
APPROVE_PLAN
- Gather: task count, checkpoint count, concerns from `{sessionDir}/handoff.json` (default `[]`).
- `"fast"` → auto-approve, record `"auto_approved_plan_fast"`, → EXECUTE.
- `"thorough"` → force `shouldPauseForReview = true`.
- Signals (any true → pause; all false → auto-approve):
  - `reviewPlans: true`
  - `phase.complexity === "high"`
  - `phase.complexityOverride !== null`
  - Handoff `concerns` non-empty
  - Task count > 15
  - Knowledge gaps: `harness knowledge-pipeline --domain <phase-domain>` reports `totalGaps > 0` and `--fix` was not run during planning
- Auto-approve: emit report (mode, complexity, concerns, task count). Record decision with signal snapshot in `decisions[]`. → EXECUTE.
- Pause: show triggered signals. Ask "Approve? (yes / revise / skip phase / stop)." Record decision. Route accordingly.
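The approval decision above can be sketched as a pure function over the gathered values. This is a sketch, not the skill's implementation: `PlanContext`, `decide_approval`, the signal keys, and the shape of the returned record are hypothetical, and the knowledge-gap check is reduced to a boolean the caller computes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanContext:
    rigor_level: str = "standard"
    review_plans: bool = False
    complexity: str = "medium"
    complexity_override: Optional[str] = None
    concerns: list[str] = field(default_factory=list)
    task_count: int = 0
    # True when totalGaps > 0 and --fix was not run during planning.
    unresolved_knowledge_gaps: bool = False


def decide_approval(ctx: PlanContext) -> dict:
    """Return an approval record suitable for appending to decisions[]."""
    if ctx.rigor_level == "fast":
        # fast skips signal evaluation entirely.
        return {"action": "auto_approve", "reason": "auto_approved_plan_fast",
                "signals": {}, "triggered": []}
    signals = {
        "reviewPlans": ctx.review_plans,
        "highComplexity": ctx.complexity == "high",
        "complexityOverride": ctx.complexity_override is not None,
        "concerns": len(ctx.concerns) > 0,
        "taskCount": ctx.task_count > 15,
        "knowledgeGaps": ctx.unresolved_knowledge_gaps,
    }
    triggered = [name for name, fired in signals.items() if fired]
    pause = ctx.rigor_level == "thorough" or bool(triggered)
    return {"action": "pause" if pause else "auto_approve",
            "reason": "signals" if triggered else ctx.rigor_level,
            "signals": signals, "triggered": triggered}
```

Returning the full signal snapshot in every branch keeps the audit trail in `decisions[]` uniform for auto and manual approvals.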
EXECUTE
Dispatch harness-task-executor:
subagent_type: "harness-task-executor"
prompt: "Phase {N}: {name}. Plan: {planPath}. Session: {sessionSlug}. Rigor: {rigorLevel}. Update {sessionDir}/state.json per task. Write {sessionDir}/handoff.json when done or blocked."
Checkpoints: [checkpoint:human-verify] → show output, confirm, resume. [checkpoint:decision] → present options, record choice, resume. [checkpoint:human-action] → instruct user, wait for confirmation, resume. After each passing checkpoint: commitAtCheckpoint().
Outcome: All tasks complete → VERIFY. Task fails → retry logic:
- Attempt 1: read error, apply obvious fix, re-dispatch for failed task.
- Attempt 2: expand context (read related files, check `learnings.md`), re-dispatch.
- Attempt 3: full context (test output, imports, plan instructions), re-dispatch.
- Budget exhausted: recovery commit (`[autopilot][recovery]` prefix in message), record in `.harness/failures.md`. Ask: "fix manually and continue / revise plan / stop."
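The three-attempt ladder above can be sketched as a loop over context levels. The `dispatch` callable stands in for re-dispatching harness-task-executor and is an assumption, as are the level labels and the shape of the returned dict.

```python
from typing import Callable

# One context level per attempt, widening each time.
CONTEXT_LADDER = ("obvious-fix", "expanded-context", "full-context")


def retry_task(dispatch: Callable[[int, str], bool]) -> dict:
    """Re-dispatch a failed task with widening context until it passes.

    dispatch(attempt, context_level) returns True when the task succeeds.
    """
    for attempt, context_level in enumerate(CONTEXT_LADDER, start=1):
        if dispatch(attempt, context_level):
            return {"status": "recovered", "retriesUsed": attempt}
    # Budget exhausted: the caller makes the recovery commit, records the
    # failure, and asks the user how to proceed.
    return {"status": "budget_exhausted", "retriesUsed": len(CONTEXT_LADDER)}
```

The loop never issues a fourth attempt; anything beyond the ladder requires the human choice described above.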
VERIFY
- `"fast"`: run `harness validate`. Pass → INTEGRATE. Fail → surface to user.
- `"standard"` / `"thorough"`: dispatch harness-verifier:
subagent_type: "harness-verifier"
prompt: "Phase {N}: {name}. Session: {sessionSlug}. Rigor: {rigorLevel}. Verify and report pass/fail with findings."
Pass → INTEGRATE. Fail → ask "fix / skip verification / stop." `fix`: re-enter EXECUTE (retry budget resets).
INTEGRATE
- Resolve tier: `max(plan.integrationTier, derived-from-execution)`. If tier escalated: notify human with "Tier escalated from {planned} to {derived}: {reason}."
- Dispatch harness-integration skill:
subagent_type: "harness-verifier"
prompt: "Phase {N}: {name}. Session: {sessionSlug}. Tier: {tier}. Plan: {planPath}. Verify integration per harness-integration skill."
- Rigor interaction:
  - `"fast"`: WIRE sub-phase only, auto-approve, no ADR drafting.
  - `"standard"`: Full tier-appropriate checks (WIRE + MATERIALIZE + UPDATE per tier).
  - `"thorough"`: Full checks + human reviews every ADR draft + force knowledge graph verification.
- Pass → REVIEW.
- Fail → report incomplete items. Ask "fix / skip integration / stop":
  - fix: re-enter EXECUTE with integration-specific fix tasks, then re-VERIFY, re-INTEGRATE. Retry budget resets.
  - skip: record decision in `decisions[]`, proceed to REVIEW (human override).
  - stop: save state and exit.
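The tier resolution step can be sketched as below, assuming tiers are comparable integers where a larger number means a heavier integration tier. The document does not define the tier values, so that ordering and the name `resolve_tier` are assumptions.

```python
from typing import Optional


def resolve_tier(planned: int, derived: int, reason: str) -> tuple[int, Optional[str]]:
    """Return the effective tier and an escalation notice, if any."""
    tier = max(planned, derived)
    notice = None
    if derived > planned:
        notice = f"Tier escalated from {planned} to {derived}: {reason}."
    return tier, notice
```

A derived tier lower than the planned one never downgrades the plan, so the notice only fires on escalation.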
REVIEW
Dispatch harness-code-reviewer:
subagent_type: "harness-code-reviewer"
prompt: "Phase {N}: {name}. Session: {sessionSlug}. Follow harness-code-review. Report findings (critical / important / suggestion)."
Persist findings to {sessionDir}/phase-{N}-review.json. No blocking → PHASE_COMPLETE. Blocking → ask "fix / override / stop." fix: re-enter EXECUTE. override: record decision in decisions[] → PHASE_COMPLETE.
PHASE_COMPLETE
- Present summary: name, tasks completed, retries used, verification result, integration report (`{sessionDir}/phase-{N}-integration.json`), review findings count, elapsed time.
- Record in `history[]`: phase index, name, startedAt, completedAt, tasksCompleted, retriesUsed, verificationPassed, integrationPassed, reviewFindings.
- Mark phase `complete` in state. Clear scratchpad: `clearScratchpad({ session, phase, projectPath })`.
- Sync roadmap: `manage_roadmap sync apply:true` (skip if no roadmap; never `force_sync: true`).
- Write session summary: `writeSessionSummary(projectPath, sessionSlug, { session, lastActive, skill: "harness-autopilot", phase, status, spec, plan, keyContext, nextStep })`.
- More phases: "Phase {N} complete. Next: {N+1}: {name} ({complexity}). Continue? (yes / stop)." `yes` → increment `currentPhase`, reset `retryBudget`, → ASSESS. `stop` → save and exit.
- No more phases: → FINAL_REVIEW.
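The bookkeeping above can be sketched as a state update. The state shape is an assumption: `phases` as a list of dicts with a `status` field and `retryBudget` as a small dict are hypothetical, and the function models only the path where the user answers "yes" to continuing.

```python
import copy


def advance_after_phase(state: dict, record: dict) -> dict:
    """Record the finished phase and pick the next state (user chose 'yes')."""
    new_state = copy.deepcopy(state)  # leave the caller's state untouched
    idx = new_state["currentPhase"]   # zero-based, as written at INIT
    new_state["phases"][idx]["status"] = "complete"
    new_state["history"].append({"phaseIndex": idx, **record})
    if idx + 1 < len(new_state["phases"]):
        new_state["currentPhase"] = idx + 1
        # Hypothetical budget shape; the source only says the budget resets.
        new_state["retryBudget"] = {"maxAttempts": 3, "attemptsUsed": 0}
        new_state["currentState"] = "ASSESS"
    else:
        new_state["currentState"] = "FINAL_REVIEW"
    return new_state
```

Working on a copy means a failed write of the new state cannot leave the in-memory state half-updated.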
FINAL_REVIEW
- Set `currentState: "FINAL_REVIEW"`, `finalReview.status: "in_progress"`.
- Gather per-phase findings from `{sessionDir}/phase-{N}-review.json` files.
- Dispatch harness-code-reviewer:
subagent_type: "harness-code-reviewer"
prompt: "Final cross-phase review. Diff: git diff {startingCommit}..HEAD. Session: {sessionSlug}. Prior findings: {collected}. Focus on cross-phase coherence: naming, duplicated utilities, architectural drift. Report findings (critical / important / suggestion)."
- No blocking: store in `finalReview.findings`, set `"passed"` → DONE.
- Blocking: ask "fix / override / stop."
  - `fix`: increment `finalReview.retryCount` (max 3). Dispatch harness-task-executor: "Fix these blocking findings: {findings with file, line, title}. Session: {sessionSlug}. Commit each fix atomically." Run `harness validate`. Re-run FINAL_REVIEW from step 1. If `retryCount > 3`: stop, record in `.harness/failures.md`.
  - `override`: record rationale in `decisions[]`. Set `"overridden"` → DONE.
  - `stop`: save state and exit (resumable).
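The handling of the "fix / override / stop" answer can be sketched as follows. The function name, the `"STOP"` sentinel (not a real state), and the decision record shape are hypothetical; the three-cycle limit and the `"overridden"` status come from the steps above.

```python
MAX_FIX_CYCLES = 3


def apply_final_review_choice(final_review: dict, choice: str,
                              decisions: list, rationale: str = "") -> str:
    """Update finalReview for the user's choice and return the next step."""
    if choice == "fix":
        final_review["retryCount"] += 1
        if final_review["retryCount"] > MAX_FIX_CYCLES:
            # Caller records the failure in .harness/failures.md and stops.
            return "STOP"
        return "FINAL_REVIEW"  # re-run the review after fixes are applied
    if choice == "override":
        decisions.append({"state": "FINAL_REVIEW", "choice": "override",
                          "rationale": rationale})
        final_review["status"] = "overridden"
        return "DONE"
    if choice == "stop":
        return "STOP"  # state is saved and the session stays resumable
    raise ValueError(f"unknown choice: {choice}")
```

Three consecutive "fix" answers each return `"FINAL_REVIEW"`; a fourth returns `"STOP"`.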
DONE
- Present: total phases, tasks, retries, time, `finalReview.status` + findings count, any overridden findings.
- Ask "Create a PR? (yes / no)."
- Write final handoff to `{sessionDir}/handoff.json`. Append learnings to `.harness/learnings.md`. Call `promoteSessionLearnings(projectPath, sessionSlug)`. If learnings count > 30, suggest `harness learnings prune`.
- If `docs/roadmap.md` exists: call `manage_roadmap update` to set feature done. Skip if not found.
- Write final `writeSessionSummary()`. Set `currentState: "DONE"` in autopilot-state.json.
Process
- INIT — Resolve spec, derive session slug, check for existing state, parse phases.
- ASSESS — Route by complexity: low/medium auto-plans, high pauses for interactive planning.
- PLAN → APPROVE — Dispatch harness-planner, check approval signals, auto-approve or pause.
- EXECUTE — Dispatch harness-task-executor with plan path, handle checkpoints and retries (max 3).
- VERIFY — Dispatch harness-verifier, confirm code correctness and wiring.
- INTEGRATE — Resolve integration tier, dispatch harness-integration, verify system wiring, knowledge materialization, and documentation per tier.
- REVIEW — Dispatch harness-code-reviewer, fix blocking findings.
- PHASE_COMPLETE — Summarize (including integration report), sync roadmap, loop to ASSESS for next phase or proceed to FINAL_REVIEW.
- FINAL_REVIEW → DONE — Cross-phase review, offer PR creation, write final handoff.
Harness Integration
- State: `{sessionDir}/autopilot-state.json` (orchestration) + `{sessionDir}/state.json` (task-level, written by harness-execution).
- Handoff: `{sessionDir}/handoff.json`, written by each delegated skill and read by the next. Autopilot writes final handoff at DONE.
- Checkpoint commits: `commitAtCheckpoint()` after passing checkpoints. Recovery commits use `[autopilot][recovery]` prefix.
- Scratchpad: cleared at PHASE_COMPLETE via `clearScratchpad()`. Skipped at `rigorLevel: "fast"`.
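A minimal sketch of how the orchestration state file could be persisted and resumed, assuming JSON on disk and a write-then-rename for safety. The backfill defaults in `DEFAULTS` are hypothetical; the document only says that missing fields are backfilled when `schemaVersion < 5`.

```python
import copy
import json
import os
import tempfile
from pathlib import Path
from typing import Optional

CURRENT_SCHEMA = 5
# Hypothetical defaults for fields an older state file may lack.
DEFAULTS = {"rigorLevel": "standard", "reviewPlans": False,
            "decisions": [], "history": []}


def migrate(state: dict) -> dict:
    """Backfill missing fields on states older than the current schema."""
    if state.get("schemaVersion", 0) < CURRENT_SCHEMA:
        for key, value in DEFAULTS.items():
            state.setdefault(key, copy.deepcopy(value))
        state["schemaVersion"] = CURRENT_SCHEMA
    return state


def save_state(session_dir: str, state: dict) -> None:
    """Write autopilot-state.json via a temp file so a crash cannot truncate it."""
    path = Path(session_dir) / "autopilot-state.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(state, indent=2))
    os.replace(tmp, path)


def load_state(session_dir: str) -> Optional[dict]:
    """Return the migrated state, or None for a fresh start."""
    path = Path(session_dir) / "autopilot-state.json"
    if not path.exists():
        return None
    return migrate(json.loads(path.read_text()))


# Usage: an older state file is resumed and upgraded on load.
with tempfile.TemporaryDirectory() as tmp_dir:
    fresh = load_state(tmp_dir)
    save_state(tmp_dir, {"schemaVersion": 3, "currentState": "EXECUTE", "currentPhase": 1})
    resumed = load_state(tmp_dir)
```

Calling `save_state` after every transition is what lets a re-invocation jump back to the recorded state.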
Gates
- No reimplementing delegated skills. Writing planning/execution/verification/review/integration logic → STOP. Delegate via `subagent_type`.
- No executing without plan approval. Every plan passes APPROVE_PLAN. No exceptions.
- No skipping VERIFY, INTEGRATE, or REVIEW. Human can override findings; steps cannot be skipped. INTEGRATE may be skipped only via explicit "skip" choice with decision recorded in `decisions[]`.
- No infinite retries. EXECUTE budget: 3 attempts. FINAL_REVIEW: 3 cycles. If exhausted, stop and surface.
- No modifying state files manually. If corrupted, start fresh.
Escalation
- Spec missing Implementation Order: Cannot identify phases. Ask user to add phase annotations or provide roadmap.
- Delegated skill fails to produce output: Check `{sessionDir}/handoff.json`. Report and ask: retry or stop.
- User wants to reorder phases mid-run: Update `phases[]` (mark skipped, adjust `currentPhase`). Do not re-run completed phases.
- Context limits approaching: Persist state immediately. "State saved. Re-invoke `/harness:autopilot` to continue."
- 2 consecutive phase failures: Suggest reviewing spec for systemic issues.
Rationalizations to Reject
| Rationalization | Reality |
|---|---|
| "Low complexity means I can skip APPROVE_PLAN" | Low complexity means auto-approval only when no signals fire. Signals override complexity. |
| "I can inline planning logic instead of dispatching to harness-planner" | Iron Law. Autopilot delegates, never reimplements. No exceptions. |
| "Retry budget exhausted but one more approach might work" | 3-attempt budget prevents compounding failure. Exceeding it without human input is unrecoverable. |
| "Keeping research in conversation is faster than scratchpad" | Scratchpad gated by rigor level. At standard/thorough, >500 words must go to scratchpad. |
| "Plan auto-approved, so I can skip recording the decision" | Every approval—auto or manual—is recorded in decisions[]. That array is the audit trail. |
Success Criteria
- All phases in the spec are executed in order with plan → execute → verify → integrate → review per phase
- Every plan approval is recorded in `decisions[]` (auto or manual)
- Retry budget (3 attempts) is enforced: exhausted retries surface to user, never silently continue
- FINAL_REVIEW runs on `startingCommit..HEAD` diff and catches cross-phase coherence issues
- State is persisted to `autopilot-state.json` after every state transition, so re-invocation resumes correctly
- `harness validate` passes after every phase
Examples
Invocation: /harness:autopilot docs/changes/security-scanner/proposal.md
INIT: 3 phases found: Phase 1: Core Scanner (low), Phase 2: Rule Engine (high), Phase 3: CLI Integration (low).
Phase 1 — ASSESS → PLAN: harness-planner dispatched. Returns plan: docs/changes/security-scanner/plans/2026-03-19-core-scanner-plan.md (8 tasks).
Phase 1 — APPROVE_PLAN (auto): All signals false. "Auto-approved Phase 1: Core Scanner | auto | low | no concerns | 8 tasks."
Phase 1 — EXECUTE: harness-task-executor dispatched with plan path + session. 8 tasks complete. 2 checkpoint commits.
Phase 1 — VERIFY: harness-verifier dispatched. Pass. REVIEW: harness-code-reviewer. 0 blocking, 2 notes.
Phase 1 — PHASE_COMPLETE: "Phase 1 complete. Next: Phase 2: Rule Engine (high). Continue? → yes"
Phase 2 — ASSESS: High complexity. "Run /harness:planning interactively, then re-invoke." [User plans interactively. Re-invokes.]
INIT (resume): "Resuming from PLAN, phase 2: Rule Engine. Found plan: docs/changes/security-scanner/plans/2026-03-19-rule-engine-plan.md"
Phase 2 — APPROVE_PLAN (paused): Complexity: high triggered. "Approve? → yes" EXECUTE → VERIFY → REVIEW → PHASE_COMPLETE. 14 tasks, 1 retry.
Phase 3: auto-plans and executes. FINAL_REVIEW: harness-code-reviewer on startingCommit..HEAD. 0 blocking, 1 warning. Passed.
DONE: 3 phases, 30 tasks, 1 retry. "Create PR? → yes"
Retry exhaustion (during any phase):
Task 4 fails → Retry 1/3: obvious fix applied, still fails
→ Retry 2/3: expanded context (related files + learnings), still fails
→ Retry 3/3: full context gather (test output + imports + plan), still fails
Budget exhausted. Recorded in .harness/failures.md.
Fix manually and continue / revise plan / stop?