---
name: plan-forge
description: Use when a task needs an implementation plan that is iteratively created and stress-tested through review-and-revise cycles before implementation begins — catches blind spots, incorrect codebase assumptions, unnecessary complexity, and performance pitfalls while changes are still cheap
---
# Plan Forge
Iteratively creates AND refines an implementation plan through review-and-revise cycles. A metallurgy metaphor: the plan is heated (reviewed), hammered (revised), and quenched (finalized) until it holds up under stress.
Unlike `/plan-review` (a one-shot, post-hoc review of an existing plan), `/plan-forge`
creates the plan from scratch and runs 1-3 rounds of dual review, consolidation,
and revision before presenting the final artifact.
## When to Use
- Before implementing a multi-step feature that touches critical code paths
- When the task involves non-obvious architectural decisions
- When you want a plan that has been stress-tested before writing any code
- When blind spots in planning are more expensive than the review overhead
## When NOT to Use
- Single-file, few-line changes (just do them)
- The plan already exists and just needs review (use `/plan-review`)
- You need to explore multiple competing designs first (use `/design-tournament`, then feed the winner into `/plan-forge`)
- Pure research tasks (use `/deep-research` or `/deeper-research`)
## Invocation
```
/plan-forge <task description>
/plan-forge --rounds=1 <task>
/plan-forge --focus=concurrency <task>
/plan-forge --plan-only <task>
/plan-forge --review-only <path-to-existing-plan>
```
## Architecture
```
Phase 0: Plan Creation (orchestrator explores codebase, writes initial plan)
    |
    v
+-> Phase 1: Dual Review (2 parallel general-purpose agents, fresh context)
|       |
|       v
|   Phase 2: Consolidation (1 general-purpose agent merges findings)
|       |
|       v
|   Phase 3: Revision (orchestrator revises plan inline)
|       |
|       v
|   Decision: continue?
|       |
+--yes--+  (RETHINK/REVISE items remain, round < max)
        |
        no (only WATCH items, or max round reached)
        |
        v
    Phase 4: Final Presentation
```
Agents per round: 2 reviewers + 1 consolidator = 3. Total agents across 1-3 rounds: 3-9.
## Phase 0 --- Plan Creation (Orchestrator, Inline)
The orchestrator (you, not a sub-agent) creates the initial plan.
### Steps

1. **Parse task** --- identify core objective, constraints, domain.
2. **Explore codebase** --- use Glob, Grep, Read to find relevant files,
   patterns, existing utilities. Check `crates/gossip-stdx/src/` and
   neighboring modules for duplication (per CLAUDE.md rules).
3. **Write initial plan** to `~/.claude/plans/{YYYY-MM-DD}-{feature-slug}-v1.md`.
### Versioned Plan Files
Each revision writes a NEW file with an incremented version suffix. Prior versions are kept for reference and diffing.
```
~/.claude/plans/2026-02-23-retry-logic-v1.md   <- Phase 0 output (initial)
~/.claude/plans/2026-02-23-retry-logic-v2.md   <- After Round 1 revision
~/.claude/plans/2026-02-23-retry-logic-v3.md   <- After Round 2 revision (final)
```
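The version-bump convention above can be sketched as a small helper. This is an illustrative sketch only (the skill itself performs the rename inline, not via a script); `next_version` is a hypothetical name:

```python
import re
from pathlib import Path

def next_version(plan_path: str) -> str:
    """Given a plan file like 2026-02-23-retry-logic-v1.md,
    return the path of the next version (v2). Prior files are never
    overwritten; the caller writes the revision to the returned path."""
    path = Path(plan_path)
    match = re.fullmatch(r"(.*-v)(\d+)(\.md)", path.name)
    if match is None:
        raise ValueError(f"not a versioned plan file: {path.name}")
    prefix, version, suffix = match.groups()
    return str(path.with_name(f"{prefix}{int(version) + 1}{suffix}"))

print(next_version("2026-02-23-retry-logic-v1.md"))  # 2026-02-23-retry-logic-v2.md
```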
### Plan File Template

```markdown
# {Plan Title}

| Field | Value |
|------------------|------------------------------|
| Date | {YYYY-MM-DD} |
| Status | Draft / In Review / Final |
| Version | v{N} |
| Rounds completed | {N} |
| Task | {one-line summary} |

## Problem Statement
{What problem does this solve and why does it matter?}

## Codebase Context
{Discovered files, patterns, abstractions relevant to the task.
Include file paths and brief descriptions.}

## Steps

### Step {N}: {Title}
- **What**: {concrete description}
- **Why**: {justification}
- **Files**: {exact paths to create or modify}
- **Tests**: {what to test and how}
- **Acceptance criteria**: {how to verify correctness}

## Testing Strategy
{Overall testing approach --- unit, property-based, integration, etc.}

## Revision Log
{Populated during review rounds. Cumulative across versions.}

| Round | Finding ID | Action Taken |
|-------|-----------|--------------|

## Open Items
{WATCH items and unresolved concerns.}
```

### Flags

- `--review-only <path>`: Skip Phase 0. Load the plan at `<path>` and jump directly to Phase 1.
- `--plan-only`: Stop after Phase 0. Write the plan and present it without running any review rounds.
## Phase 1 --- Dual Review (2 Parallel Agents)
Launch 2 agents in a single message using the Task tool with
`subagent_type=general-purpose`. Each covers all four review lenses but with a
different primary emphasis to reduce blind-spot overlap.
| Agent | Label | Primary Emphasis (40%) | Secondary (20% each) |
|---|---|---|---|
| Alpha | Forge Inspector | Correctness & Soundness | Footguns, Simplification, Performance |
| Beta | Forge Optimizer | Simplification & Pragmatism | Performance, Correctness, Footguns |
### Common Preamble (included in both agents' prompts)
You are {AGENT_LABEL}, a plan reviewer in Round {ROUND} of the Plan Forge
process. You review the plan below through ALL four lenses but emphasize
{PRIMARY_EMPHASIS} (allocate ~40% of your attention there, ~20% each to the
other three).
## Plan Under Review
{PLAN}
## Codebase Context
{CONTEXT}
{PRIOR_ROUND_SECTION}
## Four Review Lenses
### Correctness & Soundness
- Does the plan actually solve the stated problem?
- Are assumptions about existing code accurate? (check the codebase)
- Do referenced types, traits, APIs exist with described signatures?
- Are ordering dependencies correct?
- Do state transitions and invariants hold under all cases?
### Footguns & Failure Modes
- Race conditions, TOCTOU bugs, atomicity gaps
- Edge cases not addressed (empty inputs, overflow, boundaries)
- Error propagation paths that silently swallow failures
- Partial failure scenarios (what if step 3 of 5 fails?)
- Implicit assumptions that break under different configurations
### Simplification
- YAGNI: does the plan build things not yet needed?
- Does the codebase already have utilities the plan reinvents? (search with
  Glob/Grep, especially `crates/gossip-stdx/src/`)
- Could fewer files, types, or steps achieve the same result?
- Are there unnecessary abstraction layers or indirection?
- Could an existing pattern be extended instead of building new?
### Performance & Scalability
- Hot path allocations in loops (Vec, String, Box)
- Lock contention or oversized critical sections
- O(n^2) or worse algorithms hidden in the approach
- Blocking operations in async contexts
- Unbounded growth (queues, buffers, caches without limits)
## Rules
- Explore the codebase (Glob, Grep, Read) to ground findings in reality.
The most valuable findings come from gaps between plan assumptions and
codebase reality.
- Only report findings that REQUIRE action. No nits, no style suggestions.
- Be concrete: cite the specific plan step, section, or quoted text.
- For each finding, state the PROBLEM and the RECOMMENDED CHANGE.
- Rate each finding:
- Impact (1-10): How much does this matter if unaddressed?
- Confidence (0-100%): How sure are you this is a real issue?
## Output Format
Return a markdown document starting with:
`# {AGENT_LABEL} Review --- Round {ROUND}`
For each finding:
### {FINDING_ID}: {title}
- **Plan step**: {which step or section}
- **Lens**: {Correctness | Footguns | Simplification | Performance}
- **Problem**: {what is wrong or missing}
- **Evidence**: {codebase evidence --- file paths, existing code, design docs}
- **Recommended change**: {specific edit to the plan}
- **Impact**: N/10
- **Confidence**: N%
End with: "Total findings: N" (0 is valid --- do not invent issues).
### Finding ID Scheme

`R{round}.A{agent}.F{n}`

- Agent identifiers: `a` for Alpha, `b` for Beta.
- Example: `R1.Aa.F3` = Round 1, Alpha, Finding 3.
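The ID scheme (including the consolidated `R{ROUND}.C.F{n}` form used in Phase 2) can be captured in a small parser. A hypothetical sketch for illustration only; `parse_finding_id` is not part of any agent prompt:

```python
import re

AGENTS = {"a": "Alpha", "b": "Beta", "C": "Consolidator"}

def parse_finding_id(finding_id: str) -> dict:
    """Parse IDs like R1.Aa.F3 (reviewer) or R2.C.F1 (consolidated)."""
    match = re.fullmatch(r"R(\d+)\.(?:A([ab])|(C))\.F(\d+)", finding_id)
    if match is None:
        raise ValueError(f"malformed finding ID: {finding_id}")
    round_no, reviewer, consolidator, n = match.groups()
    agent = reviewer or consolidator  # exactly one alternative matched
    return {"round": int(round_no), "agent": AGENTS[agent], "finding": int(n)}

print(parse_finding_id("R1.Aa.F3"))  # {'round': 1, 'agent': 'Alpha', 'finding': 3}
```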
### Agent-Specific Sections

**Alpha (Forge Inspector)** --- replace `{AGENT_LABEL}` with Forge Inspector,
`{PRIMARY_EMPHASIS}` with Correctness & Soundness:
Your primary emphasis is CORRECTNESS & SOUNDNESS (40%). Prioritize verifying
that the plan actually solves the problem, that referenced code exists as
described, and that invariants hold. Give secondary attention (~20% each) to
footguns, simplification, and performance.
Use finding IDs: R{ROUND}.Aa.F1, R{ROUND}.Aa.F2, ...
**Beta (Forge Optimizer)** --- replace `{AGENT_LABEL}` with Forge Optimizer,
`{PRIMARY_EMPHASIS}` with Simplification & Pragmatism:
Your primary emphasis is SIMPLIFICATION & PRAGMATISM (40%). Prioritize finding
YAGNI violations, existing utilities the plan reinvents, and opportunities to
achieve the same result with less complexity. Give secondary attention (~20%
each) to performance, correctness, and footguns.
Use finding IDs: R{ROUND}.Ab.F1, R{ROUND}.Ab.F2, ...
### Prior Round Section (Rounds 2+)
For rounds 2+, append this section to each agent's prompt:
## Prior Round Findings
The following findings were raised in prior rounds. Check whether the revised
plan adequately addresses them. If a prior finding is STILL present, re-raise
it with a note that it was not resolved.
{PRIOR_CONSOLIDATED_FINDINGS}
## Phase 2 --- Consolidation (1 Agent)
After both reviewers complete, launch 1 consolidator agent using the Task
tool with `subagent_type=general-purpose`.
### Consolidator Prompt
You are the Forge Consolidator for Round {ROUND}. Two independent reviewers
have examined the same implementation plan. Your job is to merge their findings
into one focused, actionable report and issue a verdict.
## Original Plan
{PLAN}
## Reviewer Reports
{ALPHA_REPORT}
---
{BETA_REPORT}
{PRIOR_TRACKING_SECTION}
## Your Task
### 1. Deduplicate
Group findings that flag the same underlying issue from different angles into
single consolidated findings. Note which reviewers flagged each.
### 2. Overload Check
Count unique findings after deduplication. If there are MORE THAN 10 unique
findings, or MORE THAN 3 that would be classified as RETHINK, emit ONLY:
---
**This plan needs fundamental rework.** The review found {N} issues across
{areas}. Rather than patching individually, redesign the approach. The top 3
structural issues to address first:
1. {highest-impact finding}
2. {second highest}
3. {third highest}
---
Then STOP. Do not produce the full report.
### 3. Score Each Finding (if overload check passes)
For every unique finding, assign:
- **Impact** (1-10):
- 9-10: Fundamental flaw --- approach won't work
- 7-8: Significant gap --- plan needs edits before implementation
- 5-6: Real concern --- implementation must handle explicitly
- 3-4: Minor --- below threshold, discard
- **Confidence** (0-100%):
- 90-100: Clear problem with codebase evidence
- 70-89: Very likely, strong reasoning
- 50-69: Plausible, may need investigation
- Below 50: Speculative --- discard
Discard findings with impact < 4 or confidence < 50%.
### 4. Classify
Assign each surviving finding exactly one category:
- **RETHINK** (impact >= 8, confidence >= 70): Fundamental approach change
needed. Non-negotiable.
- **REVISE** (impact >= 6, confidence >= 60): Specific plan edits required.
- **WATCH** (impact >= 4, confidence >= 50): Plan is sound but implementation
must handle this explicitly.
### 5. Issue Verdict
Based on surviving findings:
- **FORGE AGAIN**: Any RETHINK items exist. Plan MUST be revised and
re-reviewed.
- **TEMPER**: No RETHINK items, but REVISE items exist. Plan should be revised
and re-reviewed if round < max.
- **QUENCH**: Only WATCH items (or no findings). Plan is ready.
### 6. Output Format
```markdown
## Forge Consolidation --- Round {ROUND}
**Verdict**: {FORGE AGAIN | TEMPER | QUENCH}
**Unique findings**: {N} (after dedup and filtering)
### RETHINK
| # | Finding ID | Title | Plan Step | Impact | Confidence | Reviewers |
|---|-----------|-------|-----------|--------|------------|-----------|
**Details:**
#### {R{ROUND}.C.F1}: {title}
- **Problem**: {description}
- **Evidence**: {codebase evidence}
- **Recommended change**: {specific plan revision}
- **Original IDs**: {which reviewer finding IDs map here}
### REVISE
{same format}
### WATCH
{same format}
### Prior Finding Tracking
| Prior Finding ID | Status | Notes |
|-----------------|--------|-------|
| R1.C.F2 | RESOLVED | Plan step 3 now addresses this |
| R1.C.F5 | PARTIALLY RESOLVED | Step added but edge case missing |
| R1.C.F7 | UNRESOLVED | Still not addressed |
```

### Consolidated Finding IDs

Use: `R{ROUND}.C.F{n}` (C = consolidated).
### Rules
- Do NOT add your own findings. You are a consolidator, not a reviewer.
- If a reviewer's finding seems speculative, lower its confidence. If it drops below 50%, discard it.
- Preserve plan step references and codebase citations from reviewer reports.
### Prior Tracking Section (Rounds 2+)
For rounds 2+, append this to the consolidator prompt:
## Prior Round Consolidated Findings
Track whether each prior finding has been addressed in the revised plan:
{PRIOR_CONSOLIDATED_FINDINGS_WITH_STATUS}
For each prior finding, assign: RESOLVED / PARTIALLY RESOLVED / UNRESOLVED. Include this tracking in your output.
---
## Phase 3 --- Revision (Orchestrator, Inline)
The orchestrator (you, not a sub-agent) revises the plan based on consolidated
findings and writes a **new versioned file**.
### Revision Rules
1. **RETHINK findings**: Make fundamental changes. These are non-negotiable.
2. **REVISE findings**: Make the specific edits recommended.
3. **WATCH findings**: Add to Open Items section. Do NOT restructure the plan
for WATCH items.
4. **Update Revision Log**: Map each finding ID to the action taken.
5. **Increment version** in header and filename (`-v1.md` -> `-v2.md`).
6. **Verify internal consistency**: After edits, re-read the plan to ensure
steps still flow logically and no contradictions were introduced.
7. **Keep prior version file** --- do not delete or overwrite it.
---
## Round Decision
After revision, decide whether to loop back to Phase 1:
| Verdict | Round < max | Round = max |
|-------------|-------------|-------------|
| FORGE AGAIN | -> Phase 1 | -> Phase 4 (forced stop, flag unresolved RETHINK) |
| TEMPER | -> Phase 1 | -> Phase 4 |
| QUENCH | -> Phase 4 | -> Phase 4 |
Default max rounds: 3. Override with `--rounds=N` (1-3).
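The decision table above can be sketched as a function (illustrative only; the orchestrator applies this logic inline):

```python
def next_phase(round_verdict: str, round_no: int, max_rounds: int = 3) -> str:
    """Decide whether to loop back to review (Phase 1) or finalize (Phase 4)."""
    if round_verdict != "QUENCH" and round_no < max_rounds:
        return "Phase 1"  # FORGE AGAIN or TEMPER with rounds remaining
    if round_verdict == "FORGE AGAIN":
        # Max rounds hit with RETHINK items still open: forced stop.
        return "Phase 4 (forced stop: flag unresolved RETHINK items)"
    return "Phase 4"

print(next_phase("TEMPER", round_no=1))  # Phase 1
print(next_phase("TEMPER", round_no=3))  # Phase 4
```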
---
## Phase 4 --- Final Presentation
1. Set plan status to `Final` in the latest version file.
2. Collect all WATCH items into Open Items section.
3. If forced stop with unresolved RETHINK items: add a prominent warning at the
top of the plan file and call it out when presenting to the user.
4. Present a round summary table to the user.
5. Append all review reports as collapsed `<details>` sections at the end of
the plan file.
6. Report version history with file paths.
### Final Presentation Format
```markdown
## Plan Forge Complete
**Plan**: {title}
**Rounds**: {N}
**Final verdict**: {QUENCH | forced stop}
**Version history**:
- `{path}-v1.md` (initial)
- `{path}-v2.md` (round 1 revision)
- `{path}-v3.md` (final)
### Round Summary
| Round | Verdict | RETHINK | REVISE | WATCH | Total |
|-------|---------|---------|--------|-------|-------|
| 1 | FORGE AGAIN | 1 | 3 | 2 | 6 |
| 2 | QUENCH | 0 | 0 | 1 | 1 |
### Open Items (WATCH)
{collected WATCH items from all rounds}
### Review Reports (collapsed)
<details><summary>Round 1 --- Forge Inspector</summary>
{full report}
</details>
<details><summary>Round 1 --- Forge Optimizer</summary>
{full report}
</details>
<details><summary>Round 1 --- Consolidation</summary>
{full report}
</details>
<details><summary>Round 2 --- Forge Inspector</summary>
{full report}
</details>
...
```

## Configuration
| Flag | Effect |
|---|---|
| `--rounds=N` | Override max rounds (1-3). Default: 3. |
| `--focus=<domain>` | Adds domain-specific pitfall context to all agent prompts. |
| `--plan-only` | Create plan (Phase 0), skip all reviews. |
| `--review-only <path>` | Skip plan creation, review existing plan at `<path>`. |
### Focus Domain Pitfalls

When `--focus=<domain>` is specified, append this paragraph to every agent
prompt (Phase 1 and Phase 2):
Additional context: This plan operates in the {DOMAIN} domain. Pay particular
attention to {DOMAIN}-specific concerns.
Domain-specific pitfall lists to include:

- **concurrency**: data races, deadlock/livelock, lock ordering, priority
  inversion, false sharing, memory ordering (Acquire/Release vs SeqCst),
  Send/Sync bounds, async cancellation safety.
- **distributed**: partial failure, network partitions, clock skew, exactly-once
  semantics, idempotency, consensus protocol correctness, split-brain, message
  ordering, retry storms.
- **security**: input validation, injection (SQL/command/XSS), authentication
  bypass, authorization escalation, timing side channels, secret management,
  cryptographic misuse, TOCTOU in security checks.
- **performance**: allocation hot paths, cache locality, branch prediction, SIMD
  opportunities, async runtime blocking, lock contention, false sharing, memory
  layout (SoA vs AoS), tail latency.
- **unsafe**: soundness holes, aliasing violations, uninitialized memory,
  lifetime transmutation, Send/Sync impl correctness, drop order, panic
  safety, provenance.
## Anti-Patterns
| Mistake | Why it fails | Do this instead |
|---|---|---|
| Skipping Phase 0 codebase exploration | Plan makes wrong assumptions about existing code | Always Glob/Grep/Read before writing the plan |
| Launching reviewers sequentially | Wastes time and allows anchoring | Always launch both in a single message |
| Orchestrator adding own findings during consolidation | Conflates roles, biases revision | Only the reviewer agents produce findings |
| Revising the plan in-place (overwriting prior version) | Loses diff history | Always write a new `-v{N+1}.md` file |
| Running 3 rounds on a trivial plan | Overhead exceeds value | Use `--rounds=1` for simple plans |
| Treating WATCH items as REVISE | Over-engineering the plan | WATCH goes to Open Items, not plan restructure |
| Ignoring the overload threshold | Patching 15 findings creates a Frankenstein plan | If overload triggers, rethink the approach wholesale |
## Tips
- **Pair with `/design-tournament`**: Run a tournament first to pick the approach, then forge the implementation plan for the winning design.
- **Pair with `/plan-review`**: For a final one-shot validation of the forged plan with 4 specialist lenses instead of 2 generalist reviewers.
- **For plans with `--focus=unsafe`**: Consider following up with `/unsafe-review` after implementation.
- **Diff between versions**: Use `diff ~/.claude/plans/*-v1.md ~/.claude/plans/*-v2.md` to see exactly how the plan evolved through review rounds.