---
name: harshjudge
description: AI-native E2E testing orchestration for Claude Code. Use when creating, running, or managing end-to-end test scenarios with visual evidence capture. Activates for tasks involving E2E tests, browser automation testing, test scenario creation, test execution with screenshots, or checking test status.
---
# HarshJudge E2E Testing

AI-native E2E testing with MCP tools and visual evidence capture.
## Core Principles

- **Evidence First**: Screenshot before and after every action
- **Fail Fast**: Stop on error, report with context
- **Complete Runs**: Always call `completeRun`, even on failure
- **Step Isolation**: Each step executes in its own spawned agent for token efficiency
- **Knowledge Accumulation**: Learnings go to `prd.md`, not scenarios
## Step-Based Execution

HarshJudge uses a step-based agent pattern for token-efficient test execution:
```
Main Agent                                     Step Agents (spawned per step)
│
├─ startRun(scenarioSlug)
│      ↓
│   Returns: runId, steps[]
│
├─► Spawn Agent: Step 01 ──────────────────────► Execute actions
│        │                                            │
│        │ ◄─────────────────────────────────── Return: { status, evidencePaths }
│        │
│   completeStep(runId, "01", status)
│        │
├─► Spawn Agent: Step 02 ──────────────────────► Execute actions
│        │                                            │
│        │ ◄─────────────────────────────────── Return: { status, evidencePaths }
│        │
│   completeStep(runId, "02", status)
│        │
│   ... (repeat for each step)
│
└─ completeRun(runId, finalStatus)
```
Benefits:
- Each step agent has isolated context (no token accumulation)
- Large outputs (screenshots, logs) saved to files, not returned
- Main agent only receives concise summaries
- Automatic token optimization without manual management
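The loop above can be sketched in pseudocode. Everything in this sketch is illustrative: the Python functions stand in for the `startRun`/`completeStep`/`completeRun` MCP tool calls and for agent spawning; HarshJudge does not expose a Python API.

```python
# Illustrative sketch of the main-agent loop. The functions below are
# stand-ins for MCP tool calls and agent spawns, not a real API.

def start_run(scenario_slug):
    # Stand-in for the startRun MCP tool: returns a run ID and step list.
    return {"runId": "run-001", "steps": ["01", "02"]}

def spawn_step_agent(run_id, step_id):
    # Stand-in for spawning an isolated step agent. The agent returns
    # only a concise summary, never full evidence content.
    return {"status": "pass",
            "evidencePaths": [f"step-{step_id}/evidence/after.png"]}

def complete_step(run_id, step_id, status):
    pass  # Stand-in for the completeStep MCP tool.

def complete_run(run_id, final_status):
    pass  # Stand-in for the completeRun MCP tool.

def run_scenario(scenario_slug):
    run = start_run(scenario_slug)
    final_status = "pass"
    for step_id in run["steps"]:
        result = spawn_step_agent(run["runId"], step_id)
        complete_step(run["runId"], step_id, result["status"])
        if result["status"] == "fail":
            final_status = "fail"
            break  # Fail Fast: stop at the first failing step.
    # Complete Runs principle: completeRun is called even on failure.
    complete_run(run["runId"], final_status)
    return final_status
```

Note that `complete_run` sits outside the loop and runs on both paths, mirroring the "Complete Runs" principle.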
## Workflows

| Intent | Reference | Key Tools |
|---|---|---|
| Initialize project | `references/setup.md` | `initProject` |
| Create scenario | `references/create.md` | `createScenario` |
| Run scenario | `references/run.md` | `startRun`, `completeStep`, `completeRun` |
| Fix failed test | `references/iterate.md` | `getStatus`, `createScenario` |
| Check status | `references/status.md` | `getStatus` |
## Project Structure

```
.harshJudge/
  config.yaml              # Project configuration
  prd.md                   # Product requirements (from assets/prd.md template)
  scenarios/{slug}/
    meta.yaml              # Scenario definition + run statistics
    steps/                 # Individual step files
      01-step-slug.md      # Step 01 details
      02-step-slug.md      # Step 02 details
      ...
  runs/{runId}/            # Run history
    result.json            # Run result with per-step data
    step-01/evidence/      # Step 01 evidence
    step-02/evidence/      # Step 02 evidence
    ...
  snapshots/               # Inspection tool outputs (token-saving pattern)
```
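As an illustration, `config.yaml` holds project-level settings such as the base URL that step agents receive. The exact keys below are assumptions, not a documented schema:

```yaml
# Hypothetical config.yaml shape -- keys are illustrative only.
baseUrl: http://localhost:3000   # Passed to step agents as "Base URL"
```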
## Quick Reference

### HarshJudge MCP Tools

| Tool | Purpose |
|---|---|
| `initProject` | Initialize project (spawns dashboard) |
| `createScenario` | Create/update scenario with step files |
| `toggleStar` | Toggle/set scenario starred status |
| `startRun` | Start test run, returns step list |
| `recordEvidence` | Capture evidence for a step |
| `completeStep` | Complete a step, get next step ID |
| `completeRun` | Finalize run with status |
| `getStatus` | Check project or scenario status |
| `openDashboard` / `closeDashboard` | Manage dashboard server |
### Playwright MCP Tools

| Tool | Purpose |
|---|---|
| `browser_navigate` | Navigate to URL |
| `browser_snapshot` | Get accessibility tree (use before click/type) |
| `browser_click` | Click element using ref |
| `browser_type` | Type into input using ref |
| `browser_take_screenshot` | Capture screenshot for evidence |
| `browser_console_messages` | Get console logs |
| `browser_network_requests` | Get network activity |
| `browser_wait_for` | Wait for text/condition |
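The evidence-first pattern from the principles above translates into a fixed tool ordering around each action. The sketch below is illustrative: `call_tool` is a hypothetical helper standing in for an MCP tool invocation, while the tool names and the snapshot-before-click ordering come from the tables above.

```python
def call_tool(name, **params):
    # Hypothetical helper standing in for an MCP tool invocation.
    return {"tool": name, "params": params}

def click_with_evidence(ref, run_id, step):
    """Evidence-first click: screenshot, snapshot for refs, click, screenshot."""
    calls = [
        call_tool("browser_take_screenshot"),          # Evidence: before
        call_tool("browser_snapshot"),                 # Get element refs before acting
        call_tool("browser_click", ref=ref),           # Act on the element
        call_tool("browser_take_screenshot"),          # Evidence: after
        call_tool("recordEvidence", runId=run_id, step=step),
    ]
    return [c["tool"] for c in calls]
```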
### Step Agent Prompt Template

When spawning an agent for each step:

```
Execute step {stepId} of scenario {scenarioSlug}:

## Step Content
{content from steps/{stepId}-{slug}.md}

## Project Context
Base URL: {from config.yaml}
Auth: {from prd.md if needed}

## Previous Step
Status: {pass|fail|first step}

## Your Task
1. Execute the actions using Playwright MCP tools
2. Use browser_snapshot before clicking to get element refs
3. Capture before/after screenshots using browser_take_screenshot
4. Record evidence using recordEvidence with step={stepNumber}

Return ONLY a JSON object:
{
  "status": "pass" | "fail",
  "evidencePaths": ["path1.png", "path2.png"],
  "error": null | "error message"
}

DO NOT return full evidence content. DO NOT explain your work.
```
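Because the main agent relies on this JSON contract, it can be worth validating a step agent's summary before calling `completeStep`. The check below is a sketch, not part of HarshJudge:

```python
import json

def parse_step_summary(raw):
    """Validate a step agent's JSON summary against the expected contract."""
    summary = json.loads(raw)
    if summary.get("status") not in ("pass", "fail"):
        raise ValueError(f"unexpected status: {summary.get('status')!r}")
    if not isinstance(summary.get("evidencePaths"), list):
        raise ValueError("evidencePaths must be a list of file paths")
    if summary["status"] == "fail" and not summary.get("error"):
        raise ValueError("failed steps must include an error message")
    return summary
```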
## Error Handling

On ANY error:

- STOP - Do not proceed
- Report - Tool, params, error, resolution
- Check `prd.md` - Is this a known pattern?
- Do NOT retry - Unless user instructs