Evaluation harness for testing agent and skill quality through structured benchmarks, regression tests, and quality scoring.
Skills (SKILL.md) are configuration files that add specific capabilities to AI agents such as Claude Code, Cursor, and Codex.
PM2 process management, backend/frontend cascade execution, parallel worktree builds, and cross-service integration testing.
Research-first development methodology that investigates existing solutions, brainstorms alternatives, and evaluates trade-offs before any implementation begins.
Red-Green-Refactor TDD methodology with mandatory failing tests, minimal implementation, quality refactoring, and 80% coverage gating.
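As a rough illustration of the 80% coverage gate mentioned in this entry, the check could look something like the sketch below. The function name `coverage_gate` and its parameters are hypothetical and not taken from the skill itself; only the 80% threshold comes from the description.

```python
def coverage_gate(covered_lines: int, total_lines: int, threshold: float = 0.80) -> bool:
    """Return True when line coverage meets or exceeds the threshold.

    Hypothetical helper: the skill's real gating logic is not shown in the source.
    """
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return covered_lines / total_lines >= threshold
```

A build step would presumably fail the run whenever this returns False, so that refactoring cannot proceed on under-tested code.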
Author TOML-based Formula workflow templates that become Protomolecules and active Molecules in Gas Town's durable workflow system.
Git commit patterns, formats, and conventions for GSD methodology. Provides atomic commits per task, structured commit messages, planning file commits, branch management, and milestone tag operations.
Central utility skill for GSD operations. Provides config parsing, slug generation, timestamps, path operations, and orchestrates calls to other specialized skills. Acts as the unified entry point that the original gsd-tools.cjs provided via its lib/ modules (commands, config, core, init).
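Slug generation, one of the utilities named in this entry, might be sketched as follows. This is an assumption about the general technique, not the behavior of gsd-tools.cjs; the function name `slugify` and its exact normalization rules are illustrative.

```python
import re
import unicodedata


def slugify(title: str) -> str:
    """Turn a free-form title into a lowercase, hyphen-separated slug.

    Illustrative only: the actual GSD slug rules are not given in the source.
    """
    # Decompose accented characters, then drop anything outside ASCII.
    normalized = unicodedata.normalize("NFKD", title)
    ascii_only = normalized.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")
```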
Resolve model profile (quality/balanced/budget) at orchestration start and map agents to specific models. Enables cost/quality tradeoffs by selecting appropriate AI models for each agent role.
Roadmap parsing, analysis, and mutation operations for ROADMAP.md. Handles phase and milestone lifecycle including add, insert (decimal), remove, complete, and requirements coverage analysis.
STATE.md reading, writing, and field-level updates. Provides cross-session state persistence via .planning/STATE.md with structured fields for current task, completed phases, blockers, decisions, and quick tasks.
Template loading, variable filling, and scaffolding for all GSD artifacts. Manages 22+ templates covering every document type in the GSD system, from PROJECT.md to milestone archives.
Architect-level code review that enforces DRY, YAGNI, abstraction, and test coverage principles.
Urgent issue classification, root cause analysis, and fast-path routing for production hotfixes.
Capture, validate, query, and sync architectural patterns and design decisions in the knowledge graph.
Technical debt management including branch cleanup, doc verification, TODO scanning, and dependency auditing.
Interactive PM interview with expertise-adaptive questioning for requirements elicitation.
Convert requirements into structured technical specifications with architecture decisions.
Break technical specifications into small, implementable stories with dependency ordering.
Automated test validation, coverage checking, and quality metrics with aggressive defaults.
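Dependency ordering of stories, as described in this entry, is commonly done with a topological sort. The sketch below uses Python's standard `graphlib` module; the function name `order_stories` and the story names in the usage note are hypothetical, and the skill may order stories differently.

```python
from graphlib import TopologicalSorter


def order_stories(dependencies: dict[str, set[str]]) -> list[str]:
    """Return stories in an order where each one follows its prerequisites.

    `dependencies` maps a story to the set of stories it depends on.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For example, `order_stories({"api": {"schema"}, "ui": {"api"}, "schema": set()})` places `schema` before `api` and `api` before `ui`.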
Fresh adversarial code review with binary PASS/FAIL verdicts, evidence citations, and anchoring bias prevention via fresh reviewer spawning.
Parallel design review by 6 specialist agents (PM, Architect, Designer, Security Design, UX, CTO) with mandatory unanimous approval.
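The unanimous-approval rule in this entry can be pictured as a simple check over the six reviewers' verdicts. This is a minimal sketch: the reviewer roles come from the description, while the function name `design_approved` and the boolean verdict format are assumptions.

```python
REVIEWERS = ("PM", "Architect", "Designer", "Security Design", "UX", "CTO")


def design_approved(verdicts: dict[str, bool]) -> bool:
    """Approve only when every listed reviewer has reported and approved."""
    missing = [role for role in REVIEWERS if role not in verdicts]
    if missing:
        # A missing verdict is treated as an error, not as silent approval.
        raise ValueError(f"missing verdicts from: {', '.join(missing)}")
    return all(verdicts[role] for role in REVIEWERS)
```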
Coordinate external AI tool integration (OpenAI Codex, Google Gemini) for cross-model adversarial review and delegated implementation.
Execute work units through the rigorous 4-phase Metaswarm cycle (Implement -> Validate -> Adversarial Review -> Commit) with independent quality gate enforcement.
Adversarial plan review by 3 independent reviewers (Feasibility, Completeness, Scope & Alignment) before presenting to user.
Monitor PR lifecycle from creation through merge including CI monitoring, review comment handling, thread resolution, and merge readiness verification.
Decompose implementation plans into discrete work units with enumerated DoD items, file scope declarations, dependency mapping, and human checkpoint flags.
Bug condition/postcondition formalization as testable Behavior Contracts. Defines invariants that must be preserved across fixes.
Convention discovery and rule generation from codebase analysis. Scans project structure, builds search indexes, identifies patterns, and generates enforceable rules.
State capture and restore across context window compactions. Monitors usage thresholds and serializes quality, task, and spec state for seamless continuation.
Observation capture and retrieval across sessions. Stores decisions, discoveries, and bugfix patterns. Searchable via tags and relevance scoring.
Language-specific auto-lint/format/typecheck pipeline. Supports Python (ruff+pyright), TypeScript (prettier+eslint+tsc), Go (gofmt+golangci-lint). Auto-fix and convergence loops.
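A convergence loop of the kind this entry mentions could plausibly work by re-running the fixers until a full pass changes nothing. The sketch below models each fixer as a plain string-to-string function rather than invoking ruff, prettier, or gofmt; `run_until_stable` and `max_rounds` are hypothetical names.

```python
from typing import Callable, Sequence


def run_until_stable(
    source: str,
    fixers: Sequence[Callable[[str], str]],
    max_rounds: int = 5,
) -> str:
    """Apply all fixers repeatedly until a full round leaves the text unchanged."""
    for _ in range(max_rounds):
        updated = source
        for fix in fixers:
            updated = fix(updated)
        if updated == source:
            return source  # Converged: no fixer changed anything this round.
        source = updated
    raise RuntimeError(f"fixers did not converge within {max_rounds} rounds")
```

Capping the number of rounds guards against two fixers that keep undoing each other's changes.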
Specification creation and management for the Pilot Shell methodology. Covers semantic search, clarifying questions, structured spec generation, and iterative refinement.
Strict RED->GREEN->REFACTOR test-driven development with enforcement. Never write production code before a failing test. Atomic commits per TDD cycle.
Log all errors with full context, detect patterns, and suggest approach mutations to avoid repeated failures.
Create a structured task_plan.md with phases, goals, and checkbox tracking for persistent planning.
Maintain progress.md with session logs, test results, error records, and phase status indicators.
Clarify vague requirements through exploratory questioning and option generation before committing to research or implementation.
Structured code quality assessment with Conventional Comments format, scaled review depth, and soft-gating verdicts preserving user autonomy.
Final completion discipline including summary generation, plan document updates, and confirmation that all success criteria from the original plan are satisfied.
Disciplined execution of approved plans with step-by-step verification, phase checkpoints, failure investigation, and mandatory code/security reviews.
Transform research findings into actionable implementation plans with stakes-based rigor, test-first strategy, and granular task decomposition.
Security vulnerability assessment identifying OWASP risks, injection vectors, authentication issues, and data exposure with severity classification.
Structured debugging methodology using hypothesis-driven investigation, log analysis, and bisection to isolate and resolve defects.
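The bisection step named in this entry is essentially a binary search over an ordered history. A minimal sketch follows, assuming the first revision is known good, the last is known bad, and every revision after the first bad one is also bad; `first_bad` and the `is_bad` callback are hypothetical names.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad revision, testing about log2(n) revisions.

    Assumes revisions[0] is good and revisions[-1] is bad.
    """
    if len(revisions) < 2:
        raise ValueError("need at least one good and one bad revision")
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(revisions[mid]):
            bad = mid   # Defect already present: look earlier.
        else:
            good = mid  # Still healthy: look later.
    return revisions[bad]
```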
Test-first development practice where test specifications are written before production code, integrated into plan tasks as mandatory first sub-steps.
Verification-before-completion discipline ensuring all success criteria are met, tests pass, and reviews complete before declaring work done.
WASM-based instant code transforms for simple tasks, achieving 352x speedup over LLM inference with zero cost.
Hierarchical coordination and drift detection with frequent checkpoints, shared memory coherence validation, role specialization enforcement, and short task cycles.
Multi-agent swarm formation and coordinated execution with topology-aware agent deployment, consensus protocols, and anti-drift enforcement.
HNSW vector search for pattern similarity retrieval and knowledge graph maintenance with PageRank scoring, community detection, and 3-tier memory management.
Establish project governing principles including dev guidelines, code quality standards, testing policies, UX requirements, performance benchmarks, and security constraints.