---
name: mine-best-practices
description: Extract best practices from PR review comments to build a curated library for code review automation
license: MIT
argument-hint: "--since YYYY-MM-DD [--until YYYY-MM-DD] [--scope NAME]"
metadata:
  author: Valon Technologies
  version: "1.0"
---
# Mine Best Practices

Extract insights from PR review threads, validate them against the current codebase, and consolidate them into the best practices library.
## Your Role as Orchestrator
You are the orchestrator for this multi-stage pipeline. Your responsibilities:
- Execute scripts - Run the Python scripts that prepare batches and aggregate results
- Launch subagents - Create `Task()` calls to dispatch specialized subagents for extraction, validation, and synthesis. Max 10 concurrent — if more batches exist, wait for a wave to complete before launching the next.
- Validate outputs - After each phase, review subagent outputs for quality, format correctness, and issues
- Stop on anomalies - If you detect problems (malformed output, unexpected results, low yield), stop and alert the user. Do not attempt to fix issues on the fly.
Key principle: Validate each stage's output before proceeding. Only interrupt the user when something needs human judgment.
## When to Use This Skill
Use when:
- Building/updating the best practices library from recent PRs
- Mining a date range of PR reviews for patterns
- Seeding the library from historical review threads
Don't use for:
- Reviewing code against the library of current practices
- General PR reviews
## Usage

```
/mine-best-practices --since 2025-01-01
/mine-best-practices --since 2025-06-01 --until 2025-07-01 --scope backend
```

All date ranges refer to the PR merge date (inclusive on both ends).
### Advanced

For debugging and manual intervention:

```
/mine-best-practices resume validate --identifier web_2025-01-29
/mine-best-practices status
/mine-best-practices pending
/mine-best-practices for-topic error_handling
```

`--batch-size` and `--id-prefix` are tuning parameters rarely needed in normal operation.
## Data Refresh

Before mining, ensure threads are up to date:

```
python3 scripts/mine.py refresh                                        # Incremental (new PRs only)
python3 scripts/mine.py refresh --since 2025-01-01                     # From specific merge date
python3 scripts/mine.py refresh --since 2026-01-09 --until 2026-01-26  # Specific range
python3 scripts/mine.py refresh --full                                 # Full re-extraction
```

Requires the `gh` CLI authenticated with repo access. Safe to re-fetch overlapping ranges (deduplicates by `thread_id`).
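For orientation, a refreshed thread record might look like this (a hypothetical sketch; the actual schema is whatever `code_insights/threads.yaml` uses, and may differ):

```yaml
# Hypothetical entry in code_insights/threads.yaml; field names are illustrative.
threads:
  - thread_id: "owner/repo#1234:r567890"   # dedup key for overlapping refreshes
    pr_number: 1234
    merged_at: "2025-01-15"
    comments:
      - author: reviewer_a
        body: "Please add a timeout here; we've seen this call hang in prod."
```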
## Execution Workflow

NOTE: All commands run from the skill directory (where this `SKILL.md` lives).
### Step 1: Start Extraction

```
python3 scripts/mine.py extract --since 2025-01-01 --scope backend
```

Outputs extraction Task prompts for each batch.
### Step 2: Launch Extraction Subagents

Launch the Task prompts from Step 1 in parallel using the Task tool.

Output: `tmp/mining_{identifier}/extraction/batch_{n}.yaml`
After subagents complete, validate:
- Check each batch output file exists
- Verify YAML format is correct (`insights` list, `skipped` entries; see the sketch below)
- Review yield rate (typically 30-40% extracted, 60-70% skipped)
- Spot-check 2-3 insight content samples for quality
- Stop and alert user if: yield is unusually low/high, format errors, or quality issues
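A batch output might look roughly like this (a hedged sketch; the exact fields are set by the extraction prompts, not this document):

```yaml
# Hypothetical shape of tmp/mining_{identifier}/extraction/batch_1.yaml;
# field names are illustrative, not the actual schema.
insights:
  - thread_id: "owner/repo#1234:r567890"
    insight: "Outbound HTTP calls should set explicit timeouts."
skipped:
  - thread_id: "owner/repo#1234:r567891"
    reason: "Style nitpick; not a generalizable practice."
```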
### Step 3: Aggregate Extraction

```
python3 scripts/aggregate_extraction.py {identifier}
```

Merges results into `insights.yaml` and outputs validation Task prompts.
After aggregation, validate:
- Verify `insights.yaml` was updated with new insights (see the record sketch below)
- Check insight count matches expected (extracted - duplicates)
- Review a few insight content samples
- Stop and alert user if: counts don't match, format issues, or quality concerns
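A newly added record in `insights.yaml` might resemble the following (hypothetical sketch; the actual fields are defined by the scripts):

```yaml
# Hypothetical insights.yaml record; field names are illustrative.
- id: ins_0042
  thread_id: "owner/repo#1234:r567890"
  insight: "Outbound HTTP calls should set explicit timeouts."
  status: pending   # becomes validated or rejected after Step 5
```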
### Step 4: Launch Validation Subagents

Launch validation Task prompts in parallel using the Task tool.

Output: `tmp/mining_{identifier}/validation/batch_{n}.yaml`
After subagents complete, validate:
- Check each batch output file exists
- Verify YAML format is correct (`rejections` list; see the sketch below)
- Review rejection rate (expect 0-10% for recent threads, higher for older)
- Spot-check rejection reasons for appropriateness
- Stop and alert user if: rejection rate is surprisingly high/low, unclear rejection reasons, or format issues
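A validation batch might look like this (hedged sketch; only the `rejections` key is named above, everything else is assumed):

```yaml
# Hypothetical shape of tmp/mining_{identifier}/validation/batch_1.yaml.
# Assumption: insights not listed under rejections pass validation.
rejections:
  - insight_id: ins_0017
    reason: "Pattern no longer exists; the helper was removed in a later refactor."
```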
### Step 5: Aggregate Validation

```
python3 scripts/aggregate_validation.py {identifier}
```

Updates `insights.yaml` with validation results and outputs the topic assignment prompt.
After aggregation, validate:
- Verify `insights.yaml` statuses updated (`pending` → `validated` or `rejected`)
- Check all pending insights were processed
- Review rejection reasons if any
- Stop and alert user if: missing updates, unexpected rejection patterns
### Step 6: Launch Topic Assignment

Launch the topic assignment Task prompt(s) in parallel.

Output: `tmp/mining_{identifier}/topics/batch_{n}.yaml`
After subagents complete:
- Read all `topics/batch_{n}.yaml` outputs
- Merge all `assignments` lists into one `topics.yaml` in the working directory (see the sketch below)
- Deduplicate `__new__:` topics: same name across batches → keep as-is (natural merge); similar but differently-named proposals → flag to user for resolution
- Verify all insight_ids were assigned, and check the topic distribution is reasonable
- Stop and alert user if: many new topics proposed, odd distribution, or missing assignments
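A topic-assignment batch might look like this (hedged sketch; the `__new__:` prefix convention comes from this document, the remaining fields are assumed):

```yaml
# Hypothetical shape of tmp/mining_{identifier}/topics/batch_1.yaml.
assignments:
  - insight_id: ins_0042
    topic: error_handling              # existing library topic
  - insight_id: ins_0043
    topic: "__new__:retry_policies"    # proposed new topic, subject to dedup
```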
### Step 7: Dispatch Synthesis

```
python3 scripts/dispatch_synthesis.py {identifier}
```

Applies topic assignments and outputs synthesis Task prompts (one per topic).
After dispatch, validate:
- Verify `insights.yaml` was updated with topic assignments
- Check all validated insights have topics
- Review new topic files were created (for `__new__:` topics)
- Stop and alert user if: assignments missing, too many new topics, or odd groupings
### Step 8: Launch Synthesis Subagents

Launch synthesis Task prompts in parallel using the Task tool (one per topic).

Output: Updates `library/{topic}.yaml` directly (see the sketch after the checklist below).
After subagents complete, validate:
- Check each topic's library file was updated
- Verify YAML format is correct
- Review subagent summaries (preserved/updated/added counts)
- Spot-check 1-2 updated practices for quality
- Stop and alert user if: files weren't updated, format errors, or suspicious changes
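A library topic file might look roughly like this (hypothetical sketch; the real structure is owned by the synthesis prompts):

```yaml
# Hypothetical shape of library/error_handling.yaml; structure is assumed.
practices:
  - title: "Set explicit timeouts on outbound HTTP calls"
    guidance: "Pass an explicit timeout on every outbound call; do not rely on client defaults."
    source_insights: [ins_0042]        # provenance back to insights.yaml
```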
### Step 9: Verify Synthesis Quality
Check that:
- Existing practices were preserved appropriately
- New practices are well-written and actionable
- One-off patterns were filtered (not everything became a practice)
- Code examples are correct and follow codebase conventions
Stop and alert user if: practices were deleted without replacement, excessive additions, or empty library files.
### Step 10: Aggregate Synthesis

```
python3 scripts/aggregate_synthesis.py {identifier}
```

Marks all validated insights with topics as `synthesized`.
### Step 11: Build

```
python3 scripts/build_sections.py
```

Generates markdown files for the review skill.

Output: the configured `sections_output_dir`
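A minimal `config.yaml` sketch, assuming a layout (only `sections_output_dir` and the existence of scope-specific rules files are documented here; the paths and nesting are invented for illustration):

```yaml
# Hypothetical excerpt of config.yaml. Only sections_output_dir and the notion of
# scope-specific rules files appear in this document; everything else is assumed.
sections_output_dir: ../review/sections
scopes:
  backend:
    bugbot_rules: .cursor/backend/BUGBOT.md
```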
### Step 12: Verify

```
python3 scripts/mine.py status
python3 scripts/mine.py pending
```
Confirm:
- `status` shows insights as `synthesized`
- `pending` shows no remaining work
### Step 13: Build Review Rules

```
python3 scripts/build_bugbot.py
```

Produces Task prompts for generating bugbot rules from the library. Launch the Task prompts (one per scope). Each subagent reads the existing BUGBOT.md and library practices, then merges incrementally — adding rules for new practices, removing rules for deleted practices, and preserving unchanged rules verbatim.
Sections use `## {topic}` headings (matching library filenames) with `**{practice_title}**` rule keys. Related practices are synthesized into fewer condensed rules.

Targets: scope-specific rules files from `config.yaml`
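For illustration, a generated rules section might look like this (layout per the format described above; the rule text itself is hypothetical):

```markdown
## error_handling

**Set explicit timeouts on outbound HTTP calls**
Flag any outbound HTTP request constructed without an explicit timeout argument.
```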
After subagents complete, verify:
- Diff is minimal — only new/removed/updated rules, not full rewrites
- New rules are mechanical and actionable (not vague design guidance)
- No duplication with the root `.cursor/BUGBOT.md` (manually maintained cross-cutting rules)
## Status Commands

```
python3 scripts/mine.py status      # Overview: threads, insights, library
python3 scripts/mine.py pending     # What needs work at each stage
python3 scripts/mine.py for-topic X # All insights for topic X
```
## Data Locations

- Threads: `code_insights/threads.yaml`
- Insights: `code_insights/insights.yaml`
- Library: `code_insights/library/*.yaml`
- Working dir: `tmp/mining_{identifier}/`
## Architecture

```
User: /mine-best-practices --since 2024-01-01
  |
  v
mine.py --> Batch threads, output extraction prompts
  |
  v
Extraction subagents (parallel) --> batch_n.yaml
  |
  v
aggregate_extraction.py --> insights.yaml + validation prompts
  |
  v
Validation subagents (parallel) --> batch_n.yaml
  |
  v
aggregate_validation.py --> insights.yaml + topic prompt
  |
  v
Topic assignment subagent --> topics.yaml
  |
  v
dispatch_synthesis.py --> synthesis prompts (per topic)
  |
  v
Synthesis subagents (parallel) --> library/{topic}.yaml
  |
  v
[VERIFY: Check for anomalies]
  |
  v
aggregate_synthesis.py --> insights.yaml (status: synthesized)
  |
  v
build_sections.py --> sections/*.md
  |
  v
build_bugbot.py --> bugbot rules (via subagent)
```
## Notes

- Extraction filters out already-processed thread_ids
- Validation checks patterns against the current codebase
- Synthesis prioritizes recurring patterns over one-offs
- Library practices derive from `insights.yaml` (full provenance)