name: bug-clustering description: | Internal process for the bug-clusterer agent. Defines the step-by-step procedure for parsing, classifying, redacting, scoring, and clustering bug candidates from raw X/Twitter posts. Not user-invocable — loaded by the bug-clusterer agent through its skills frontmatter. allowed-tools: "Read, Bash(cat:*), Grep, Glob" user-invocable: false version: 0.1.0 author: Jeremy Longshore jeremy@intentsolutions.io license: SEE LICENSE IN LICENSE model: inherit effort: high compatibility: "Designed for Claude Code; internal agent-loaded skill (not user-invocable)" tags: [triage, clustering, pii-redaction, classification, internal-agent-skill]
Bug Clustering Process
Step-by-step procedure for transforming raw XPost objects into structured, clustered bug candidates with PII redaction and reliability scoring.
Overview
Loaded by the bug-clusterer agent inside the x-bug-triage plugin. Transforms raw XPost records ingested from X/Twitter into structured BugCandidate rows, deduplicates near-identical candidates, classifies each candidate into one of 12 bug families, redacts six categories of PII, scores reporter reliability across four dimensions, and groups results into bug clusters by deterministic signature. The output feeds the downstream repo-scanning, owner-routing, and triage-display stages.
Prerequisites
- Database initialized via
lib/db.tsschema migrations config/cluster-matching-thresholds.jsonpresent (controls dedup + cluster overlap thresholds)config/approved-accounts.jsonpresent (drives reporter category tagging)- Raw
XPost[]array passed in by the orchestrator (not fetched here)
Instructions
Step 1: Parse
For each XPost, produce a BugCandidate with all 33 fields using lib/parser.ts:
- Extract product_surface, feature_area, symptoms, error_strings, repro_hints
- Extract urls, media_keys, language, conversation references
- Determine source_type (mention, reply, quote_post, search_hit)
Step 1.5: Deduplicate
Before classification, run content-similarity deduplication using lib/dedupe.ts:
- Call
deduplicateCandidates()with parsed candidates and thecandidate_dedup.hybrid_similarity_thresholdfromconfig/cluster-matching-thresholds.json(default 0.70) - Uses char-trigram + token-Jaccard hybrid similarity
- Does NOT remove posts — tags them as duplicate groups with a canonical post (highest engagement)
- Only canonical posts and non-duplicates (
forward_ids) proceed to classification - Log dedup stats in the form: "N posts (M unique, K duplicate groups)" with N/M/K replaced by integer counts
Step 2: Classify
Run lib/classifier.ts on each candidate:
- Assign one of 12 classifications with confidence score (0.0-1.0) and rationale
- Sarcastic bug reports get classified separately — still treated as signal
Step 3: Redact PII
Run lib/redactor.ts on each candidate:
- Detect 6 PII types: email, API key, phone, account ID, media flag, URL token
- Replace with [REDACTED:type] tags
- Set pii_flags array and raw_text_storage_policy
Step 4: Score Reliability
Run lib/reporter-scorer.ts on each candidate:
- 4 dimensions: report quality, independence, account authenticity, historical accuracy
- Composite reporter_reliability_score (0.0-1.0)
Step 5: Tag Reporter Category
Match author against approved_accounts config:
- Categories: public, internal, partner, tester
Step 6: Cluster
Using lib/clusterer.ts and lib/signatures.ts:
- Generate deterministic bug signature from error_strings + symptoms + feature_area
- Match against active_clusters at >=70% signature overlap
- Family-first guard: different ClusterFamilies NEVER cluster together
- New match: create cluster (initial severity "low")
- Existing match: update report_count, last_seen, sub_status
- Resolved match: reopen with sub_status "regression_reopened"
- Suppressed match: skip, log to audit
Step 7: Persist
- Insert candidates to DB via
lib/db.ts - Insert/update clusters and cluster_posts junction
- Write audit events for each classification, redaction, and cluster action
Output
bug_candidatesrows with classification, PII flags, and reporter reliability scoresbug_clustersrows (new or updated) with severity, sub_status, and report_countcluster_postsjunction rows linking candidates to clusters- Audit-event rows recording every classification, redaction, and cluster action
Error Handling
- Parser failure on a single XPost: log + skip, continue with remaining posts (degraded mode)
- Classification confidence below threshold: still recorded, flagged for human review
- Cluster signature collision across families: hard-blocked by the family-first guard — never cross-clusters
- Persist failure mid-batch: rollback uncommitted writes for that batch only; preserve already-committed work
Examples
The bug-clusterer agent invokes this skill after the X/Twitter ingest phase completes. A typical batch processes 50–500 candidates per run, producing 5–30 clusters depending on overlap. Sample audit log line: "127 posts (89 unique, 14 duplicate groups, 6 new clusters, 8 cluster updates)".
Resources
Load evidence tier definitions for proper cluster evidence assessment:
!cat skills/x-bug-triage/references/evidence-policy.md
Load data model reference for BugCandidate fields and cluster schemas:
!cat skills/x-bug-triage/references/schemas.md