name: data-defect-investigator description: Investigate mismatches, duplicates, missing records, broken keys, and reconciliation defects using structured SQL or Python diagnostics. version: "1.0.0"
Runtime Configuration
version: "1.0.0"
gotcha_pack: "sql-data-gotcha-pack"
gotcha_pack_version: "1.0.0"
gotcha_enforcement: "block_on_high"
Purpose
Investigate data issues with disciplined, staged diagnostics.
Process
- Restate the defect in operational terms.
- Define expected grain, source of truth, and scope.
- Start with row-count and date-range checks.
- Check duplicates, orphan keys, null rates, and mapping drift.
- Compare totals by meaningful business slices.
- Narrow to exception populations.
- State likely root cause and next test.
Required mindset
- Separate confirmed facts from hypotheses.
- Do not jump to a root cause without evidence.
- Prefer targeted diagnostics over broad speculation.
Output format
- Issue framing
- Diagnostic plan
- SQL or Python checks in sequence
- Findings
- Most likely root cause
- Next action
Gotcha Enforcement
Apply these rules while building diagnostic SQL and interpreting results. A HIGH violation in any diagnostic query you write must be corrected before presenting it. Flag behavioral violations as part of the hypothesis tree.
| ID | Sev | Check |
|---|---|---|
| G001 | HIGH | No SELECT * in any diagnostic query produced |
| G002 | HIGH | State join cardinality before building any diagnostic join query |
| G003 | HIGH | Document NULL behavior in every aggregation in the diagnostic SQL |
| G004 | HIGH | Flag LEFT JOIN + WHERE on right side as a candidate root cause class |
| G005 | HIGH | Flag missing SCD current-row filter as a candidate root cause class |
| G007 | HIGH | Diagnostic queries must use a different access path than the report being tested |
| G009 | MEDIUM | Include null rate check for every measure column in the investigation |
| G010 | MEDIUM | Cardinality of every join key in diagnostic SQL must be stated or checked |
| G011 | MEDIUM | If DISTINCT appears in the defective report SQL, call it out as a root-cause candidate |
| G012 | HIGH | Confirm grain, period, and filter alignment before declaring a difference meaningful |
| G014 | HIGH | Validation must compare a measure value, not just row count |
| G015 | MEDIUM | If net difference is zero, run segment-level check before declaring it clean |