name: podcast-quality-repair
description: Use when auditing or repairing a specific podcast with recurring transcript quality issues such as inconsistent speaker labels, generic guest names, language mismatches, duplicate translations, or episodes that need source recovery before repolish.
Podcast Quality Repair
Use this skill when one podcast has repeated transcript problems across many episodes and you need a fast, repeatable fix plan.
When to use
- One podcast has inconsistent host or guest labels across episodes.
- `llm_translate` rows are duplicated.
- An English podcast accidentally has Chinese `youtube_manual` or `llm_polish` transcripts.
- A podcast still relies on legacy YouTube caption rows instead of ASR-based transcript sources.
- Speaker tags are embedded mid-paragraph, so a new speaker does not start a new paragraph.
- You need to decide which episodes only need normalization and which require source recovery or re-diarization.
Workflow
- Run the podcast audit:
  `node scripts/audit-podcast-transcripts.js --podcast-id=<id>`
- Bucket episodes by repair mode:
  - Label normalization only
  - Duplicate translation cleanup
  - Recover source transcript first
  - Re-diarize or manually review speaker attribution
  - Inline speaker split cleanup
  - Replace legacy YouTube caption source with ASR
- Only after the audit, run the smallest safe fix:
  - Tag cleanup and postprocess for formatting issues
  - Inline-speaker split repair for paragraph-boundary mistakes
  - Re-polish only when the source transcript is valid
  - Recover source transcript before any repolish if the source language is wrong
  - If only `youtube_auto` or `youtube_manual` exists, regenerate from ASR before downstream fixes
- Re-run the audit after each batch until the issue counts drop to zero or to the remaining known hard cases.
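The bucketing step above can be sketched as a small function. The output format of the audit script is not described here, so the episode shape and the `issues` flag names below are hypothetical; only the repair-mode names come from the workflow. The priority order (source-level problems before cosmetic ones) is also an assumption, chosen so that an episode with a bad source is never polished first.

```javascript
// Hypothetical audit flags mapped to the repair modes listed in the workflow.
// Order matters: source-level problems are checked before cosmetic ones.
const REPAIR_MODES = [
  { flag: "legacyCaptionOnly", mode: "Replace legacy YouTube caption source with ASR" },
  { flag: "wrongSourceLanguage", mode: "Recover source transcript first" },
  { flag: "suspectDiarization", mode: "Re-diarize or manually review speaker attribution" },
  { flag: "duplicateTranslation", mode: "Duplicate translation cleanup" },
  { flag: "inlineSpeakerTag", mode: "Inline speaker split cleanup" },
  { flag: "inconsistentLabels", mode: "Label normalization only" },
];

// Return the highest-priority repair mode for one episode,
// or null when the audit reported no issues for it.
function pickRepairMode(episode) {
  const issues = new Set(episode.issues || []);
  const match = REPAIR_MODES.find(({ flag }) => issues.has(flag));
  return match ? match.mode : null;
}

// Group episode ids by repair mode so each bucket can be fixed as one batch.
function bucketEpisodes(episodes) {
  const buckets = {};
  for (const episode of episodes) {
    const mode = pickRepairMode(episode);
    if (mode === null) continue; // clean episode, nothing to do
    if (!buckets[mode]) buckets[mode] = [];
    buckets[mode].push(episode.id);
  }
  return buckets;
}
```

An episode that carries several flags lands in a single bucket. Once that fix is applied, re-running the audit shows whether any lower-priority issue remains.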
Repair heuristics
- Canonicalize the host to the podcast host's real full name.
- Replace `Guest` or `嘉宾` with the real guest name only when metadata makes the mapping unambiguous.
- Treat `Host`, `Speaker 1`, `主持人`, and `嘉宾` as red flags, not finished output.
- Treat `**[Name]**` appearing in the middle of a paragraph as a structural defect, not a style variant.
- Treat `youtube_auto` and `youtube_manual` as temporary legacy imports, not preferred final sources.
- For English podcasts, `llm_polish` should stay English. Chinese belongs in `llm_translate`.
- If generic labels dominate and real names are absent, assume diarization or mapping may be wrong.
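Three of the heuristics above lend themselves to a quick mechanical check. The sketch below is illustrative, not the repository's implementation: the function names are hypothetical, the generic-label list is taken from the bullets above, and the idea of measuring the share of Han characters to spot a language mismatch is an assumption (no threshold is specified in this skill).

```javascript
// Labels the heuristics treat as unfinished output.
const GENERIC_LABELS = new Set(["Host", "Guest", "主持人", "嘉宾"]);

// True for generic placeholders, including numbered ones such as "Speaker 1".
function isGenericLabel(label) {
  const trimmed = label.trim();
  return GENERIC_LABELS.has(trimmed) || /^Speaker \d+$/.test(trimmed);
}

// Move every **[Name]** tag found mid-paragraph to the start of its own
// paragraph. Tags already at the start of a paragraph are left unchanged.
function splitInlineSpeakers(text) {
  return text
    .split(/\n{2,}/)
    .flatMap((paragraph) =>
      paragraph
        .split(/(?=\*\*\[[^\]\n]+\]\*\*)/) // lookahead keeps each tag with its speech
        .map((part) => part.trim())
        .filter((part) => part.length > 0)
    )
    .join("\n\n");
}

// Share of Han characters among all letters. A high value in an English
// podcast's llm_polish text hints that Chinese ended up in the wrong row.
function hanRatio(text) {
  const letters = text.match(/\p{L}/gu) || [];
  if (letters.length === 0) return 0;
  const han = letters.filter((ch) => /\p{Script=Han}/u.test(ch));
  return han.length / letters.length;
}
```

For example, `splitInlineSpeakers("**[Jane Doe]** Welcome back. **[Guest]** Thanks.")` yields two paragraphs, and `isGenericLabel("Guest")` then flags the second speaker for name mapping.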
Read references/repair-playbook.md for the batch order and acceptance criteria.