name: vector-text-fixer description: Fix garbled text in PDF/SVG vector graphics caused by font encoding issues, making files editable in AI tools. Supports batch processing and JSON export for manual correction. license: MIT author: aipoch
Vector Text Fixer
Fixes garbled text in PDF/SVG vector graphics caused by font embedding problems, encoding errors, or missing font substitution. Outputs repaired files or editable JSON for AI tool import.
Quick Check
python -m py_compile scripts/main.py
Audit-Ready Commands
python -m py_compile scripts/main.py
python scripts/main.py --help
python scripts/main.py --input document.pdf --output fixed.pdf
python scripts/main.py --input diagram.svg --output fixed.svg
When to Use
- Fix garbled/box characters in PDF files caused by font embedding issues
- Repair SVG text encoding errors before editing in Illustrator or Inkscape
- Batch-process a folder of PDF/SVG files with garbled text
- Export a text map JSON for manual correction in AI editors
Workflow
- Confirm input file path (PDF or SVG) or batch folder, and desired output path.
- Validate that the request involves PDF/SVG garbled text repair; stop early if not.
- Run
scripts/main.py --input <file> --output <file>or--batch <folder>. - Return a structured result separating repaired blocks, skipped blocks, and unresolved items.
- If execution fails or inputs are incomplete, switch to the Fallback Template below.
Fallback Template
If scripts/main.py fails or required fields are missing, respond with:
FALLBACK REPORT
───────────────────────────────────────
Objective : <repair goal>
Inputs Available : <file path or batch folder provided>
Missing Inputs : <list exactly what is missing>
Note: --input requires a valid PDF or SVG file path, not a text string.
For batch mode use --batch <folder_path> instead.
Partial Result : <any blocks repaired safely>
Blocked Steps : <what could not be completed and why>
Next Steps : <minimum info needed to complete>
───────────────────────────────────────
Stress-Case Output Checklist
For complex multi-constraint requests, always include these sections explicitly:
- Assumptions: repair level default (standard), encoding auto-detected
- Constraints: encrypted PDFs require password unlock first; scanned PDFs need OCR first
- Risks: severely damaged files may not be fully repairable; rare fonts may not map correctly
- Unresolved Items: blocks with confidence < 0.3 flagged for manual review
Supported Scenarios
PDF Garbled Text:
- Box/question mark issues from font embedding problems
- Garbled text from encoding conversion errors
- Missing font substitution characters
- Multi-language mixed encoding issues
SVG Garbled Text:
- Text entity encoding errors
- Special character escaping issues
- Invalid font reference display abnormalities
- XML encoding declaration errors
CLI Usage
# Fix single PDF
python scripts/main.py --input document.pdf --output fixed.pdf
# Fix single SVG
python scripts/main.py --input diagram.svg --output fixed.svg
# Batch process folder
python scripts/main.py --batch ./input_folder --output ./output_folder
# Interactive repair
python scripts/main.py --input doc.pdf --interactive
# Export editable JSON
python scripts/main.py --input doc.pdf --export-json editable.json
# Specify repair level
python scripts/main.py --input doc.pdf --output fixed.pdf --repair-level aggressive
Parameters
| Parameter | Required | Description | Default |
|---|---|---|---|
--input | Yes* | Input PDF or SVG file path | — |
--batch | Yes* | Batch input folder path | — |
--output | Yes | Output file or folder path | — |
--repair-level | No | minimal / standard / aggressive | standard |
--interactive | No | Enable interactive repair mode | False |
--export-json | No | Export editable JSON format | — |
--encoding | No | Source file encoding (default: auto-detect) | auto |
*At least one of --input or --batch is required.
Repair Levels
- Minimal: Only obvious errors (replacement characters, null bytes); maximum original integrity
- Standard: Common encoding issues + smart font replacement; balanced repair rate and accuracy
- Aggressive: Full text re-encoding + OCR-assisted recognition; for severely garbled documents
Output Format (JSON Export)
{
"file_type": "pdf",
"pages": [{
"page_num": 1,
"text_blocks": [{
"id": "tb_001",
"bbox": [100, 200, 300, 220],
"original_text": "?????",
"detected_encoding": "UTF-8",
"confidence": 0.3,
"suggested_fix": "Sample Text"
}]
}],
"repair_summary": {
"total_blocks": 15,
"fixed_blocks": 12,
"skipped_blocks": 3
}
}
Input Validation
This skill accepts: PDF (.pdf) or SVG (.svg) file paths, or a folder path for batch processing, where the files contain garbled or unreadable text caused by font/encoding issues.
If the request does not involve PDF/SVG garbled text repair — for example, asking to convert file formats, edit PDF content directly, perform OCR on scanned images, or process non-vector files — do not proceed. Instead respond:
"
vector-text-fixeris designed to fix garbled text in PDF/SVG vector graphics caused by font encoding issues. Your request appears to be outside this scope. Please provide a valid PDF or SVG file path, or use a more appropriate tool."
Error Handling
- If
--inputreceives a text string instead of a file path, report the error and request a valid file path. - If the file is encrypted, report that password unlock is required before processing.
- If the task goes outside documented scope, stop instead of guessing.
- If
scripts/main.pyfails, use the Fallback Template above. - Do not fabricate repaired text content or execution outcomes.
Output Requirements
Every final response must include:
- Objective — file(s) repaired and repair level used
- Inputs Received — file path, repair level, encoding settings
- Assumptions — defaults applied (repair level, encoding detection)
- Result — output file path, blocks fixed vs skipped
- Risks and Limits — confidence thresholds, manual review blocks
- Next Checks — review low-confidence blocks manually before use
Limitations
- Encrypted PDFs require password unlock before processing
- Severely damaged vector files may not be fully repairable
- Some rare fonts may not map correctly
- Scanned PDFs require OCR recognition first
Dependencies
pdfplumber >= 0.10.0
PyMuPDF >= 1.23.0
cairosvg >= 2.7.0
beautifulsoup4 >= 4.12.0
fonttools >= 4.40.0
chardet >= 5.0.0
Pillow >= 10.0.0