name: image version: "1.0" description: Extract text from images using a vision LLM entry: script: scripts/main.py class: ImageSkill triggers: extensions: - ".png" - ".jpg" - ".jpeg" - ".webp" - ".gif" - ".tiff" intents: - "image" - "screenshot" - "diagram" - "photo" requires: [] author: axoviq.com license: AGPL-3.0-or-later
Image Skill
Base64-encodes the image and returns it in metadata for the IngestAgent to
process via a vision-capable LLM. The text field is left empty at extract
time and filled in by the agent.
When this skill is used
- Source path ends with
.png,.jpg,.jpeg,.webp,.gif, or.tiff - User intent contains:
image,screenshot,diagram,photo