name: oma-image description: Multi-vendor AI image generation with authentication-aware parallel dispatch. Routes to Codex (gpt-image-2 via ChatGPT OAuth) and Pollinations (flux/zimage, free with signup). Gemini provider is present but disabled by default (requires billing). Use for image generation, image creation, visual asset generation, and AI art.
Image Agent - Multi-Vendor Image Router
Scheduling
Goal
Generate images and visual assets through authenticated multi-vendor routing while preserving prompt clarity, reference-image handling, cost controls, and reproducible output manifests.
Intent signature
- User asks to generate images, visual assets, illustrations, product photos, concept art, mockups, or AI art.
- Another skill needs shared image-generation infrastructure.
- User provides reference images or asks for vendor comparison.
When to use
- Generating images, visual assets, illustrations, product photos, concept art
- Comparing output between multiple image models for the same prompt
- Producing images from prompts within editor workflows (Claude Code, Codex, Gemini CLI)
- Other skills needing image generation infrastructure (shared invocation)
When NOT to use
- Editing an existing image or photo manipulation -> out of scope
- Generating videos or audio -> out of scope
- Inline vector art / SVG composition from structured data -> use a templating skill
- Simple asset resizing or format conversion -> use a dedicated image library
Expected inputs
- Image prompt or creative brief
- Optional vendor, size, quality, count, output directory, and reference images
- Authentication/environment state for Codex, Pollinations, or Gemini
Expected outputs
- Generated image files under
.agents/results/images/or requested output directory manifest.jsonwith prompt, vendor, model, and reproducibility metadata- Vendor comparison outputs when
--vendor allis used
Dependencies
oma image generateCLI and vendor authentication- Codex image generation, Pollinations API, or Gemini API/CLI strategy
resources/vendor-matrix.md,resources/prompt-tips.md, andconfig/image-config.yaml
Control-flow features
- Branches by prompt ambiguity, vendor auth, cost threshold, reference-image support, path safety, and safety/timeout exit codes
- Calls external vendor APIs/CLIs
- Reads reference images and writes generated images plus manifests
Structural Flow
Entry
- Validate that the request contains enough subject, setting, style, usage, and aspect-ratio signal.
- Detect attached/reference images and vendor support.
- Check authentication, cost guardrails, output path, and count limits.
Scenes
- PREPARE: Clarify or amplify prompt and choose vendor strategy.
- ACQUIRE: Validate auth, references, output path, and provider availability.
- ACT: Invoke
oma image generatewith selected vendor(s), prompt, references, and options. - VERIFY: Check manifest, output files, exit code, and provider result.
- FINALIZE: Return output paths and relevant warnings.
Transitions
- If prompt lacks required signal, clarify or show amplified prompt before generation.
- If
--vendor allis requested, require every requested vendor to be available. - If reference path is supported by selected vendor, pass it automatically.
- If estimated cost exceeds guardrail, require confirmation unless bypassed.
Failure and recovery
- If auth is missing, report vendor-specific authentication requirement.
- If reference support is unavailable for the selected vendor, reject with actionable guidance.
- If local CLI is outdated, ask user to run
oma update. - If generation times out or is blocked, surface exit code and provider status.
Exit
- Success: images and manifest exist in the output directory.
- Partial success: some vendors fail in comparison mode and failures are reported.
- Failure: no image is produced and the route/cost/auth/safety blocker is explicit.
Logical Operations
Actions
| Action | SSL primitive | Evidence |
|---|---|---|
| Validate prompt completeness | VALIDATE | Clarification protocol |
| Select vendor strategy | SELECT | Vendor matrix and auth state |
| Read reference images | READ | --reference paths |
| Call generation CLI/API | CALL_TOOL | oma image generate |
| Write image outputs | WRITE | Image files and manifest |
| Validate result | VALIDATE | Exit code, manifest, files |
| Report output | NOTIFY | Final path summary |
Tools and instruments
oma image generate,oma image doctor,oma image list-vendors- Codex, Pollinations, and Gemini provider paths
- Prompt tips, vendor matrix, and image config
Canonical command path
oma image doctor
oma image generate "<prompt>" --vendor auto --size auto --quality auto --format json
With reference images:
oma image generate --reference "<absolute-path>" --vendor codex "<prompt>"
Resource scope
| Scope | Resource target |
|---|---|
LOCAL_FS | Reference images, generated images, manifests |
PROCESS | Provider CLIs and image router commands |
NETWORK | Pollinations/Gemini or provider APIs |
CREDENTIALS | Provider auth and API keys |
Preconditions
- Prompt is sufficiently specified or user approves amplification.
- Required vendor auth and output permissions exist.
- Reference paths are accessible when used.
Effects and side effects
- Creates image files and manifests.
- May call paid or rate-limited provider APIs.
- May read attached/reference images.
Guardrails
- Clarify before invoking — if the user's request is ambiguous about subject, style, composition, or usage context, ask the user first or amplify the prompt explicitly (showing the user the expanded version for approval). Do NOT silently generate from a vague prompt. See
Clarification Protocolbelow. - Authentication-aware dispatch — detect which vendor CLIs are authenticated and run only those; with
--vendor all, every requested vendor must be available (strict). - Cost guardrail — confirm before executing runs whose estimated cost is ≥
$0.20(configurable).--yes/OMA_IMAGE_YES=1bypass. Default vendorpollinations(flux/zimage) is free, so auto-triggering on keywords is safe. - Path safety — output paths outside
$PWDrequire--allow-external-out. - Cancellable — SIGINT/SIGTERM aborts in-flight provider calls and the orchestrator.
- Deterministic outputs — every run writes
manifest.jsonnext to the images for reproducibility. - Max
n= 5 — wall-time bound. - Exit codes align with
oma search fetch(0, 1, 2=safety, 3=not-found, 4=invalid-input, 5=auth-required, 6=timeout).
Clarification Protocol
Before invoking oma image generate, the calling agent runs this checklist against the user's request. If any answer is "no / unknown", clarify with the user first.
Required signal (must be present or inferable):
- Subject — what is the primary thing in the image? (object, person, scene)
- Setting / backdrop — where is it? (context, environment)
Strongly recommended (ask if absent AND not inferable from context):
- Style — photorealistic, illustration, 3D render, oil painting, concept art, flat vector, …?
- Mood / lighting — bright vs moody, warm vs cool, dramatic vs minimal
- Usage context — hero image, icon, thumbnail, product shot, poster? (dictates aspect ratio + composition)
- Aspect ratio — square (
1024x1024), portrait (1024x1536), landscape (1536x1024)?
Amplification shortcut. For brief prompts (e.g. "a red apple"), do not pop clarifying questions if the request is genuinely that simple — instead amplify inline and show the user the expanded version before invoking:
User: "a red apple" Agent: "I'll generate this as: a single glossy red apple centered on a clean white background, soft studio lighting, photorealistic, shallow depth of field, 1024×1024. Shall I proceed, or would you like a different style/composition?"
Skip both clarification and amplification when the user has clearly authored a full creative brief (≥ 2 of: subject + style + lighting + composition). Respect their prompt verbatim.
Category-specific briefs (app mockup, poster, thumbnail, infographic, comic panel, avatar): consult resources/prompt-tips.md → External Prompt Libraries.
Output language. Generation prompts are sent to the provider in English (image models are trained predominantly on English captions). Translate the user's request if they wrote in another language, and show them the translated version during amplification so they can correct misreadings.
Vendors
This skill follows oh-my-agent's CLI-first concept: whenever a vendor's native CLI can drive generation (and return raw bytes), the subprocess path is preferred over direct API keys. Direct API is only used as a fallback for vendors whose CLI can't yet emit raw image bytes.
| Vendor | Strategy | Models | Trigger |
|---|---|---|---|
codex | CLI-first — codex exec via ChatGPT OAuth (codex login), built-in image_gen | gpt-image-2 | Logged in via Codex CLI (no API key) |
pollinations | Direct HTTP — gen.pollinations.ai/v1/images/generations (free signup for key) | Free: flux, zimage. Credit-gated: qwen-image, wan-image, gpt-image-2, klein, kontext, gptimage, gptimage-large | POLLINATIONS_API_KEY set (free at https://enter.pollinations.ai). No native CLI exists. |
gemini | CLI-first fallback → direct API. gemini -p (stream) is the preferred path but currently disabled at precheck (CLI's agentic loop does not return raw inlineData bytes on stdout as of Gemini CLI 0.38). Until the CLI exposes a non-agentic image surface, the provider falls back to the direct generativelanguage.googleapis.com API. | gemini-2.5-flash-image, gemini-3.1-flash-image-preview | Preferred: gemini auth login. Fallback: GEMINI_API_KEY + billing. |
Invocation
Standalone
/oma-image a red apple on white background
/oma-image --vendor all --size 1536x1024 jeju coastline at sunset
/oma-image -n 3 --quality high --out ./hero "minimalist dashboard hero illustration"
Shell CLI
oma image generate "<prompt>" [--vendor auto|codex|pollinations|gemini|all] [-n 1..5] \
[--size 1024x1024|1024x1536|1536x1024|auto] \
[--quality low|medium|high|auto] \
[--out <dir>] [--allow-external-out] \
[-r <path>]... \
[--timeout 180] [-y] [--no-prompt-in-manifest] \
[--dry-run] [--format text|json]
oma image doctor
oma image list-vendors
Gemini-only escalation flag: --strategy mcp,stream,api (overrides vendors.gemini.strategies).
Reference Images (-r, --reference)
Attach up to 10 reference images (PNG/JPEG/GIF/WebP, ≤ 5MB each) to guide style, subject identity, or composition. Repeatable or comma-separated.
oma image generate -r ~/Downloads/otter.jpeg "same otter in dramatic lighting"
oma image generate -r a.png -r b.png "blend these two styles"
Supported vendors:
| Vendor | Support | How |
|---|---|---|
codex (gpt-image-2) | ✅ | Passes -i <path> to codex exec |
gemini (2.5-flash-image) | ✅ | Inlines base64 inlineData parts in request |
pollinations | ❌ | Rejected with exit code 4 (requires URL hosting; see PR #2 roadmap) |
Paths: absolute or relative to $CWD. Host CLIs usually expose attached images via:
- Claude Code:
~/.claude/image-cache/<session>/N.png(surfaced in system messages as[Image: source: <path>]) - Antigravity: workspace upload directory (exact path shown in IDE)
- Codex CLI as host: user must pass the filesystem path explicitly; in-conversation attachments are not forwarded
Agent Behavior: Auto-forward Attached References (MANDATORY)
When ALL of the following are true, the calling agent MUST pass the attached image via --reference <path> automatically. Never describe the image in prose as a workaround.
- The user asks to generate or edit an image (referencing the attached one by phrases like "이거", "this image", "same style as this", "이 수달", etc.).
- A host-surfaced attached image is visible to the agent — e.g. a Claude Code system message with
[Image: source: <path>], or an Antigravity workspace upload path, or an explicit filesystem path in the user's message. - The selected vendor supports references (
codexorgemini).
Required action: invoke oma image generate --reference <absolute-path> --vendor <codex|gemini> "<prompt>". If the user didn't specify a vendor, default to codex (CLI-first, widest availability). Do NOT:
- Fall back to prose description ("I'll describe the otter's appearance...").
- Ask the user to re-type or re-attach the path.
- Claim the CLI doesn't support references without first running
oma image generate --helpto verify.
If the local CLI is outdated (--reference is missing from --help): tell the user to run oma update once, then retry. Do not silently degrade to prose.
If the reference path is from Claude Code's image-cache: note to the user that the path is session-scoped and suggest copying the file to a durable location if they want to reuse it later. Still proceed with the generation.
Shared Infrastructure (from other skills)
Other skills call oma image generate --format json and parse the JSON manifest from stdout.
Output Layout
.agents/results/images/
├── 20260424-143052-ab12cd/ # single-vendor run
│ └── pollinations-flux.jpg
│ (or codex-gpt-image-2.png)
│ manifest.json
└── 20260424-143122-7z9kqw-compare/ # --vendor all run
├── codex-gpt-image-2.png
├── pollinations-flux.jpg
└── manifest.json
References
Follow resources/execution-protocol.md step by step.
See resources/vendor-matrix.md for strategy precheck rules.
Use resources/prompt-tips.md for writing effective prompts.
Before submitting, run resources/checklist.md.
Configuration
Project-specific settings: config/image-config.yaml.
Env vars: OMA_IMAGE_DEFAULT_VENDOR, OMA_IMAGE_DEFAULT_OUT, OMA_IMAGE_YES, POLLINATIONS_API_KEY, GEMINI_API_KEY, OMA_IMAGE_GEMINI_STRATEGIES.
- Execution steps:
resources/execution-protocol.md - Vendor matrix:
resources/vendor-matrix.md - Prompt tips:
resources/prompt-tips.md - Checklist:
resources/checklist.md - Context loading:
../_shared/core/context-loading.md