name: oma-image
description: Multi-vendor AI image generation with authentication-aware parallel dispatch. Routes to Codex (gpt-image-2 via ChatGPT OAuth) and Pollinations (flux/zimage, free with signup). Gemini provider is present but disabled by default (requires billing). Use for image generation, image creation, visual asset generation, and AI art.

Image Agent - Multi-Vendor Image Router

Scheduling

Goal

Generate images and visual assets through authenticated multi-vendor routing while preserving prompt clarity, reference-image handling, cost controls, and reproducible output manifests.

Intent signature

User asks to generate images, visual assets, illustrations, product photos, concept art, mockups, or AI art.
Another skill needs shared image-generation infrastructure.
User provides reference images or asks for vendor comparison.

When to use

Generating images, visual assets, illustrations, product photos, concept art
Comparing output between multiple image models for the same prompt
Producing images from prompts within editor workflows (Claude Code, Codex, Gemini CLI)
Other skills needing image generation infrastructure (shared invocation)

When NOT to use

Editing an existing image or photo manipulation -> out of scope
Generating videos or audio -> out of scope
Inline vector art / SVG composition from structured data -> use a templating skill
Simple asset resizing or format conversion -> use a dedicated image library

Expected inputs

Image prompt or creative brief
Optional vendor, size, quality, count, output directory, and reference images
Authentication/environment state for Codex, Pollinations, or Gemini

Expected outputs

Generated image files under .agents/results/images/ or requested output directory
manifest.json with prompt, vendor, model, and reproducibility metadata
Vendor comparison outputs when --vendor all is used

Dependencies

oma image generate CLI and vendor authentication
Codex image generation, Pollinations API, or Gemini API/CLI strategy
resources/vendor-matrix.md, resources/prompt-tips.md, and config/image-config.yaml

Control-flow features

Branches by prompt ambiguity, vendor auth, cost threshold, reference-image support, path safety, and safety/timeout exit codes
Calls external vendor APIs/CLIs
Reads reference images and writes generated images plus manifests

Structural Flow

Entry

Validate that the request contains enough subject, setting, style, usage, and aspect-ratio signal.
Detect attached/reference images and vendor support.
Check authentication, cost guardrails, output path, and count limits.
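The output-path and count checks at entry might look roughly like the sketch below. It is not the router's actual code: `check_out_dir` and `check_count` are hypothetical helpers, and the only rules taken from this document are that output paths outside the working directory need `--allow-external-out` and that the image count is capped at 5.

```python
from pathlib import Path

MAX_IMAGES = 5  # "Max n = 5" from the guardrails


def check_out_dir(out_dir: str, cwd: str, allow_external: bool = False) -> Path:
    """Return the resolved output dir, or raise if it escapes cwd without opt-in."""
    base = Path(cwd).resolve()
    # Joining with an absolute out_dir discards base, so both forms are handled.
    target = (base / out_dir).resolve()
    inside = target == base or base in target.parents
    if not inside and not allow_external:
        raise ValueError(f"{target} is outside {base}; pass --allow-external-out")
    return target


def check_count(n: int) -> int:
    """Reject counts outside the documented 1..5 range."""
    if not 1 <= n <= MAX_IMAGES:
        raise ValueError(f"n must be between 1 and {MAX_IMAGES}, got {n}")
    return n
```

Resolving both paths before comparing means a relative path such as `../elsewhere` is treated as external even though it does not start with `/`.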

Scenes

PREPARE: Clarify or amplify prompt and choose vendor strategy.
ACQUIRE: Validate auth, references, output path, and provider availability.
ACT: Invoke oma image generate with selected vendor(s), prompt, references, and options.
VERIFY: Check manifest, output files, exit code, and provider result.
FINALIZE: Return output paths and relevant warnings.

Transitions

If prompt lacks required signal, clarify or show amplified prompt before generation.
If --vendor all is requested, require every requested vendor to be available.
If the selected vendor supports reference images, pass the reference path automatically.
If estimated cost exceeds guardrail, require confirmation unless bypassed.
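As an illustration of the cost transition, the sketch below combines the values stated under Guardrails (a configurable threshold of $0.20, bypassed by `--yes` or `OMA_IMAGE_YES=1`). The function name `needs_confirmation` and its parameters are hypothetical, and how the router estimates cost is not covered here.

```python
import os

DEFAULT_THRESHOLD_USD = 0.20  # documented default; configurable per the guardrails


def needs_confirmation(estimated_cost_usd, yes_flag=False, env=None,
                       threshold_usd=DEFAULT_THRESHOLD_USD):
    """Return True when the run should pause and ask the user to confirm."""
    env = os.environ if env is None else env
    # Either the --yes flag or OMA_IMAGE_YES=1 skips the confirmation.
    bypassed = yes_flag or env.get("OMA_IMAGE_YES") == "1"
    return estimated_cost_usd >= threshold_usd and not bypassed
```

Passing `env` explicitly keeps the check easy to exercise without touching the real process environment.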

Failure and recovery

If auth is missing, report vendor-specific authentication requirement.
If reference support is unavailable for the selected vendor, reject with actionable guidance.
If local CLI is outdated, ask user to run oma update.
If generation times out or is blocked, surface exit code and provider status.

Exit

Success: images and manifest exist in the output directory.
Partial success: some vendors fail in comparison mode and failures are reported.
Failure: no image is produced and the route/cost/auth/safety blocker is explicit.
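The three exit states can be read as a function of per-vendor results. The sketch below is one hedged way to express that. `classify_run` is a hypothetical name, and representing each vendor's outcome as a boolean is an assumption made for illustration.

```python
def classify_run(results: dict[str, bool]) -> str:
    """Map per-vendor success flags to 'success', 'partial', or 'failure'."""
    if not results:
        return "failure"  # nothing ran, so no image was produced
    succeeded = sum(results.values())
    if succeeded == len(results):
        return "success"
    # Some vendors produced images while others failed (comparison mode).
    return "partial" if succeeded else "failure"
```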

Logical Operations

Actions

Action	SSL primitive	Evidence
Validate prompt completeness	`VALIDATE`	Clarification protocol
Select vendor strategy	`SELECT`	Vendor matrix and auth state
Read reference images	`READ`	`--reference` paths
Call generation CLI/API	`CALL_TOOL`	`oma image generate`
Write image outputs	`WRITE`	Image files and manifest
Validate result	`VALIDATE`	Exit code, manifest, files
Report output	`NOTIFY`	Final path summary

Tools and instruments

oma image generate, oma image doctor, oma image list-vendors
Codex, Pollinations, and Gemini provider paths
Prompt tips, vendor matrix, and image config

Canonical command path

oma image doctor
oma image generate "<prompt>" --vendor auto --size auto --quality auto --format json

With reference images:

oma image generate --reference "<absolute-path>" --vendor codex "<prompt>"

Resource scope

Scope	Resource target
`LOCAL_FS`	Reference images, generated images, manifests
`PROCESS`	Provider CLIs and image router commands
`NETWORK`	Pollinations/Gemini or provider APIs
`CREDENTIALS`	Provider auth and API keys

Preconditions

Prompt is sufficiently specified or user approves amplification.
Required vendor auth and output permissions exist.
Reference paths are accessible when used.

Effects and side effects

Creates image files and manifests.
May call paid or rate-limited provider APIs.
May read attached/reference images.

Guardrails

Clarify before invoking — if the user's request is ambiguous about subject, style, composition, or usage context, ask the user first or amplify the prompt explicitly (showing the user the expanded version for approval). Do NOT silently generate from a vague prompt. See Clarification Protocol below.
Authentication-aware dispatch — detect which vendor CLIs are authenticated and run only those; with --vendor all, every requested vendor must be available (strict).
Cost guardrail — confirm before executing runs whose estimated cost is ≥ $0.20 (configurable). --yes / OMA_IMAGE_YES=1 bypass. Default vendor pollinations (flux/zimage) is free, so auto-triggering on keywords is safe.
Path safety — output paths outside $PWD require --allow-external-out.
Cancellable — SIGINT/SIGTERM aborts in-flight provider calls and the orchestrator.
Deterministic outputs — every run writes manifest.json next to the images for reproducibility.
Max n = 5: at most five images per invocation (-n 1..5), which bounds wall time.
Exit codes align with oma search fetch (0, 1, 2=safety, 3=not-found, 4=invalid-input, 5=auth-required, 6=timeout).
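A calling agent may find it convenient to name the exit codes rather than compare raw integers. The following is a minimal sketch. The labels for codes 2 through 6 come from the list above, while the meanings given to 0 and 1 are assumptions, since the source lists only the numbers. `ExitCode` and `describe_exit` are hypothetical names.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    OK = 0             # assumed meaning; the source gives only the number
    ERROR = 1          # assumed meaning; the source gives only the number
    SAFETY = 2
    NOT_FOUND = 3
    INVALID_INPUT = 4
    AUTH_REQUIRED = 5
    TIMEOUT = 6


def describe_exit(code: int) -> str:
    """Return a readable label for an exit code, tolerating unknown values."""
    try:
        return ExitCode(code).name.lower().replace("_", "-")
    except ValueError:
        return f"unknown({code})"
```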

Clarification Protocol

Before invoking oma image generate, the calling agent runs this checklist against the user's request. If any answer is "no / unknown", clarify with the user first.

Required signal (must be present or inferable):

Subject — what is the primary thing in the image? (object, person, scene)
Setting / backdrop — where is it? (context, environment)

Strongly recommended (ask if absent AND not inferable from context):

Style — photorealistic, illustration, 3D render, oil painting, concept art, flat vector, …?
Mood / lighting — bright vs moody, warm vs cool, dramatic vs minimal
Usage context — hero image, icon, thumbnail, product shot, poster? (dictates aspect ratio + composition)
Aspect ratio — square (1024x1024), portrait (1024x1536), landscape (1536x1024)?

Amplification shortcut. For brief prompts (e.g. "a red apple"), do not pop clarifying questions if the request is genuinely that simple — instead amplify inline and show the user the expanded version before invoking:

User: "a red apple"
Agent: "I'll generate this as: a single glossy red apple centered on a clean white background, soft studio lighting, photorealistic, shallow depth of field, 1024×1024. Shall I proceed, or would you like a different style/composition?"

Skip both clarification and amplification when the user has clearly authored a full creative brief (≥ 2 of: subject + style + lighting + composition). Respect their prompt verbatim.
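The choice between respecting the prompt, amplifying it, and asking the user can be sketched as a small decision function. This is an illustration under assumptions, not the agent's actual logic. `next_step` is a hypothetical name, the caller is assumed to have already judged which signals are present, and `simple_request` stands in for the agent's judgment that a brief prompt is "genuinely that simple".

```python
BRIEF_SIGNALS = ("subject", "style", "lighting", "composition")


def next_step(present: set[str], simple_request: bool = False) -> str:
    """Return 'verbatim', 'amplify', or 'clarify' for a request.

    `present` holds the names of the signals found in the user's request.
    """
    if sum(signal in present for signal in BRIEF_SIGNALS) >= 2:
        return "verbatim"  # full creative brief: pass the prompt as written
    if simple_request:
        return "amplify"   # expand inline and show the result for approval
    return "clarify"       # ask the user before invoking the generator
```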

Category-specific briefs (app mockup, poster, thumbnail, infographic, comic panel, avatar): consult resources/prompt-tips.md → External Prompt Libraries.

Output language. Generation prompts are sent to the provider in English (image models are trained predominantly on English captions). Translate the user's request if they wrote in another language, and show them the translated version during amplification so they can correct misreadings.

Vendors

This skill follows oh-my-agent's CLI-first concept: whenever a vendor's native CLI can drive generation (and return raw bytes), the subprocess path is preferred over direct API keys. Direct API is only used as a fallback for vendors whose CLI can't yet emit raw image bytes.

Vendor	Strategy	Models	Trigger
`codex`	CLI-first — `codex exec` via ChatGPT OAuth (`codex login`), built-in `image_gen`	`gpt-image-2`	Logged in via Codex CLI (no API key)
`pollinations`	Direct HTTP — `gen.pollinations.ai/v1/images/generations` (free signup for key)	Free: `flux`, `zimage`. Credit-gated: `qwen-image`, `wan-image`, `gpt-image-2`, `klein`, `kontext`, `gptimage`, `gptimage-large`	`POLLINATIONS_API_KEY` set (free at https://enter.pollinations.ai). No native CLI exists.
`gemini`	CLI-first fallback → direct API. `gemini -p` (stream) is the preferred path but currently disabled at precheck (CLI's agentic loop does not return raw `inlineData` bytes on stdout as of Gemini CLI 0.38). Until the CLI exposes a non-agentic image surface, the provider falls back to the direct `generativelanguage.googleapis.com` API.	`gemini-2.5-flash-image`, `gemini-3.1-flash-image-preview`	Preferred: `gemini auth login`. Fallback: `GEMINI_API_KEY` + billing.
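Authentication-aware dispatch with a strict `--vendor all` can be sketched as follows. Several details are assumptions labeled in the code: the set of vendors that `all` expands to, the preference order used for `auto`, and the names `resolve_vendors` and `AuthRequired`. Only the strictness of `all` and the auth-required exit code (5) come from this document.

```python
class AuthRequired(Exception):
    """A requested vendor is not authenticated (documented exit code 5)."""


def resolve_vendors(requested, authenticated,
                    enabled=("codex", "pollinations"),      # assumption: gemini is disabled by default
                    auto_order=("pollinations", "codex")):  # assumption: free vendor tried first
    """Return the list of vendors to run for a --vendor value."""
    if requested == "all":
        missing = [v for v in enabled if v not in authenticated]
        if missing:
            # Strict mode: one unavailable vendor fails the whole request.
            raise AuthRequired(f"--vendor all needs: {', '.join(missing)}")
        return list(enabled)
    if requested == "auto":
        for vendor in auto_order:
            if vendor in authenticated:
                return [vendor]
        raise AuthRequired("no authenticated vendor found")
    if requested not in authenticated:
        raise AuthRequired(f"{requested} is not authenticated")
    return [requested]
```

In practice `oma image doctor` or `oma image list-vendors` would be the source of the `authenticated` set. Their output format is not specified here, so the sketch takes the set as an input.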

Invocation

Standalone

/oma-image a red apple on white background
/oma-image --vendor all --size 1536x1024 jeju coastline at sunset
/oma-image -n 3 --quality high --out ./hero "minimalist dashboard hero illustration"

Shell CLI

oma image generate "<prompt>" [--vendor auto|codex|pollinations|gemini|all] [-n 1..5] \
                             [--size 1024x1024|1024x1536|1536x1024|auto] \
                             [--quality low|medium|high|auto] \
                             [--out <dir>] [--allow-external-out] \
                             [-r <path>]... \
                             [--timeout 180] [-y] [--no-prompt-in-manifest] \
                             [--dry-run] [--format text|json]
oma image doctor
oma image list-vendors

Gemini-only escalation flag: --strategy mcp,stream,api (overrides vendors.gemini.strategies).
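When another program drives the CLI, building the argument list explicitly avoids shell-quoting problems with prompts that contain spaces or quotes. The sketch below uses only flags from the usage text above. `build_generate_argv` is a hypothetical helper name.

```python
def build_generate_argv(prompt, vendor="auto", n=1, size="auto", quality="auto",
                        out=None, references=(), fmt="json"):
    """Build an argv list for `oma image generate` from keyword options."""
    argv = ["oma", "image", "generate", prompt,
            "--vendor", vendor, "-n", str(n),
            "--size", size, "--quality", quality,
            "--format", fmt]
    if out is not None:
        argv += ["--out", out]
    for ref in references:
        argv += ["-r", ref]  # -r is repeatable, one flag per reference
    return argv
```

The resulting list can be passed directly to `subprocess.run(argv)` without `shell=True`, so the prompt reaches the CLI as a single argument.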

Reference Images (`-r`, `--reference`)

Attach up to 10 reference images (PNG/JPEG/GIF/WebP, ≤ 5MB each) to guide style, subject identity, or composition. Repeatable or comma-separated.

oma image generate -r ~/Downloads/otter.jpeg "same otter in dramatic lighting"
oma image generate -r a.png -r b.png "blend these two styles"
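A caller can pre-check references against the stated limits before invoking the CLI. The sketch below is illustrative only: the helper names are hypothetical, "5MB" is read as 5 MiB, and the format check looks at the file extension, which may differ from how the CLI itself validates files.

```python
from pathlib import Path

MAX_REFERENCES = 10
MAX_BYTES = 5 * 1024 * 1024  # assumption: "5MB" interpreted as 5 MiB
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def split_references(values):
    """Flatten repeated and comma-separated -r values into one list."""
    return [part.strip() for value in values
            for part in value.split(",") if part.strip()]


def validate_references(paths):
    """Return expanded Paths, raising ValueError on a limit violation."""
    if len(paths) > MAX_REFERENCES:
        raise ValueError(f"at most {MAX_REFERENCES} references, got {len(paths)}")
    checked = []
    for raw in paths:
        path = Path(raw).expanduser()  # handles ~/Downloads/otter.jpeg
        if path.suffix.lower() not in ALLOWED_SUFFIXES:
            raise ValueError(f"unsupported format: {path.name}")
        if path.stat().st_size > MAX_BYTES:  # raises if the file is missing
            raise ValueError(f"{path.name} exceeds 5MB")
        checked.append(path)
    return checked
```

Note that splitting on commas means a path that itself contains a comma would need to be passed with a separate `-r` flag and handled outside `split_references`.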

Supported vendors:

Vendor	Support	How
`codex` (gpt-image-2)	✅	Passes `-i <path>` to `codex exec`
`gemini` (2.5-flash-image)	✅	Inlines base64 `inlineData` parts in request
`pollinations`	❌	Rejected with exit code 4 (requires URL hosting; see PR #2 roadmap)

Paths: absolute or relative to the current working directory ($PWD). Host CLIs usually expose attached images via:

Claude Code: ~/.claude/image-cache/<session>/N.png (surfaced in system messages as [Image: source: <path>])
Antigravity: workspace upload directory (exact path shown in IDE)
Codex CLI as host: user must pass the filesystem path explicitly; in-conversation attachments are not forwarded

Agent Behavior: Auto-forward Attached References (MANDATORY)

When ALL of the following are true, the calling agent MUST pass the attached image via --reference <path> automatically. Never describe the image in prose as a workaround.

The user asks to generate or edit an image (referencing the attached one by phrases like "이거" ("this one"), "this image", "same style as this", "이 수달" ("this otter"), etc.).
A host-surfaced attached image is visible to the agent — e.g. a Claude Code system message with [Image: source: <path>], or an Antigravity workspace upload path, or an explicit filesystem path in the user's message.
The selected vendor supports references (codex or gemini).

Required action: invoke oma image generate --reference <absolute-path> --vendor <codex|gemini> "<prompt>". If the user didn't specify a vendor, default to codex (CLI-first, widest availability). Do NOT:

Fall back to prose description ("I'll describe the otter's appearance...").
Ask the user to re-type or re-attach the path.
Claim the CLI doesn't support references without first running oma image generate --help to verify.

If the local CLI is outdated (--reference is missing from --help): tell the user to run oma update once, then retry. Do not silently degrade to prose.

If the reference path is from Claude Code's image-cache: note to the user that the path is session-scoped and suggest copying the file to a durable location if they want to reuse it later. Still proceed with the generation.
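The auto-forward rule can be sketched as a function that turns a host message into CLI arguments. This assumes the marker appears exactly as `[Image: source: <path>]`, as described for Claude Code above. `reference_args` is a hypothetical name, and the example path in the usage note is made up.

```python
import re
from typing import Optional

IMAGE_MARKER = re.compile(r"\[Image: source: ([^\]]+)\]")
REFERENCE_VENDORS = ("codex", "gemini")  # vendors documented as supporting references


def reference_args(system_message: str, vendor: Optional[str] = None) -> list[str]:
    """Return --vendor/--reference arguments for attached images, if any."""
    paths = [p.strip() for p in IMAGE_MARKER.findall(system_message)]
    if not paths:
        return []
    vendor = vendor or "codex"  # documented default when the user names no vendor
    if vendor not in REFERENCE_VENDORS:
        raise ValueError(f"{vendor} does not accept reference images")
    args = ["--vendor", vendor]
    for path in paths:
        args += ["--reference", path]
    return args
```

For a message containing `[Image: source: /tmp/example/1.png]` and no vendor, this yields `--vendor codex --reference /tmp/example/1.png`, which is then appended to `oma image generate` along with the prompt.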

Shared Infrastructure (from other skills)

Other skills call oma image generate --format json and parse the JSON manifest from stdout.
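A calling skill might wrap that invocation as in the sketch below. It assumes stdout contains only the JSON manifest when `--format json` is used. The manifest's field names are not specified in this document, so the code returns the parsed object without interpreting it. `generate` and `parse_manifest` are hypothetical names.

```python
import json
import subprocess


def parse_manifest(stdout: str) -> dict:
    """Parse the manifest printed by `oma image generate --format json`."""
    manifest = json.loads(stdout)
    if not isinstance(manifest, dict):
        raise ValueError("expected a JSON object on stdout")
    return manifest


def generate(prompt: str, timeout_s: int = 180) -> dict:
    """Run the CLI and return its manifest, raising on a non-zero exit."""
    proc = subprocess.run(
        ["oma", "image", "generate", prompt, "--format", "json"],
        capture_output=True, text=True, timeout=timeout_s,
    )
    if proc.returncode != 0:
        raise RuntimeError(
            f"oma image generate exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    return parse_manifest(proc.stdout)
```

Keeping the parser separate from the subprocess call lets a skill test its manifest handling against a saved sample without invoking a provider.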

Output Layout

.agents/results/images/
├── 20260424-143052-ab12cd/                    # single-vendor run
│   ├── pollinations-flux.jpg                  # or codex-gpt-image-2.png
│   └── manifest.json
└── 20260424-143122-7z9kqw-compare/            # --vendor all run
    ├── codex-gpt-image-2.png
    ├── pollinations-flux.jpg
    └── manifest.json
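Tools that browse past runs may want to parse the run directory names. The pattern below is inferred from the two example names in the tree (a date, a time, a short lowercase alphanumeric id, and an optional `-compare` suffix). It is an assumption rather than a documented naming contract, and `parse_run_dir` is a hypothetical helper.

```python
import re
from datetime import datetime

# Inferred from the examples above; not a documented format.
RUN_DIR = re.compile(r"^(\d{8}-\d{6})-([0-9a-z]+)(-compare)?$")


def parse_run_dir(name: str):
    """Split a run directory name into start time, run id, and compare flag."""
    match = RUN_DIR.match(name)
    if match is None:
        return None
    started = datetime.strptime(match.group(1), "%Y%m%d-%H%M%S")
    return {
        "started": started,
        "run_id": match.group(2),
        "compare": match.group(3) is not None,
    }
```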

References

Follow resources/execution-protocol.md step by step. See resources/vendor-matrix.md for strategy precheck rules. Use resources/prompt-tips.md for writing effective prompts. Before submitting, run resources/checklist.md.

Configuration

Project-specific settings: config/image-config.yaml. Env vars: OMA_IMAGE_DEFAULT_VENDOR, OMA_IMAGE_DEFAULT_OUT, OMA_IMAGE_YES, POLLINATIONS_API_KEY, GEMINI_API_KEY, OMA_IMAGE_GEMINI_STRATEGIES.

Execution steps: resources/execution-protocol.md
Vendor matrix: resources/vendor-matrix.md
Prompt tips: resources/prompt-tips.md
Checklist: resources/checklist.md
Context loading: ../_shared/core/context-loading.md
