---
name: gemini-image
description: Invoke Google Gemini for image generation and understanding using the Python google-genai SDK. Supports gemini-3-pro-image-preview (generation + understanding), gemini-2.5-flash-image (fast generation), and vision models for analysis.
---
Gemini Image Skill
Invoke Google Gemini models for image generation, image understanding, and visual analysis using the Python google-genai SDK.
Available Models
| Model ID | Description | Best For | Output Format |
|---|---|---|---|
| `gemini-3-pro-image-preview` | Best image generation + understanding | High-quality image gen, complex visual analysis | JPEG |
| `gemini-2.5-flash-image` | Fast image generation | Quick image creation | PNG |
| `gemini-3-pro-preview` | Multimodal understanding | Image analysis without generation | Text only |
| `gemini-2.5-flash` | Fast vision | Quick image analysis | Text only |
Configuration
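API Key: set the `GEMINI_API_KEY` environment variable (for example `export GEMINI_API_KEY=...`). The examples read it from the environment via `os.environ`.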
Usage
Image Generation
```bash
python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',  # Returns JPEG | Use gemini-2.5-flash-image for PNG
    contents='Generate an image of a sunset over mountains',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE', 'TEXT']
    ),
)

# Map MIME types to file extensions
mime_to_ext = {'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif', 'image/webp': '.webp'}

# Save the generated image; print any accompanying text
if response.candidates and response.candidates[0].content:
    for part in response.candidates[0].content.parts or []:
        if part.inline_data:
            ext = mime_to_ext.get(part.inline_data.mime_type, '.png')
            filename = f'output{ext}'
            # Data is already raw bytes - no base64 decode needed
            with open(filename, 'wb') as f:
                f.write(part.inline_data.data)
            print(f'Image saved to {filename} ({part.inline_data.mime_type})')
        elif part.text:
            print(part.text)
"
```
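The save loop above can be factored into a reusable helper. This is a minimal sketch, not part of the SDK: `save_inline_images` and its `stem`/`out_dir` parameters are hypothetical names. It relies only on the response shape the examples already use (`candidates[0].content.parts`, where each part carries either `inline_data.mime_type`/`inline_data.data` or `text`). It also numbers the files so a response with several images does not overwrite earlier ones.

```python
from pathlib import Path

# Hypothetical helper (not part of google-genai): writes every inline image
# in a generate_content response to disk and returns the saved paths.
MIME_TO_EXT = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def save_inline_images(response, stem="output", out_dir="."):
    """Save image parts as <stem>_<n><ext> and print any text parts."""
    saved = []
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return saved
    content = getattr(candidates[0], "content", None)
    if content is None:
        return saved
    for part in content.parts or []:
        blob = getattr(part, "inline_data", None)
        if blob is not None and blob.data:
            ext = MIME_TO_EXT.get(blob.mime_type, ".png")
            path = Path(out_dir) / f"{stem}_{len(saved)}{ext}"
            # inline_data.data is already raw bytes, so write it unchanged
            path.write_bytes(blob.data)
            saved.append(path)
        elif getattr(part, "text", None):
            print(part.text)
    return saved
```

Usage: after a generation call, `paths = save_inline_images(response, stem='sunset')` returns the list of files written.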
Image Understanding (Analyze Image from File)
```bash
python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# Read the image as raw bytes; the SDK encodes inline data for transport,
# so do not base64-encode it yourself
with open('IMAGE_PATH', 'rb') as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        'Describe this image in detail',
        # mime_type must match the file (image/png, image/jpeg, image/gif, image/webp)
        types.Part.from_bytes(data=image_bytes, mime_type='image/png'),
    ],
)
print(response.text)
"
```
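The examples above hard-code `mime_type='image/png'`. A small helper can instead derive the MIME type from the file extension with the standard-library `mimetypes` module, and reject formats outside the supported input list (PNG, JPEG, GIF, WebP). `read_image` is a hypothetical name. Its `(bytes, mime_type)` result is meant to be passed to `types.Part.from_bytes(data=..., mime_type=...)`.

```python
import mimetypes
from pathlib import Path

# Older Python versions may not map .webp by default; register it explicitly
mimetypes.add_type("image/webp", ".webp")

# Input formats listed under "Supported Image Formats"
SUPPORTED_INPUT = {"image/png", "image/jpeg", "image/gif", "image/webp"}


def read_image(path):
    """Return (raw_bytes, mime_type) for an image file.

    Raw bytes are returned on purpose: the SDK encodes inline data itself,
    so the caller should not base64-encode them.
    """
    mime_type, _ = mimetypes.guess_type(str(path))
    if mime_type not in SUPPORTED_INPUT:
        raise ValueError(
            f"unsupported image type {mime_type!r} for {path}; convert to PNG/JPEG"
        )
    return Path(path).read_bytes(), mime_type
```

Usage: `data, mime = read_image('screenshot.png')`, then pass `types.Part.from_bytes(data=data, mime_type=mime)` in `contents`.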
Image Understanding (From URL)
```bash
python -c "
import os
import urllib.request
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# Fetch the image as raw bytes (no base64 encoding needed)
url = 'IMAGE_URL_HERE'
with urllib.request.urlopen(url) as resp:
    image_bytes = resp.read()
    # Use the server's Content-Type header, falling back to JPEG below
    mime_type = resp.headers.get_content_type()
if not mime_type.startswith('image/'):
    mime_type = 'image/jpeg'

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        'What is in this image?',
        types.Part.from_bytes(data=image_bytes, mime_type=mime_type),
    ],
)
print(response.text)
"
```
Workflow
When this skill is invoked:
1. Determine the task type:
   - Image Generation: User wants to create an image
   - Image Understanding: User wants to analyze an existing image
   - Image Editing: User wants to modify an image (generation with reference)
2. Select the appropriate model:
   - Image generation → `gemini-3-pro-image-preview` (JPEG) or `gemini-2.5-flash-image` (PNG)
   - Image analysis → `gemini-3-pro-preview` or `gemini-2.5-flash`
3. Prepare the input:
   - For generation: a text prompt describing the desired image
   - For understanding: the image file read as raw bytes (the SDK handles encoding)
4. Execute and handle output:
   - Generation: Save the returned image bytes to a file
   - Understanding: Return the text description
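The model-selection step can be written as a small lookup. This sketch encodes only the model choices stated in this skill. The task names, the `fast` flag, and treating editing like generation (since editing is generation with a reference image) are assumptions, and preview model IDs may change.

```python
# Model choices from the workflow: (higher quality, faster) per task type.
# Mapping "editing" to the image-generation models is an assumption.
MODELS = {
    "generation": ("gemini-3-pro-image-preview", "gemini-2.5-flash-image"),
    "editing": ("gemini-3-pro-image-preview", "gemini-2.5-flash-image"),
    "understanding": ("gemini-3-pro-preview", "gemini-2.5-flash"),
}


def select_model(task, fast=False):
    """Return the model ID for 'generation', 'editing' or 'understanding'."""
    try:
        quality, quick = MODELS[task]
    except KeyError:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(MODELS)}") from None
    return quick if fast else quality


def needs_image_output(task):
    """Tasks that must request response_modalities=['IMAGE', 'TEXT']."""
    return task in ("generation", "editing")
```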
Example Invocations
Generate Product Image
```bash
python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))
response = client.models.generate_content(
    model='gemini-3-pro-image-preview',
    contents='Create a professional product photo of a sleek wireless headphone on a white background, studio lighting',
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE', 'TEXT']
    ),
)
mime_to_ext = {'image/png': '.png', 'image/jpeg': '.jpg', 'image/gif': '.gif', 'image/webp': '.webp'}
if response.candidates and response.candidates[0].content:
    for part in response.candidates[0].content.parts or []:
        if part.inline_data:
            ext = mime_to_ext.get(part.inline_data.mime_type, '.png')
            # Raw image bytes; write them unchanged
            with open(f'headphone{ext}', 'wb') as f:
                f.write(part.inline_data.data)
            print(f'Image saved to headphone{ext}')
"
```
Analyze Screenshot
```bash
python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))
# Raw bytes; the SDK handles encoding
with open('screenshot.png', 'rb') as f:
    image_bytes = f.read()
response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        'Analyze this UI screenshot. Identify any usability issues and suggest improvements.',
        types.Part.from_bytes(data=image_bytes, mime_type='image/png'),
    ],
)
print(response.text)
"
```
OCR / Extract Text from Image
```bash
python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))
# Raw bytes; the SDK handles encoding
with open('document.png', 'rb') as f:
    image_bytes = f.read()
response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        'Extract all text from this image. Preserve formatting where possible.',
        types.Part.from_bytes(data=image_bytes, mime_type='image/png'),
    ],
)
print(response.text)
"
```
Compare Two Images
```bash
python -c "
import os
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))
# Read both images as raw bytes; the SDK handles encoding
with open('image1.png', 'rb') as f:
    img1_bytes = f.read()
with open('image2.png', 'rb') as f:
    img2_bytes = f.read()
response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        'Compare these two images. What are the key differences?',
        types.Part.from_bytes(data=img1_bytes, mime_type='image/png'),
        types.Part.from_bytes(data=img2_bytes, mime_type='image/png'),
    ],
)
print(response.text)
"
```
Image Generation Parameters
When generating images, you can customize:
```python
config=types.GenerateContentConfig(
    response_modalities=['IMAGE', 'TEXT'],  # Request both image and description
    temperature=1.0,  # Higher = more creative
    # Additional parameters may be model-specific
)
```
Supported Image Formats
Input (for understanding):
- PNG (`image/png`)
- JPEG (`image/jpeg`)
- GIF (`image/gif`)
- WebP (`image/webp`)
Output (from generation):
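- The format depends on the model: per the model table, `gemini-3-pro-image-preview` returns JPEG and `gemini-2.5-flash-image` returns PNG
- The API returns raw bytes in `part.inline_data.data` (NOT base64 encoded)
- Check `part.inline_data.mime_type` to determine the actual format returned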
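Because the output format differs by model, and a file's extension may not match its contents, it can help to identify the format from the leading bytes. This is a sketch using the well-known signatures of the four listed formats: PNG `89 50 4E 47 0D 0A 1A 0A`, JPEG `FF D8 FF`, GIF `GIF87a`/`GIF89a`, and WebP `RIFF....WEBP`. `sniff_image_mime` is a hypothetical helper, not an SDK function.

```python
def sniff_image_mime(data):
    """Return the MIME type implied by an image's magic bytes, or None."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    # WebP: "RIFF" + 4-byte little-endian size + "WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return None
```

One use is to confirm that `sniff_image_mime(part.inline_data.data)` agrees with `part.inline_data.mime_type` before choosing a file extension. Another is to supply `mime_type` for input images whose extension is unreliable.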
Error Handling
Common errors and solutions:
- Image too large: Resize image before sending (max varies by model)
- Unsupported format: Convert to PNG/JPEG
- Generation blocked: Adjust prompt to comply with safety guidelines
- Rate limiting: Implement retry with exponential backoff
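For the rate-limiting case, a minimal retry-with-exponential-backoff wrapper can be sketched as follows. It assumes an exception is retryable when it has a `code` attribute equal to 429 (rate limited) or 503 (unavailable). google-genai raises API failures as `google.genai.errors.APIError` subclasses that carry the HTTP status in `code`, but verify this against your installed SDK version. `call_with_backoff` and its parameters are hypothetical.

```python
import random
import time

# Assumed retryable HTTP status codes: rate limited / temporarily unavailable
RETRYABLE_CODES = {429, 503}


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=32.0, sleep=time.sleep):
    """Call fn(); on a retryable error, wait base_delay * 2**attempt (+ jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            retryable = getattr(exc, "code", None) in RETRYABLE_CODES
            if not retryable or attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep
            sleep(delay + random.uniform(0, delay / 2))
```

Usage: wrap the request in a zero-argument callable, for example `response = call_with_backoff(lambda: client.models.generate_content(model=..., contents=...))`. Non-retryable errors, such as an invalid request, are re-raised immediately.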
Notes
- Image generation requires `response_modalities=['IMAGE', 'TEXT']` in the config
- For best results with generation, be specific and descriptive in prompts
- Image understanding works with both local files and URLs
- Multiple images can be sent in a single request for comparison
- Gemini 3 Pro Image is NOT available via CLI - must use Python SDK
Tools to Use
- Bash: Execute Python commands
- Read: Load image files (binary mode)
- Write: Save generated images
- Glob: Find image files in directories