name: gemini-video description: Invoke Google Gemini for video understanding and analysis using the Python google-genai SDK. Supports gemini-3-pro-preview and gemini-2.5-flash for video analysis, transcription, and content extraction.

Gemini Video Skill

Invoke Google Gemini models for video understanding, analysis, transcription, and content extraction using the Python google-genai SDK.

Available Models

Model ID	Description	Best For
`gemini-3-pro-preview`	Best multimodal understanding	Complex video analysis, detailed descriptions
`gemini-2.5-pro`	Advanced reasoning	Deep video analysis with reasoning
`gemini-2.5-flash`	Fast processing	Quick video summaries, high throughput

Configuration

API Key: Use environment variable GEMINI_API_KEY

Usage

Video Analysis (Local File)

For local video files, use the File API to upload first:

python -c "
from google import genai
from google.genai import types
import time

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# Upload video file
video_file = client.files.upload(file='VIDEO_PATH')
print(f'Uploaded file: {video_file.name}')

# Wait for processing
while video_file.state.name == 'PROCESSING':
    print('Processing video...')
    time.sleep(5)
    video_file = client.files.get(name=video_file.name)

if video_file.state.name == 'FAILED':
    raise ValueError('Video processing failed')

# Analyze video
response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Describe what happens in this video'),
            types.Part(file_data=types.FileData(file_uri=video_file.uri, mime_type=video_file.mime_type))
        ])
    ]
)
print(response.text)
"

Video Analysis (From URL)

python -c "
from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# For publicly accessible video URLs
video_url = 'VIDEO_URL_HERE'

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Analyze this video and provide a detailed summary'),
            types.Part(file_data=types.FileData(file_uri=video_url, mime_type='video/mp4'))
        ])
    ]
)
print(response.text)
"

Video Transcription

python -c "
from google import genai
from google.genai import types
import time

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

# Upload video
video_file = client.files.upload(file='VIDEO_PATH')
while video_file.state.name == 'PROCESSING':
    time.sleep(5)
    video_file = client.files.get(name=video_file.name)

# Transcribe
response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='Transcribe all spoken words in this video. Include timestamps if possible.'),
            types.Part(file_data=types.FileData(file_uri=video_file.uri, mime_type=video_file.mime_type))
        ])
    ]
)
print(response.text)
"

Workflow

When this skill is invoked:

Determine the task type:
- Video Summary: Generate overview of video content
- Transcription: Extract spoken words
- Visual Analysis: Describe visual elements, scenes, actions
- Content Extraction: Pull specific information from video
- Q&A: Answer questions about video content
Prepare the video:
- Local file → Upload via File API
- Remote URL → Use directly (if publicly accessible)
- Wait for processing if needed
Select the appropriate model:
- Complex analysis → gemini-3-pro-preview
- Quick summaries → gemini-2.5-flash
Execute and return results

Example Invocations

Summarize Meeting Recording

python -c "
from google import genai
from google.genai import types
import time

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

video_file = client.files.upload(file='meeting_recording.mp4')
while video_file.state.name == 'PROCESSING':
    time.sleep(5)
    video_file = client.files.get(name=video_file.name)

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='''Summarize this meeting recording:
1. List all participants mentioned
2. Key discussion points
3. Action items and decisions made
4. Any deadlines mentioned'''),
            types.Part(file_data=types.FileData(file_uri=video_file.uri, mime_type=video_file.mime_type))
        ])
    ]
)
print(response.text)
"

Analyze Tutorial Video

python -c "
from google import genai
from google.genai import types
import time

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

video_file = client.files.upload(file='tutorial.mp4')
while video_file.state.name == 'PROCESSING':
    time.sleep(5)
    video_file = client.files.get(name=video_file.name)

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='''Analyze this tutorial video and create:
1. A step-by-step guide based on the content
2. Key concepts explained
3. Any tips or best practices mentioned
4. Prerequisites needed to follow along'''),
            types.Part(file_data=types.FileData(file_uri=video_file.uri, mime_type=video_file.mime_type))
        ])
    ]
)
print(response.text)
"

Extract Code from Coding Video

python -c "
from google import genai
from google.genai import types
import time

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

video_file = client.files.upload(file='coding_session.mp4')
while video_file.state.name == 'PROCESSING':
    time.sleep(5)
    video_file = client.files.get(name=video_file.name)

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='''Extract all code shown in this video:
1. Identify the programming language
2. Capture the complete code snippets
3. Note any explanations given for the code
4. List any libraries or dependencies mentioned'''),
            types.Part(file_data=types.FileData(file_uri=video_file.uri, mime_type=video_file.mime_type))
        ])
    ]
)
print(response.text)
"

Timestamp-Based Analysis

python -c "
from google import genai
from google.genai import types
import time

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

video_file = client.files.upload(file='presentation.mp4')
while video_file.state.name == 'PROCESSING':
    time.sleep(5)
    video_file = client.files.get(name=video_file.name)

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='''Create a timestamped outline of this video:
Format: [MM:SS] - Topic/Event
Include major topic changes, key points, and notable moments.'''),
            types.Part(file_data=types.FileData(file_uri=video_file.uri, mime_type=video_file.mime_type))
        ])
    ]
)
print(response.text)
"

Q&A About Video

python -c "
from google import genai
from google.genai import types
import time

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

video_file = client.files.upload(file='lecture.mp4')
while video_file.state.name == 'PROCESSING':
    time.sleep(5)
    video_file = client.files.get(name=video_file.name)

response = client.models.generate_content(
    model='gemini-3-pro-preview',
    contents=[
        types.Content(parts=[
            types.Part(text='YOUR_QUESTION_ABOUT_VIDEO_HERE'),
            types.Part(file_data=types.FileData(file_uri=video_file.uri, mime_type=video_file.mime_type))
        ])
    ]
)
print(response.text)
"

File Management

List Uploaded Files

python -c "
from google import genai

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))

for f in client.files.list():
    print(f'{f.name}: {f.state.name} ({f.mime_type})')
"

Delete Uploaded File

python -c "
from google import genai

client = genai.Client(api_key=os.environ.get('GEMINI_API_KEY'))
client.files.delete(name='files/FILE_ID_HERE')
print('File deleted')
"

Supported Video Formats

MP4 (video/mp4)
MOV (video/quicktime)
AVI (video/x-msvideo)
FLV (video/x-flv)
MKV (video/x-matroska)
WebM (video/webm)
WMV (video/x-ms-wmv)
3GPP (video/3gpp)

Video Limitations

Maximum file size: Check current API limits (typically 2GB)
Maximum duration: Varies by model (typically up to 1 hour)
Processing time: Longer videos take more time to process
Quota: Video analysis consumes more tokens than text

Error Handling

Common errors and solutions:

PROCESSING state stuck: Video may be too large or corrupted
FAILED state: Unsupported format or processing error
File not found: Upload before analysis
Rate limiting: Implement retry with exponential backoff

Notes

Videos must be uploaded via File API before analysis (no inline data like images)
Processing time depends on video length and complexity
Uploaded files are automatically deleted after 48 hours
For very long videos, consider chunking or asking specific timestamp questions
Gemini 3 Pro provides the most detailed video analysis

Tools to Use

Bash: Execute Python commands
Read: Load local video file paths
Write: Save transcriptions or analysis to files
Glob: Find video files in directories

ナビゲーション

Skillsとは？

リンク

gemini-video