Agents Reference

Agents are the core interface for interacting with LLMs in Pydantic AI.

Agent Components

Component	Description
Instructions	Developer-written prompts for LLM
Description	Human-readable label for instrumentation spans
Function Tools	Functions LLM can call during response
Output Type	Structured datatype LLM must return
Dependencies	Context passed to tools and prompts
Model	Default LLM (can override at runtime)
Model Settings	Temperature, max_tokens, timeout, etc.
Capabilities	Composable units bundling tools + hooks + instructions

Capabilities (v1.71.0+)

Composable, reusable units of agent behavior that bundle tools, lifecycle hooks, instructions, and model settings into a single class:

from pydantic_ai import Agent
from pydantic_ai.capabilities import WebSearch, Thinking, MCP, Hooks

# Provider-adaptive tools — auto-fallback from builtin to local
agent = Agent('openai:gpt-4o', capabilities=[
    WebSearch(),
    Thinking(),
    MCP(url='http://localhost:3000'),
])

Built-in capabilities: WebSearch, WebFetch, MCP, ImageGeneration, Thinking, Hooks.

Capability ordering (v1.80.0+)

When multiple capabilities wrap the same agent flow, the order in which they wrap it affects behavior, so ordering can be declared explicitly as part of the public API.

CapabilityOrdering supports explicit placement such as innermost, outermost, wraps, wrapped_by, and requires.
Hooks also gained ordering controls and instance references so wrapper relationships can be expressed directly.
Use explicit ordering when capability composition changes semantics, for example when you need one capability to observe or transform requests before another wrapper runs.
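
To see why ordering can change semantics, here is a minimal, library-independent sketch. It does not use Pydantic AI's CapabilityOrdering API; `redact`, `make_logger`, `compose`, and `echo` are hypothetical names invented for the illustration. The same two wrappers observe different requests depending on which one is outermost.

```python
from typing import Callable

Handler = Callable[[str], str]
Wrapper = Callable[[Handler], Handler]


def redact(handler: Handler) -> Handler:
    """Hypothetical wrapper: masks a secret before passing the request on."""
    def wrapped(request: str) -> str:
        return handler(request.replace('secret', '***'))
    return wrapped


def make_logger(log: list[str]) -> Wrapper:
    """Hypothetical wrapper factory: records the request it observes."""
    def log_request(handler: Handler) -> Handler:
        def wrapped(request: str) -> str:
            log.append(request)
            return handler(request)
        return wrapped
    return log_request


def compose(handler: Handler, wrappers: list[Wrapper]) -> Handler:
    """Apply wrappers so that the first one in the list ends up outermost."""
    for wrapper in reversed(wrappers):
        handler = wrapper(handler)
    return handler


def echo(request: str) -> str:
    """Stand-in for the model call."""
    return f'model saw: {request}'


log_a: list[str] = []
compose(echo, [redact, make_logger(log_a)])('my secret')  # redact is outermost

log_b: list[str] = []
compose(echo, [make_logger(log_b), redact])('my secret')  # logger is outermost
```

With `redact` outermost the logger only sees the masked request; with the logger outermost it sees the raw one. That is the kind of difference explicit ordering is meant to pin down.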

Hooks Capability

Define hooks using decorators:

from pydantic_ai.capabilities import Hooks

hooks = Hooks()

@hooks.on_model_request
async def log_request(ctx):
    print(f"Sending request to {ctx.model}")

agent = Agent('openai:gpt-4o', capabilities=[hooks])

Hooks can raise ModelRetry for retry control flow. before_model_request / wrap hooks can swap models via ModelRequestContext.

Server-side compaction capabilities (v1.80.0+)

Pydantic AI now exposes provider-backed compaction capabilities for long-running conversations:

OpenAICompaction
AnthropicCompaction

OpenAI compaction also gained a stateful mode in the 1.84.x line. Use these capabilities when you want the provider to manage context reduction instead of layering your own summarization logic on every turn.

AgentSpec (v1.71.0+)

Load agents from YAML/JSON files:

from pydantic_ai import Agent

agent = Agent.from_file('agent.yaml')

Supports TemplateStr for templated instructions referencing deps.

Multimodal Input

Support for image, audio, video, and document input.

Image Input

from pathlib import Path

from pydantic_ai import Agent, ImageUrl, BinaryContent

agent = Agent('openai:gpt-4o')

# URL
result = agent.run_sync([
    'What is this?',
    ImageUrl(url='https://example.com/image.png'),
])

# Local file
result = agent.run_sync([
    'Describe this image',
    BinaryContent(data=Path('photo.png').read_bytes(), media_type='image/png'),
])

Audio/Video/Document Input

from pydantic_ai import AudioUrl, VideoUrl, DocumentUrl

# Audio
agent.run_sync(['Transcribe this', AudioUrl(url='https://...')])

# Video
agent.run_sync(['Describe', VideoUrl(url='https://...')])

# Document (PDF)
agent.run_sync(['Summarize', DocumentUrl(url='https://...pdf')])

Force Download

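If the provider can't fetch the URL directly, set force_download=True so that Pydantic AI downloads the content itself and sends the bytes to the model: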

ImageUrl(url='https://...', force_download=True)

Provider Support

Provider	URL Direct	Download Required
OpenAI	ImageUrl	AudioUrl, DocumentUrl
Anthropic	ImageUrl, DocumentUrl(PDF)	DocumentUrl(text)
Google Vertex	All URLs	—
Mistral	ImageUrl, DocumentUrl(PDF)	—
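
The table can be turned into a small lookup when deciding whether to set force_download. This is only a sketch: the provider keys and content-kind labels (`'image'`, `'document_pdf'`, ...) are made-up names for this example, and combinations the table does not list return None rather than a guess.

```python
from typing import Optional

# Transcribed from the Provider Support table; '*' means "all URL types".
URL_SUPPORT: dict[str, dict[str, str]] = {
    'openai': {
        'image': 'direct',
        'audio': 'download',
        'document_pdf': 'download',
        'document_text': 'download',
    },
    'anthropic': {
        'image': 'direct',
        'document_pdf': 'direct',
        'document_text': 'download',
    },
    'google-vertex': {'*': 'direct'},
    'mistral': {'image': 'direct', 'document_pdf': 'direct'},
}


def needs_force_download(provider: str, kind: str) -> Optional[bool]:
    """Return True/False per the table, or None if the table has no entry."""
    support = URL_SUPPORT.get(provider, {})
    mode = support.get(kind, support.get('*'))
    if mode is None:
        return None
    return mode == 'download'
```

For example, `needs_force_download('anthropic', 'document_text')` returns True, so a plain-text DocumentUrl sent to Anthropic would be downloaded first.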

Creating Agents

from pydantic_ai import Agent, ModelSettings, RunContext

agent = Agent(
    'openai:gpt-4o',           # model identifier
    deps_type=int,              # dependency type
    output_type=bool,           # structured output type
    description='Triage GitHub issues and draft concise replies',
    system_prompt='Your instructions here',
    model_settings=ModelSettings(temperature=0.5),
    retries=2,                  # default retry count
)

Agent Description (v1.69.0)

Use description= when you want traces and observability spans to carry a stable, human-readable agent label.

from pydantic_ai import Agent

agent = Agent(
    'openai:gpt-4o',
    description='Customer-support classifier',
)

When instrumentation is enabled, Pydantic AI attaches this value to the run span as gen_ai.agent.description.

Dependencies

Dependency injection system for passing data/services to prompts, tools, validators.

Defining Dependencies

from dataclasses import dataclass
import httpx

@dataclass
class MyDeps:
    api_key: str
    http_client: httpx.AsyncClient

agent = Agent(
    'openai:gpt-4o',
    deps_type=MyDeps,  # pass TYPE, not instance
)

Accessing via RunContext

@agent.system_prompt
async def get_prompt(ctx: RunContext[MyDeps]) -> str:
    response = await ctx.deps.http_client.get(
        'https://api.example.com',
        headers={'Authorization': f'Bearer {ctx.deps.api_key}'}
    )
    return f"Context: {response.text}"

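@agent.tool
async def fetch_data(ctx: RunContext[MyDeps], query: str) -> str:
    # ctx.deps available in tools; the relative URL assumes the client has a base_url
    response = await ctx.deps.http_client.get('/search', params={'q': query})
    return response.text  # return the body, not the Response object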

@agent.output_validator
async def validate(ctx: RunContext[MyDeps], output: str) -> str:
    # ctx.deps available in validators
    return output

Passing Dependencies at Runtime

async with httpx.AsyncClient() as client:
    deps = MyDeps(api_key='secret', http_client=client)
    result = await agent.run('Query', deps=deps)

Async vs Sync Dependencies

Both work. Sync (non-async) functions are run in a thread pool via run_in_executor, so they don't block the event loop.

# Async (preferred for IO)
@agent.tool
async def async_tool(ctx: RunContext[MyDeps]) -> str:
    response = await ctx.deps.http_client.get('/data')
    return response.text

# Sync (also works); assumes the deps class also carries a sync_client such as httpx.Client
@agent.tool
def sync_tool(ctx: RunContext[MyDeps]) -> str:
    return ctx.deps.sync_client.get('/data').text
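
The thread-pool behavior can be illustrated with the standard library alone. The sketch below is not Pydantic AI's internal code; `call_tool` is a hypothetical dispatcher that awaits coroutine functions directly and hands plain functions to `loop.run_in_executor`.

```python
import asyncio
import inspect
import threading
from typing import Any, Callable


async def call_tool(func: Callable[..., Any], *args: Any) -> Any:
    """Await coroutine functions directly; run plain functions in the default thread pool."""
    if inspect.iscoroutinefunction(func):
        return await func(*args)
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, func, *args)


async def async_tool() -> str:
    # Runs on the event loop's own thread
    return threading.current_thread().name


def sync_tool() -> str:
    # Runs on a worker thread, so blocking here does not stall the event loop
    return threading.current_thread().name


async def main() -> tuple[str, str]:
    return await call_tool(async_tool), await call_tool(sync_tool)


async_thread, sync_thread = asyncio.run(main())
```

The two thread names differ, which is why blocking IO inside a sync tool is tolerable while blocking IO inside an async tool is not.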

Overriding Dependencies (Testing)

class TestDeps(MyDeps):
    async def system_prompt_factory(self) -> str:
        return "test prompt"

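async def test_app():
    # None stands in for the HTTP client, which this test never touches
    test_deps = TestDeps('test_key', None)
    with agent.override(deps=test_deps):
        # application_code is your own function that calls agent.run() internally
        result = await application_code('Query')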

Run Methods

Method	Description
`run()`	Async, returns `AgentRunResult`
`run_sync()`	Synchronous wrapper
`run_stream()`	Async context manager, streams response
`run_stream_sync()`	Sync streaming
`run_stream_events()`	Async iterable of all events
`iter()`	Iterate over graph nodes

Basic Run

# Synchronous
result = agent.run_sync('What is 2+2?', deps=my_deps)
print(result.output)

# Async
result = await agent.run('What is 2+2?')
print(result.output)

Streaming

async with agent.run_stream('Tell me a story') as response:
    async for text in response.stream_text():
        print(text, end='')

Stream Events

from pydantic_ai import (
    AgentStreamEvent,
    FunctionToolCallEvent,
    FunctionToolResultEvent,
    PartDeltaEvent,
    TextPartDelta,
)

async for event in agent.run_stream_events('Query'):
    if isinstance(event, PartDeltaEvent):
        if isinstance(event.delta, TextPartDelta):
            print(event.delta.content_delta)
    elif isinstance(event, FunctionToolCallEvent):
        print(f'Tool: {event.part.tool_name}')

Iterate Over Graph

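async with agent.iter('Query') as agent_run:
    async for node in agent_run:
        # Each node is one step of the agent graph (model request, tool calls, End)
        print(node)
# The result is available once the run has reached its End node
print(agent_run.result.output)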

System Prompts vs Instructions

Feature	system_prompt	instructions
Message history	Preserved across runs	Only current agent's
Use case	Multi-agent handoffs	Fresh context each run

Static System Prompt

agent = Agent(
    'openai:gpt-4o',
    system_prompt="You are a helpful assistant."
)

Dynamic System Prompt

@agent.system_prompt
def add_context(ctx: RunContext[Deps]) -> str:
    return f"User: {ctx.deps.user_name}"

Instructions

from datetime import date

from pydantic_ai import Agent

agent = Agent(
    'openai:gpt-4o',
    instructions="Be concise."
)

@agent.instructions
def add_date() -> str:
    return f"Date: {date.today()}"

# Runtime instructions
result = agent.run_sync('Query', instructions="Extra context")

Usage Limits

from pydantic_ai import UsageLimits, UsageLimitExceeded

try:
    result = agent.run_sync(
        'Query',
        usage_limits=UsageLimits(
            output_tokens_limit=100,    # max output tokens (replaces the deprecated response_tokens_limit)
            request_limit=5,            # max model turns
            tool_calls_limit=10,        # max tool executions
        )
    )
except UsageLimitExceeded as e:
    print(f"Limit exceeded: {e}")

Model Settings

Settings are merged in this order: model defaults → agent defaults → run overrides. Later levels take precedence.

from pydantic_ai import ModelSettings

# Agent-level
agent = Agent(
    'openai:gpt-4o',
    model_settings=ModelSettings(temperature=0.5, max_tokens=500)
)

# Run-level override
result = agent.run_sync(
    'Query',
    model_settings=ModelSettings(temperature=0.0)
)
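
The precedence rule can be pictured as a layered dictionary update. This is a sketch of the rule only, using plain dicts and a hypothetical `merge_settings` helper rather than the library's own merge code; the values mirror the example above.

```python
from typing import Any, Optional


def merge_settings(*layers: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Merge settings layers left to right; later layers override earlier ones."""
    merged: dict[str, Any] = {}
    for layer in layers:
        if layer:
            merged.update(layer)
    return merged


model_defaults: dict[str, Any] = {}                       # nothing set on the model
agent_defaults = {'temperature': 0.5, 'max_tokens': 500}  # Agent(model_settings=...)
run_overrides = {'temperature': 0.0}                      # run_sync(model_settings=...)

effective = merge_settings(model_defaults, agent_defaults, run_overrides)
```

Here `effective` keeps `max_tokens=500` from the agent level, while `temperature` comes from the run-level override.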

Run Metadata

from dataclasses import dataclass

from pydantic_ai import Agent

@dataclass
class Deps:
    tenant: str

agent = Agent[Deps](
    'openai:gpt-4o',
    deps_type=Deps,
    metadata=lambda ctx: {'tenant': ctx.deps.tenant},
)

result = agent.run_sync(
    'Query',
    deps=Deps(tenant='acme'),
    metadata={'extra': 'data'},  # merged with agent metadata
)
print(result.metadata)  # {'tenant': 'acme', 'extra': 'data'}

Run context now exposes output validation retry count for observability (v1.52.0).

Reflection and Self-Correction

from pydantic_ai import ModelRetry

@agent.tool(retries=3)
def lookup_user(ctx: RunContext[Deps], name: str) -> int:
    user = ctx.deps.db.find(name)
    if not user:
        raise ModelRetry(f"User {name} not found. Try full name.")
    return user.id
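
The control flow behind this pattern is a bounded feedback loop: the tool rejects a call, the message goes back to the model, and the model tries again. The sketch below imitates that loop without the library. `RetryRequested` stands in for ModelRetry, and the "model" is just a list of successive attempts; all names and data are hypothetical.

```python
from typing import Callable


class RetryRequested(Exception):
    """Stand-in for ModelRetry: carries feedback for the model."""


USERS = {'Ada Lovelace': 1}  # made-up data for the example


def lookup_user(name: str) -> int:
    if name not in USERS:
        raise RetryRequested(f'User {name} not found. Try full name.')
    return USERS[name]


def run_with_retries(
    tool: Callable[[str], int], model_attempts: list[str], retries: int
) -> tuple[int, list[str]]:
    """Call `tool` with each attempted argument until it succeeds or retries run out."""
    feedback: list[str] = []
    for attempt, arg in enumerate(model_attempts):
        try:
            return tool(arg), feedback
        except RetryRequested as exc:
            feedback.append(str(exc))  # this text is what the model would see
            if attempt >= retries:
                raise RuntimeError(f'Tool exceeded {retries} retries') from exc
    raise RuntimeError('Model produced no further attempts')


user_id, feedback = run_with_retries(lookup_user, ['Ada', 'Ada Lovelace'], retries=3)
```

The first attempt fails and produces feedback; the second attempt, using the full name, succeeds.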

Error Handling

from pydantic_ai import UnexpectedModelBehavior, capture_run_messages

with capture_run_messages() as messages:
    try:
        result = agent.run_sync('Query')
    except UnexpectedModelBehavior as e:
        print(f"Error: {e}")
        print(f"Messages: {messages}")

Agent Constructor Parameters

Parameter	Type	Description
`model`	str or Model	Model identifier or instance
`deps_type`	type	Dependency type for RunContext
`output_type`	type	Type the output is validated against (Pydantic model, `bool`, `list[str]`, ...)
`system_prompt`	str	Static system prompt
`instructions`	str	Instructions (not in history)
`model_settings`	ModelSettings	Default model settings
`retries`	int	Default retry count
`metadata`	dict or callable	Run metadata
`end_strategy`	str	'early' or 'exhaustive'
`history_processors`	list	Message history processors

Messages and Chat History

Accessing Messages

result = agent.run_sync('Tell me a joke')

# All messages including prior runs
all_msgs = result.all_messages()

# Only messages from current run
new_msgs = result.new_messages()

# JSON serialization
json_bytes = result.all_messages_json()

Continuing Conversations

result1 = agent.run_sync('Tell me a joke')
print(result1.output)

# Continue with message history
result2 = agent.run_sync(
    'Explain?',
    message_history=result1.new_messages()
)
print(result2.output)

Serialize/Deserialize Messages

from pydantic_core import to_jsonable_python
from pydantic_ai import ModelMessagesTypeAdapter

# Serialize
history = result.all_messages()
as_python = to_jsonable_python(history)

# Deserialize
restored = ModelMessagesTypeAdapter.validate_python(as_python)

# Use restored history
result = agent.run_sync('Continue', message_history=restored)

History Processors

Intercept and modify message history before each request:

from pydantic_ai import Agent, ModelMessage, ModelRequest

def keep_recent(messages: list[ModelMessage]) -> list[ModelMessage]:
    """Keep only last 5 messages."""
    return messages[-5:] if len(messages) > 5 else messages

def filter_responses(messages: list[ModelMessage]) -> list[ModelMessage]:
    """Remove ModelResponse, keep only requests."""
    return [m for m in messages if isinstance(m, ModelRequest)]

agent = Agent(
    'openai:gpt-4o',
    history_processors=[filter_responses, keep_recent],
)

Context-Aware Processor

from pydantic_ai import ModelMessage, RunContext

def token_aware(ctx: RunContext[None], messages: list[ModelMessage]) -> list[ModelMessage]:
    if ctx.usage.total_tokens > 1000:
        return messages[-3:]  # Keep recent when high token usage
    return messages

Summarize Old Messages

summarizer = Agent('openai:gpt-4o-mini', instructions='Summarize conversation.')

async def summarize_old(messages: list[ModelMessage]) -> list[ModelMessage]:
    if len(messages) > 10:
        # Summarize the oldest 10 messages
        oldest = messages[:10]
        summary = await summarizer.run(message_history=oldest)
        # Return the summary plus the latest message (messages in between are dropped)
        return summary.new_messages() + messages[-1:]
    return messages

Warning: When slicing history, ensure tool calls and returns are paired.
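
One way to respect this warning is to widen the slice until every kept tool return still has its call. The sketch below uses a hypothetical `Msg` stand-in rather than the real pydantic_ai message classes, and only handles returns orphaned by cutting the head of the history; a real processor would inspect the parts of each ModelRequest and ModelResponse.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    """Hypothetical stand-in for a history message."""
    text: str
    tool_calls: set[str] = field(default_factory=set)    # ids of tool calls made here
    tool_returns: set[str] = field(default_factory=set)  # ids of tool results carried here


def trim_keeping_pairs(messages: list[Msg], keep: int) -> list[Msg]:
    """Keep about the last `keep` messages, widening the window so no tool return loses its call."""
    start = max(len(messages) - keep, 0)
    while start > 0:
        kept = messages[start:]
        called = set().union(*(m.tool_calls for m in kept))
        returned = set().union(*(m.tool_returns for m in kept))
        if returned <= called:
            break
        start -= 1  # pull in one more earlier message and re-check
    return messages[start:]


history = [
    Msg('user: weather?'),
    Msg('model: call get_weather', tool_calls={'call_1'}),
    Msg('tool result', tool_returns={'call_1'}),
    Msg('model: sunny'),
]
trimmed = trim_keeping_pairs(history, keep=2)
```

A naive `history[-2:]` would start with the tool result and drop its call; the helper keeps three messages instead so the pair stays intact.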

Direct Model Requests

Low-level API for making requests without full Agent functionality.

When to Use

Need direct control over model interactions
Building custom abstractions
Don't need tool execution, retrying, structured output

Basic Usage

from pydantic_ai import ModelRequest
from pydantic_ai.direct import model_request_sync

response = model_request_sync(
    'anthropic:claude-haiku-4-5',
    [ModelRequest.user_text_prompt('What is the capital of France?')]
)

print(response.parts[0].content)  # Paris
print(response.usage)  # RequestUsage(input_tokens=56, output_tokens=7)

Async Request

from pydantic_ai import ModelRequest
from pydantic_ai.direct import model_request

response = await model_request(
    'openai:gpt-4o',
    [ModelRequest.user_text_prompt('Hello')]
)

With Tool Definitions

from pydantic import BaseModel
from pydantic_ai import ModelRequest, ToolDefinition
from pydantic_ai.direct import model_request
from pydantic_ai.models import ModelRequestParameters

class Divide(BaseModel):
    """Divide two numbers."""
    numerator: float
    denominator: float

response = await model_request(
    'openai:gpt-4o',
    [ModelRequest.user_text_prompt('What is 123 / 456?')],
    model_request_parameters=ModelRequestParameters(
        function_tools=[
            ToolDefinition(
                name='divide',
                description=Divide.__doc__,
                parameters_json_schema=Divide.model_json_schema(),
            )
        ],
        allow_text_output=True,
    ),
)

Available Functions

Function	Description
`model_request`	Async non-streamed
`model_request_sync`	Sync non-streamed
`model_request_stream`	Async streamed
`model_request_stream_sync`	Sync streamed

Multi-Agent Patterns

Five levels of complexity:

Single agent — Basic agent workflows
Agent delegation — Agent calls another via tools
Programmatic hand-off — App code orchestrates agents
Graph-based control — State machine controls agents
Deep agents — Autonomous with planning, files, code exec

Agent Delegation

Parent agent delegates to child agent via tool:

from pydantic_ai import Agent, RunContext

parent_agent = Agent('openai:gpt-4o', system_prompt='Use joke_factory to get jokes.')
child_agent = Agent('anthropic:claude-sonnet-4-5', output_type=list[str])

@parent_agent.tool
async def joke_factory(ctx: RunContext[None], count: int) -> list[str]:
    result = await child_agent.run(
        f'Generate {count} jokes',
        usage=ctx.usage,  # Share usage tracking
    )
    return result.output

Key points:

Pass usage=ctx.usage to track combined usage
Pass deps=ctx.deps if child needs same dependencies
Parent and child may use different models, but the combined cost must then be calculated manually
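
Why passing usage matters can be shown with a toy accumulator. This is a sketch under stated assumptions: `Usage`, `child_run`, and `parent_run` are hypothetical stand-ins, and "tokens" are just word counts. When the parent hands its counter to the child, both runs land in one total; otherwise the child's work goes unrecorded on the parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Usage:
    """Hypothetical stand-in for a run's usage counter."""
    requests: int = 0
    total_tokens: int = 0

    def add(self, tokens: int) -> None:
        self.requests += 1
        self.total_tokens += tokens


def child_run(prompt: str, usage: Optional[Usage] = None) -> Usage:
    """Record the child's request on the shared counter, or on a fresh one."""
    usage = usage if usage is not None else Usage()
    usage.add(tokens=len(prompt.split()))
    return usage


def parent_run(prompt: str, share: bool) -> Usage:
    """Run the parent, then delegate to the child with or without the shared counter."""
    usage = Usage()
    usage.add(tokens=len(prompt.split()))
    child_run('Generate 3 jokes', usage=usage if share else None)
    return usage


shared = parent_run('Tell me jokes', share=True)
isolated = parent_run('Tell me jokes', share=False)
```

With sharing, the parent's counter reflects both requests, which is also what lets a single usage limit apply to the whole delegation chain.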

Programmatic Hand-off

Sequential agents with app logic between:

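from pydantic import BaseModel
from pydantic_ai import Agent

# Minimal placeholder output models so the example is self-contained
class FlightDetails(BaseModel):
    flight_number: str

class SeatPreference(BaseModel):
    row: int
    seat: str

class Failed(BaseModel):
    """Returned when the agent cannot complete the task."""

flight_agent = Agent('openai:gpt-4o', output_type=FlightDetails | Failed)
seat_agent = Agent('openai:gpt-4o', output_type=SeatPreference | Failed)

async def main():
    # First agent
    flight_result = await flight_agent.run('Find flight to Paris')

    if isinstance(flight_result.output, FlightDetails):
        # Second agent (independent); app code decides whether it runs
        seat_result = await seat_agent.run('Window seat please')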

Agent with Shared Dependencies

from dataclasses import dataclass

import httpx
from pydantic_ai import Agent, RunContext

@dataclass
class SharedDeps:
    http_client: httpx.AsyncClient
    api_key: str

parent = Agent('openai:gpt-4o', deps_type=SharedDeps)
child = Agent('anthropic:claude-sonnet-4-5', deps_type=SharedDeps)

@parent.tool
async def delegate(ctx: RunContext[SharedDeps], task: str) -> str:
    result = await child.run(
        task,
        deps=ctx.deps,   # Share dependencies
        usage=ctx.usage, # Share usage
    )
    return result.output

Deep Agent Capabilities

Capability	Implementation
Planning	Task management toolsets
File ops	FileSystemToolset
Delegation	Sub-agents via tools
Code exec	Sandboxed containers
Context mgmt	History processors
Approval	ApprovalRequiredToolset
Durability	Temporal, DBOS, Prefect

Thinking (Reasoning)

Enable step-by-step reasoning before final answer.

Provider Configuration

Provider	Setting	Example
OpenAI Responses	`openai_reasoning_effort`	`'low'`, `'medium'`, `'high'`
Anthropic	`anthropic_thinking`	`{'type': 'enabled', 'budget_tokens': 1024}`
Google	`google_thinking_config`	`{'include_thoughts': True}`
Groq	`groq_reasoning_format`	`'raw'`, `'hidden'`, `'parsed'`
OpenRouter	`openrouter_reasoning`	`{'effort': 'high'}`
Mistral	Auto (magistral models)	No config needed
Cohere	Auto (command-a-reasoning)	No config needed

OpenAI Responses Example

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIResponsesModelSettings

model = OpenAIResponsesModel('gpt-5')
settings = OpenAIResponsesModelSettings(
    openai_reasoning_effort='low',
    openai_reasoning_summary='detailed',
)
agent = Agent(model, model_settings=settings)

Anthropic Example

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel, AnthropicModelSettings

model = AnthropicModel('claude-sonnet-4-0')
settings = AnthropicModelSettings(
    anthropic_thinking={'type': 'enabled', 'budget_tokens': 1024},
)
agent = Agent(model, model_settings=settings)

Google Example

from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings

model = GoogleModel('gemini-2.5-pro')
settings = GoogleModelSettings(google_thinking_config={'include_thoughts': True})
agent = Agent(model, model_settings=settings)

Bedrock Examples

from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings

# Anthropic on Bedrock
model = BedrockConverseModel('us.anthropic.claude-sonnet-4-5-20250929-v1:0')
settings = BedrockModelSettings(
    bedrock_additional_model_requests_fields={
        'thinking': {'type': 'enabled', 'budget_tokens': 1024}
    }
)

# OpenAI on Bedrock
model = BedrockConverseModel('openai.gpt-oss-120b-1:0')
settings = BedrockModelSettings(
    bedrock_additional_model_requests_fields={'reasoning_effort': 'low'}
)

# Deepseek on Bedrock (always enabled)
model = BedrockConverseModel('us.deepseek.r1-v1:0')
agent = Agent(model=model)  # No settings needed

Thinking Output

Thinking parts are returned as ThinkingPart objects in the message history:

OpenAI Chat: <think> tags converted to ThinkingPart
OpenAI Responses: Native thinking parts
Groq parsed: Structured thinking parts
Local models: <think> tags auto-converted

Troubleshooting

Jupyter Notebook: Event Loop Error

# Error: RuntimeError: This event loop is already running
# Fix: Install and apply nest-asyncio BEFORE any agent runs
import nest_asyncio
nest_asyncio.apply()

Note: The same fix applies in Google Colab and Marimo.

API Key Missing

UserError: API key must be provided or set in the [MODEL]_API_KEY environment variable

Solutions:

Set environment variable: export OPENAI_API_KEY=sk-...
Pass directly via the provider: OpenAIChatModel('gpt-4o', provider=OpenAIProvider(api_key='sk-...'))
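
A quick pre-flight check can surface a missing key before the first request fails. This is a small sketch using only the standard library; `missing_api_keys` is a hypothetical helper, and the list of variable names is whatever your chosen providers require.

```python
import os
from typing import Mapping, Optional, Sequence


def missing_api_keys(
    required: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> list[str]:
    """Return the names of required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment
missing = missing_api_keys(
    ['OPENAI_API_KEY', 'ANTHROPIC_API_KEY'],
    env={'OPENAI_API_KEY': 'sk-test'},
)
```

Calling `missing_api_keys(['OPENAI_API_KEY'])` without the `env` argument checks the real process environment.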

Monitoring HTTPX Requests

Use custom httpx clients for request/response inspection:

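import httpx
import logfire
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider

# Install logfire httpx integration for monitoring
logfire.instrument_httpx()

# The custom client is passed through the provider, not the model constructor
client = httpx.AsyncClient()
model = OpenAIChatModel('gpt-4o', provider=OpenAIProvider(http_client=client))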

Community Support

Slack: Join #pydantic-ai in Pydantic Slack
GitHub Issues: https://github.com/pydantic/pydantic-ai/issues
Logfire Pro: Private collaboration channel available
