Agents Reference
Core interface for interacting with LLMs in Pydantic AI.
Agent Components
| Component | Description |
|---|---|
| Instructions | Developer-written prompts for LLM |
| Description | Human-readable label for instrumentation spans |
| Function Tools | Functions LLM can call during response |
| Output Type | Structured datatype LLM must return |
| Dependencies | Context passed to tools and prompts |
| Model | Default LLM (can override at runtime) |
| Model Settings | Temperature, max_tokens, timeout, etc. |
| Capabilities | Composable units bundling tools + hooks + instructions |
Capabilities (v1.71.0+)
Composable, reusable units of agent behavior that bundle tools, lifecycle hooks, instructions, and model settings into a single class:
from pydantic_ai import Agent
from pydantic_ai.capabilities import WebSearch, Thinking, MCP, Hooks
# Provider-adaptive tools — auto-fallback from builtin to local
agent = Agent('openai:gpt-4o', capabilities=[
WebSearch(),
Thinking(),
MCP(url='http://localhost:3000'),
])
Built-in capabilities: WebSearch, WebFetch, MCP, ImageGeneration, Thinking, Hooks.
Capability ordering (v1.80.0+)
When multiple capabilities wrap the same agent flow, ordering is now part of the public design surface.
- CapabilityOrdering supports explicit placement such as innermost, outermost, wraps, wrapped_by, and requires.
- Hooks also gained ordering controls and instance references so wrapper relationships can be expressed directly.
- Use explicit ordering when capability composition changes semantics, for example when you need one capability to observe or transform requests before another wrapper runs.
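Why ordering changes semantics can be seen in a plain-Python sketch of two wrappers around one handler. It does not use the Pydantic AI capability API; `redact`, `audit`, and `model` are hypothetical names chosen for illustration only.

```python
from typing import Callable

Handler = Callable[[str], str]


def redact(inner: Handler) -> Handler:
    """Hypothetical wrapper: masks a secret before passing the request inward."""
    def wrapped(request: str) -> str:
        return inner(request.replace("secret", "***"))
    return wrapped


def audit(log: list[str]) -> Callable[[Handler], Handler]:
    """Hypothetical wrapper: records whatever request reaches it."""
    def decorator(inner: Handler) -> Handler:
        def wrapped(request: str) -> str:
            log.append(request)
            return inner(request)
        return wrapped
    return decorator


def model(request: str) -> str:
    """Stand-in for the wrapped model call."""
    return f"echo: {request}"


# audit is outermost: it sees the raw request before redact runs.
outer_log: list[str] = []
audit_outermost = audit(outer_log)(redact(model))
audit_outermost("my secret")

# audit is innermost: it only sees the already-redacted request.
inner_log: list[str] = []
audit_innermost = redact(audit(inner_log)(model))
audit_innermost("my secret")
```

Both compositions return the same response, but the audit log differs, which is the kind of difference explicit ordering is meant to pin down.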
Hooks Capability
Define hooks using decorators:
from pydantic_ai.capabilities import Hooks
hooks = Hooks()
@hooks.on_model_request
async def log_request(ctx):
    print(f"Sending request to {ctx.model}")
agent = Agent('openai:gpt-4o', capabilities=[hooks])
Hooks can raise ModelRetry for retry control flow. before_model_request / wrap hooks can swap models via ModelRequestContext.
Server-side compaction capabilities (v1.80.0+)
Pydantic AI now exposes provider-backed compaction capabilities for long-running conversations:
- OpenAICompaction
- AnthropicCompaction
OpenAI compaction also gained a stateful mode in the 1.84.x line. Use these capabilities when you want the provider to manage context reduction instead of layering your own summarization logic on every turn.
AgentSpec (v1.71.0+)
Load agents from YAML/JSON files:
from pydantic_ai import Agent
agent = Agent.from_file('agent.yaml')
Supports TemplateStr for templated instructions referencing deps.
Multimodal Input
Support for image, audio, video, and document input.
Image Input
from pathlib import Path
from pydantic_ai import Agent, ImageUrl, BinaryContent
agent = Agent('openai:gpt-4o')
# URL
result = agent.run_sync([
    'What is this?',
    ImageUrl(url='https://example.com/image.png'),
])
# Local file: read the bytes and declare the media type explicitly
result = agent.run_sync([
    'Describe this image',
    BinaryContent(data=Path('photo.png').read_bytes(), media_type='image/png'),
])
Audio/Video/Document Input
from pydantic_ai import AudioUrl, VideoUrl, DocumentUrl
# Audio
agent.run_sync(['Transcribe this', AudioUrl(url='https://...')])
# Video
agent.run_sync(['Describe', VideoUrl(url='https://...')])
# Document (PDF)
agent.run_sync(['Summarize', DocumentUrl(url='https://...pdf')])
Force Download
If the provider can't fetch the URL directly, have Pydantic AI download the content and send it inline:
ImageUrl(url='https://...', force_download=True)
Provider Support
| Model | URL Direct | Download Required |
|---|---|---|
| OpenAI | ImageUrl | AudioUrl, DocumentUrl |
| Anthropic | ImageUrl, DocumentUrl(PDF) | DocumentUrl(text) |
| Google Vertex | All URLs | — |
| Mistral | ImageUrl, DocumentUrl(PDF) | — |
Creating Agents
from pydantic_ai import Agent, ModelSettings, RunContext
agent = Agent(
    'openai:gpt-4o',  # model identifier
    deps_type=int,  # dependency type
    output_type=bool,  # structured output type
    description='Triage GitHub issues and draft concise replies',
    system_prompt='Your instructions here',
    model_settings=ModelSettings(temperature=0.5),
    retries=2,  # default retry count
)
Agent Description (v1.69.0)
Use description= when you want traces and observability spans to carry a stable, human-readable agent label.
from pydantic_ai import Agent
agent = Agent(
'openai:gpt-4o',
description='Customer-support classifier',
)
When instrumentation is enabled, Pydantic AI attaches this value to the run span as gen_ai.agent.description.
Dependencies
Dependency injection system for passing data/services to prompts, tools, validators.
Defining Dependencies
from dataclasses import dataclass
import httpx
@dataclass
class MyDeps:
    api_key: str
    http_client: httpx.AsyncClient
agent = Agent(
'openai:gpt-4o',
deps_type=MyDeps, # pass TYPE, not instance
)
Accessing via RunContext
@agent.system_prompt
async def get_prompt(ctx: RunContext[MyDeps]) -> str:
    response = await ctx.deps.http_client.get(
        'https://api.example.com',
        headers={'Authorization': f'Bearer {ctx.deps.api_key}'}
    )
    return f"Context: {response.text}"
@agent.tool
async def fetch_data(ctx: RunContext[MyDeps], query: str) -> str:
    # ctx.deps available in tools; return the body text to match the -> str annotation
    response = await ctx.deps.http_client.get('/search', params={'q': query})
    return response.text
@agent.output_validator
async def validate(ctx: RunContext[MyDeps], output: str) -> str:
    # ctx.deps available in validators
    return output
Passing Dependencies at Runtime
async with httpx.AsyncClient() as client:
    deps = MyDeps(api_key='secret', http_client=client)
    result = await agent.run('Query', deps=deps)
Async vs Sync Dependencies
Both work. Non-async functions run in a thread pool via run_in_executor.
# Async (preferred for IO)
@agent.tool
async def async_tool(ctx: RunContext[MyDeps]) -> str:
    response = await ctx.deps.http_client.get('/data')
    return response.text
# Sync (also works); assumes MyDeps also has a sync_client: httpx.Client field
@agent.tool
def sync_tool(ctx: RunContext[MyDeps]) -> str:
    return ctx.deps.sync_client.get('/data').text
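The dispatch described above (await coroutine functions, push plain functions to a thread pool) can be sketched with the standard library alone. This is an illustration of the pattern, not Pydantic AI's internal code; `call_tool` and both toy tools are hypothetical.

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_tool(func: Callable[..., Any], *args: Any) -> Any:
    """Await coroutine functions directly; run plain functions in the default thread pool."""
    if inspect.iscoroutinefunction(func):
        return await func(*args)
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, func, *args)


async def async_tool(x: int) -> int:
    await asyncio.sleep(0)  # stands in for non-blocking IO
    return x + 1


def sync_tool(x: int) -> int:
    return x * 2  # stands in for blocking work


async def main() -> list[int]:
    return list(await asyncio.gather(call_tool(async_tool, 1), call_tool(sync_tool, 2)))


results = asyncio.run(main())
```

Because the sync function runs off the event loop thread, a blocking call inside it does not stall other coroutines.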
Overriding Dependencies (Testing)
class TestDeps(MyDeps):
    async def system_prompt_factory(self) -> str:
        return "test prompt"
async def test_app():
    test_deps = TestDeps(api_key='test_key', http_client=None)
    with agent.override(deps=test_deps):
        # application_code stands for your own function that runs the agent
        result = await application_code('Query')
Run Methods
| Method | Description |
|---|---|
| run() | Async, returns AgentRunResult |
| run_sync() | Synchronous wrapper |
| run_stream() | Async context manager, streams response |
| run_stream_sync() | Sync streaming |
| run_stream_events() | Async iterable of all events |
| iter() | Iterate over graph nodes |
Basic Run
# Synchronous
result = agent.run_sync('What is 2+2?', deps=my_deps)
print(result.output)
# Async
result = await agent.run('What is 2+2?')
print(result.output)
Streaming
async with agent.run_stream('Tell me a story') as response:
    async for text in response.stream_text():
        print(text, end='')
Stream Events
from pydantic_ai import (
AgentStreamEvent,
FunctionToolCallEvent,
FunctionToolResultEvent,
PartDeltaEvent,
TextPartDelta,
)
async for event in agent.run_stream_events('Query'):
    if isinstance(event, PartDeltaEvent):
        if isinstance(event.delta, TextPartDelta):
            print(event.delta.content_delta)
    elif isinstance(event, FunctionToolCallEvent):
        print(f'Tool: {event.part.tool_name}')
Iterate Over Graph
from pydantic_graph import End
async with agent.iter('Query') as agent_run:
    async for node in agent_run:
        print(node)
print(agent_run.result.output)
System Prompts vs Instructions
| Feature | system_prompt | instructions |
|---|---|---|
| Message history | Preserved across runs | Only current agent's |
| Use case | Multi-agent handoffs | Fresh context each run |
Static System Prompt
agent = Agent(
'openai:gpt-4o',
system_prompt="You are a helpful assistant."
)
Dynamic System Prompt
@agent.system_prompt
def add_context(ctx: RunContext[Deps]) -> str:
    return f"User: {ctx.deps.user_name}"
Instructions
agent = Agent(
'openai:gpt-4o',
instructions="Be concise."
)
from datetime import date
@agent.instructions
def add_date() -> str:
    return f"Date: {date.today()}"
# Runtime instructions
result = agent.run_sync('Query', instructions="Extra context")
Usage Limits
from pydantic_ai import UsageLimits, UsageLimitExceeded
try:
    result = agent.run_sync(
        'Query',
        usage_limits=UsageLimits(
            output_tokens_limit=100,  # max output (response) tokens
            request_limit=5,  # max model turns
            tool_calls_limit=10,  # max tool executions
        )
    )
except UsageLimitExceeded as e:
    print(f"Limit exceeded: {e}")
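The idea behind a request limit can be sketched as a counter that is checked before each model turn. This is a stdlib-only illustration; `Budget` and `LimitExceeded` are hypothetical stand-ins, not the library's UsageLimits or UsageLimitExceeded.

```python
from dataclasses import dataclass
from typing import Optional


class LimitExceeded(Exception):
    """Hypothetical stand-in for UsageLimitExceeded."""


@dataclass
class Budget:
    """Hypothetical counter that refuses a request once the limit is used up."""
    request_limit: int
    requests: int = 0

    def before_request(self) -> None:
        if self.requests + 1 > self.request_limit:
            raise LimitExceeded(f"next request would exceed request_limit={self.request_limit}")
        self.requests += 1


budget = Budget(request_limit=2)
error: Optional[str] = None
try:
    for _ in range(3):  # the third turn is rejected before it is sent
        budget.before_request()
except LimitExceeded as exc:
    error = str(exc)
```

Checking before the request, rather than after, means the rejected turn never incurs cost.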
Model Settings
Settings merge in order of increasing precedence: model defaults → agent defaults → run overrides
from pydantic_ai import ModelSettings
# Agent-level
agent = Agent(
'openai:gpt-4o',
model_settings=ModelSettings(temperature=0.5, max_tokens=500)
)
# Run-level override
result = agent.run_sync(
'Query',
model_settings=ModelSettings(temperature=0.0)
)
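The merge order can be pictured as layered dictionaries where later layers win. A minimal sketch with plain dicts follows; `merge_settings` is a hypothetical helper, and the model-default values are invented for illustration.

```python
from typing import Any, Optional


def merge_settings(*layers: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Merge settings layers left to right; later layers override earlier ones."""
    merged: dict[str, Any] = {}
    for layer in layers:
        if layer:
            merged.update(layer)
    return merged


model_defaults = {"temperature": 1.0, "timeout": 30.0}  # illustrative values
agent_defaults = {"temperature": 0.5, "max_tokens": 500}
run_overrides = {"temperature": 0.0}

effective = merge_settings(model_defaults, agent_defaults, run_overrides)
```

Only the keys set at a later level are replaced; here max_tokens from the agent level and timeout from the model level both survive the run-level override of temperature.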
Run Metadata
from dataclasses import dataclass
@dataclass
class Deps:
    tenant: str
agent = Agent[Deps](
'openai:gpt-4o',
deps_type=Deps,
metadata=lambda ctx: {'tenant': ctx.deps.tenant},
)
result = agent.run_sync(
'Query',
deps=Deps(tenant='acme'),
metadata={'extra': 'data'}, # merged with agent metadata
)
print(result.metadata) # {'tenant': 'acme', 'extra': 'data'}
Run context now exposes output validation retry count for observability (v1.52.0).
Reflection and Self-Correction
from pydantic_ai import ModelRetry
@agent.tool(retries=3)
def lookup_user(ctx: RunContext[Deps], name: str) -> int:
    user = ctx.deps.db.find(name)
    if not user:
        raise ModelRetry(f"User {name} not found. Try full name.")
    return user.id
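The control flow behind this pattern is a loop in which the tool's error message is handed back to the model so it can try again. The sketch below imitates that loop with the standard library; `RetrySignal`, `fake_model`, and `USERS` are hypothetical stand-ins and no real model is called.

```python
class RetrySignal(Exception):
    """Hypothetical stand-in for ModelRetry: its message is fed back to the model."""


USERS = {"John Doe": 123}  # toy user table


def lookup_user(name: str) -> int:
    if name not in USERS:
        raise RetrySignal(f"User {name} not found. Try full name.")
    return USERS[name]


def fake_model(feedback: list[str]) -> str:
    """Toy model: guesses a short name first, then corrects itself after feedback."""
    return "John Doe" if feedback else "John"


def run_tool_with_retries(max_retries: int) -> tuple[int, list[str]]:
    feedback: list[str] = []
    for _ in range(max_retries + 1):
        argument = fake_model(feedback)
        try:
            return lookup_user(argument), feedback
        except RetrySignal as exc:
            feedback.append(str(exc))  # becomes part of the next prompt
    raise RuntimeError("tool exceeded max retries")


user_id, feedback = run_tool_with_retries(max_retries=3)
```

A specific, actionable retry message matters here because it is the only signal the model gets about what to change.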
Error Handling
from pydantic_ai import UnexpectedModelBehavior, capture_run_messages
with capture_run_messages() as messages:
    try:
        result = agent.run_sync('Query')
    except UnexpectedModelBehavior as e:
        print(f"Error: {e}")
        print(f"Messages: {messages}")
Agent Constructor Parameters
| Parameter | Type | Description |
|---|---|---|
| model | str or Model | Model identifier or instance |
| deps_type | type | Dependency type for RunContext |
| output_type | type | Pydantic model for output |
| system_prompt | str | Static system prompt |
| instructions | str | Instructions (not in history) |
| model_settings | ModelSettings | Default model settings |
| retries | int | Default retry count |
| metadata | dict or callable | Run metadata |
| end_strategy | str | 'early' or 'exhaustive' |
| history_processors | list | Message history processors |
Messages and Chat History
Accessing Messages
result = agent.run_sync('Tell me a joke')
# All messages including prior runs
all_msgs = result.all_messages()
# Only messages from current run
new_msgs = result.new_messages()
# JSON serialization
json_bytes = result.all_messages_json()
Continuing Conversations
result1 = agent.run_sync('Tell me a joke')
print(result1.output)
# Continue with message history
result2 = agent.run_sync(
'Explain?',
message_history=result1.new_messages()
)
print(result2.output)
Serialize/Deserialize Messages
from pydantic_core import to_jsonable_python
from pydantic_ai import ModelMessagesTypeAdapter
# Serialize
history = result.all_messages()
as_python = to_jsonable_python(history)
# Deserialize
restored = ModelMessagesTypeAdapter.validate_python(as_python)
# Use restored history
result = agent.run_sync('Continue', message_history=restored)
History Processors
Intercept and modify message history before each request:
from pydantic_ai import Agent, ModelMessage, ModelRequest
def keep_recent(messages: list[ModelMessage]) -> list[ModelMessage]:
    """Keep only last 5 messages."""
    return messages[-5:] if len(messages) > 5 else messages
def filter_responses(messages: list[ModelMessage]) -> list[ModelMessage]:
    """Remove ModelResponse, keep only requests."""
    return [m for m in messages if isinstance(m, ModelRequest)]
agent = Agent(
'openai:gpt-4o',
history_processors=[filter_responses, keep_recent],
)
Context-Aware Processor
def token_aware(ctx: RunContext[None], messages: list[ModelMessage]) -> list[ModelMessage]:
    if ctx.usage.total_tokens > 1000:
        return messages[-3:]  # Keep recent when high token usage
    return messages
Summarize Old Messages
summarizer = Agent('openai:gpt-4o-mini', instructions='Summarize conversation.')
async def summarize_old(messages: list[ModelMessage]) -> list[ModelMessage]:
    if len(messages) > 10:
        oldest = messages[:10]
        summary = await summarizer.run(message_history=oldest)
        return summary.new_messages() + messages[-1:]
    return messages
Warning: When slicing history, ensure tool calls and returns are paired.
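One way to respect this pairing is to widen the slice whenever the cut would start on a tool return. The sketch below uses a simplified, hypothetical `Msg` type instead of the real ModelRequest/ModelResponse parts, so treat it as an outline of the idea rather than a drop-in history processor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Msg:
    """Hypothetical simplified message; real histories carry typed parts."""
    kind: str  # 'user', 'text', 'tool_call', or 'tool_return'
    tool_call_id: Optional[str] = None


def keep_recent_paired(messages: list[Msg], n: int) -> list[Msg]:
    """Keep about the last n messages without orphaning a tool return."""
    start = max(len(messages) - n, 0)
    # Move the cut back while the first kept message is a tool return,
    # so the tool call that produced it stays in the window.
    while 0 < start < len(messages) and messages[start].kind == "tool_return":
        start -= 1
    return messages[start:]


history = [
    Msg("user"),
    Msg("tool_call", "a"),
    Msg("tool_return", "a"),
    Msg("text"),
    Msg("user"),
]
trimmed = keep_recent_paired(history, 3)
```

Here a naive `history[-3:]` would begin with the tool return for call "a"; the helper keeps four messages so the call and its return stay together.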
Direct Model Requests
Low-level API for making requests without full Agent functionality.
When to Use
- Need direct control over model interactions
- Building custom abstractions
- Don't need tool execution, retrying, structured output
Basic Usage
from pydantic_ai import ModelRequest
from pydantic_ai.direct import model_request_sync
response = model_request_sync(
'anthropic:claude-haiku-4-5',
[ModelRequest.user_text_prompt('What is the capital of France?')]
)
print(response.parts[0].content) # Paris
print(response.usage) # RequestUsage(input_tokens=56, output_tokens=7)
Async Request
from pydantic_ai.direct import model_request
response = await model_request(
'openai:gpt-4o',
[ModelRequest.user_text_prompt('Hello')]
)
With Tool Definitions
from pydantic import BaseModel
from pydantic_ai import ModelRequest, ToolDefinition
from pydantic_ai.direct import model_request
from pydantic_ai.models import ModelRequestParameters
class Divide(BaseModel):
    """Divide two numbers."""
    numerator: float
    denominator: float
response = await model_request(
    'openai:gpt-4o',
    [ModelRequest.user_text_prompt('What is 123 / 456?')],
    model_request_parameters=ModelRequestParameters(
        function_tools=[
            ToolDefinition(
                name='divide',
                description=Divide.__doc__,
                parameters_json_schema=Divide.model_json_schema(),
            )
        ],
        allow_text_output=True,
    ),
)
Available Functions
| Function | Description |
|---|---|
| model_request | Async non-streamed |
| model_request_sync | Sync non-streamed |
| model_request_stream | Async streamed |
| model_request_stream_sync | Sync streamed |
Multi-Agent Patterns
Five levels of complexity:
- Single agent — Basic agent workflows
- Agent delegation — Agent calls another via tools
- Programmatic hand-off — App code orchestrates agents
- Graph-based control — State machine controls agents
- Deep agents — Autonomous with planning, files, code exec
Agent Delegation
Parent agent delegates to child agent via tool:
from pydantic_ai import Agent, RunContext
parent_agent = Agent('openai:gpt-4o', system_prompt='Use joke_factory to get jokes.')
child_agent = Agent('anthropic:claude-sonnet-4-5', output_type=list[str])
@parent_agent.tool
async def joke_factory(ctx: RunContext[None], count: int) -> list[str]:
    result = await child_agent.run(
        f'Generate {count} jokes',
        usage=ctx.usage,  # Share usage tracking
    )
    return result.output
Key points:
- Pass usage=ctx.usage to track combined usage
- Pass deps=ctx.deps if the child needs the same dependencies
- Different models allowed (cost calculation manual)
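Combined usage tracking works because parent and child update the same object. A stdlib-only sketch of that idea follows; `Usage`, `run_child`, and `run_parent` are hypothetical, and word counts stand in for tokens.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical minimal usage counter."""
    requests: int = 0
    total_tokens: int = 0


def run_child(prompt: str, usage: Usage) -> str:
    usage.requests += 1
    usage.total_tokens += len(prompt.split())  # word count as a toy token count
    return f"child handled: {prompt}"


def run_parent(prompt: str) -> tuple[str, Usage]:
    usage = Usage()
    usage.requests += 1
    usage.total_tokens += len(prompt.split())
    # Passing the same object down is the analogue of usage=ctx.usage.
    child_output = run_child("Generate 2 jokes", usage)
    return child_output, usage


output, usage = run_parent("Tell me jokes")
```

If the child created its own counter instead, the parent's totals would silently miss the delegated work.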
Programmatic Hand-off
Sequential agents with app logic between:
from pydantic_ai import Agent
# FlightDetails, SeatPreference, and Failed are application-defined Pydantic models (not shown)
flight_agent = Agent('openai:gpt-4o', output_type=FlightDetails | Failed)
seat_agent = Agent('openai:gpt-4o', output_type=SeatPreference | Failed)
async def main():
    # First agent
    flight_result = await flight_agent.run('Find flight to Paris')
    if isinstance(flight_result.output, FlightDetails):
        # Second agent (independent)
        seat_result = await seat_agent.run('Window seat please')
Agent with Shared Dependencies
from dataclasses import dataclass
import httpx
from pydantic_ai import Agent, RunContext
@dataclass
class SharedDeps:
    http_client: httpx.AsyncClient
    api_key: str
parent = Agent('openai:gpt-4o', deps_type=SharedDeps)
child = Agent('anthropic:claude-sonnet-4-5', deps_type=SharedDeps)
@parent.tool
async def delegate(ctx: RunContext[SharedDeps], task: str) -> str:
    result = await child.run(
        task,
        deps=ctx.deps,  # Share dependencies
        usage=ctx.usage,  # Share usage
    )
    return result.output
Deep Agent Capabilities
| Capability | Implementation |
|---|---|
| Planning | Task management toolsets |
| File ops | FileSystemToolset |
| Delegation | Sub-agents via tools |
| Code exec | Sandboxed containers |
| Context mgmt | History processors |
| Approval | ApprovalRequiredToolset |
| Durability | Temporal, DBOS, Prefect |
Thinking (Reasoning)
Enable step-by-step reasoning before final answer.
Provider Configuration
| Provider | Setting | Example |
|---|---|---|
| OpenAI Responses | openai_reasoning_effort | 'low', 'medium', 'high' |
| Anthropic | anthropic_thinking | {'type': 'enabled', 'budget_tokens': 1024} |
| Google | google_thinking_config | {'include_thoughts': True} |
| Groq | groq_reasoning_format | 'raw', 'hidden', 'parsed' |
| OpenRouter | openrouter_reasoning | {'effort': 'high'} |
| Mistral | Auto (magistral models) | No config needed |
| Cohere | Auto (command-a-reasoning) | No config needed |
OpenAI Responses Example
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIResponsesModelSettings
model = OpenAIResponsesModel('gpt-5')
settings = OpenAIResponsesModelSettings(
openai_reasoning_effort='low',
openai_reasoning_summary='detailed',
)
agent = Agent(model, model_settings=settings)
Anthropic Example
from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel, AnthropicModelSettings
model = AnthropicModel('claude-sonnet-4-0')
settings = AnthropicModelSettings(
anthropic_thinking={'type': 'enabled', 'budget_tokens': 1024},
)
agent = Agent(model, model_settings=settings)
Google Example
from pydantic_ai import Agent
from pydantic_ai.models.google import GoogleModel, GoogleModelSettings
model = GoogleModel('gemini-2.5-pro')
settings = GoogleModelSettings(google_thinking_config={'include_thoughts': True})
agent = Agent(model, model_settings=settings)
Bedrock Examples
from pydantic_ai import Agent
from pydantic_ai.models.bedrock import BedrockConverseModel, BedrockModelSettings
# Anthropic on Bedrock
model = BedrockConverseModel('us.anthropic.claude-sonnet-4-5-20250929-v1:0')
settings = BedrockModelSettings(
bedrock_additional_model_requests_fields={
'thinking': {'type': 'enabled', 'budget_tokens': 1024}
}
)
# OpenAI on Bedrock
model = BedrockConverseModel('openai.gpt-oss-120b-1:0')
settings = BedrockModelSettings(
bedrock_additional_model_requests_fields={'reasoning_effort': 'low'}
)
# Deepseek on Bedrock (always enabled)
model = BedrockConverseModel('us.deepseek.r1-v1:0')
agent = Agent(model=model) # No settings needed
Thinking Output
Thinking parts are returned as ThinkingPart objects in the message history:
- OpenAI Chat: <think> tags converted to ThinkingPart
- OpenAI Responses: Native thinking parts
- Groq parsed: Structured thinking parts
- Local models: <think> tags auto-converted
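The tag conversion mentioned in the list above amounts to splitting raw text into thinking and answer segments. A rough regex-based sketch follows; `split_thinking` is a hypothetical helper and returns plain tuples rather than ThinkingPart objects.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> list[tuple[str, str]]:
    """Split raw model text into ('thinking', ...) and ('text', ...) segments."""
    parts: list[tuple[str, str]] = []
    pos = 0
    for match in THINK_RE.finditer(text):
        before = text[pos:match.start()].strip()
        if before:
            parts.append(("text", before))
        parts.append(("thinking", match.group(1).strip()))
        pos = match.end()
    rest = text[pos:].strip()
    if rest:
        parts.append(("text", rest))
    return parts


parts = split_thinking("<think>2+2 is 4</think>The answer is 4.")
```

Keeping the segments in order preserves where the reasoning sat relative to the visible answer.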
Troubleshooting
Jupyter Notebook: Event Loop Error
# Error: RuntimeError: This event loop is already running
# Fix: Install and apply nest-asyncio BEFORE any agent runs
import nest_asyncio
nest_asyncio.apply()
Note: Works in Google Colab and Marimo too.
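The error appears because a notebook kernel already runs an event loop. A small standard-library check can tell which situation the code is in; `loop_is_running` is a hypothetical helper, not part of Pydantic AI.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True when called from inside a running asyncio event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


async def check_inside() -> bool:
    return loop_is_running()


outside = loop_is_running()              # plain script: no loop yet
inside = asyncio.run(check_inside())     # inside a coroutine: loop is running
```

When a loop is already running, awaiting the async run method directly is usually an alternative to patching the loop, assuming the environment supports top-level await.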
API Key Missing
UserError: API key must be provided or set in the [MODEL]_API_KEY environment variable
Solutions:
- Set environment variable: export OPENAI_API_KEY=sk-...
- Pass directly: OpenAIChatModel('gpt-4o', provider=OpenAIProvider(api_key='sk-...'))
Monitoring HTTPX Requests
Use custom httpx clients for request/response inspection:
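import httpx
import logfire
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.openai import OpenAIProvider
# Install logfire httpx integration for monitoring
logfire.instrument_httpx()
client = httpx.AsyncClient()
# Custom HTTP clients are passed through the provider, not the model constructor
model = OpenAIChatModel('gpt-4o', provider=OpenAIProvider(http_client=client))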
Community Support
- Slack: Join #pydantic-ai in Pydantic Slack
- GitHub Issues: https://github.com/pydantic/pydantic-ai/issues
- Logfire Pro: Private collaboration channel available