AI SDK Agents - Comprehensive Notes
1. Directory Structure
The agents documentation contains 6 core markdown files:
- overview.md - High-level agent concepts and ToolLoopAgent class introduction
- building-agents.md - Creating agents, configuration, and usage patterns
- loop-control.md - Controlling execution flow, stopping conditions, prepareStep
- workflows.md - Structured patterns for complex workflows
- configuring-call-options.md - Runtime configuration and dynamic behavior
- subagents.md - Delegating to specialized subagents
2. Agent Architecture
Core Concept: Agents are LLMs that use tools in a loop to accomplish tasks.
Three Components:
- LLMs - Process input and decide next action
- Tools - Extend capabilities beyond text (files, APIs, databases)
- Loop - Orchestrates execution with context management and stopping conditions
ToolLoopAgent Class:
- Main abstraction for building agents
- Handles loop iteration, message history, and tool execution
- Default behavior: stops after 20 steps (stepCountIs(20))
- Configurable with system instructions, tools, and output schemas
3. Agent Protocols & Execution Model
Loop Execution Flow:
Input -> Model Generation -> Check Result
|
Tool Call? -> Execute Tool -> Add to Context -> Continue
|
Text Generated? -> Return Result -> Done
|
Stop Condition Met? -> Done
Each Step Represents:
- One model generation (results in either text or tool call)
- Either a tool call to execute, or text response to return
Loop Continues Until:
- Model generates text (finish reason other than tool-calls)
- Tool invoked has no execute function
- Tool call needs approval
- Stop condition is met
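The flow above can be modeled in a few lines. This is a minimal sketch, not the SDK's implementation: `runLoop`, `ModelOutput`, `LoopMessage`, `SketchTool`, and the fake model are all hypothetical stand-ins, and the tool is synchronous to keep the example short.

```typescript
// Hypothetical stand-ins for illustration; these are not the SDK's types.
type ModelOutput =
  | { kind: 'text'; text: string }
  | { kind: 'tool-call'; toolName: string; input: number };
type LoopMessage = { role: 'user' | 'assistant' | 'tool'; content: string };
type SketchTool = { execute?: (input: number) => number };

function runLoop(
  prompt: string,
  model: (messages: LoopMessage[]) => ModelOutput,
  tools: Record<string, SketchTool>,
  maxSteps: number,
): { text: string; stepCount: number; messages: LoopMessage[] } {
  const messages: LoopMessage[] = [{ role: 'user', content: prompt }];
  let stepCount = 0;
  while (stepCount < maxSteps) {
    stepCount += 1;
    const output = model(messages); // one model generation per step
    if (output.kind === 'text') {
      // Text generated: return the result, loop is done.
      messages.push({ role: 'assistant', content: output.text });
      return { text: output.text, stepCount, messages };
    }
    messages.push({ role: 'assistant', content: `call ${output.toolName}(${output.input})` });
    const execute = tools[output.toolName]?.execute;
    if (!execute) break; // tool without execute: control goes back to the caller
    // Tool result is added to the context before the next generation.
    messages.push({ role: 'tool', content: String(execute(output.input)) });
  }
  return { text: '', stepCount, messages }; // step limit reached or no execute
}

// Fake model: calls the `double` tool once, then answers with the tool result.
const fakeModel = (messages: LoopMessage[]): ModelOutput => {
  const last = messages[messages.length - 1];
  return last !== undefined && last.role === 'tool'
    ? { kind: 'text', text: `The answer is ${last.content}` }
    : { kind: 'tool-call', toolName: 'double', input: 21 };
};

const loopResult = runLoop('Double 21', fakeModel, { double: { execute: (n) => n * 2 } }, 20);
console.log(loopResult.text, loopResult.stepCount); // The answer is 42 2
```

The step counter here plays the role of the default step limit; approval handling is left out of the sketch.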
4. Tool Execution Model
Tool Definition (using tool() helper):
tool({
  description: '...',            // What the tool does
  inputSchema: z.object({...}),  // Zod schema for inputs
  execute: async ({...}) => {},  // Function that runs
})
Tool Execution Mechanics:
- Model decides which tool to call and what inputs to provide
- SDK executes the tool automatically
- Result is added to message history as "tool" role message
- Model sees result and decides next action
Tool Calling Control (toolChoice):
- 'auto' (default) - Model decides to use tools or generate text
- 'required' - Force model to always call a tool
- 'none' - Disable tools entirely
- { type: 'tool', toolName: 'specific' } - Force specific tool
Preliminary Tool Results (for streaming):
- execute can be an async generator function
- Each yield sends a partial result to the UI
- Allows showing progress while tool executes
- Used for subagent streaming
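A minimal sketch of the async-generator shape, independent of the SDK. `lookupWeather`, `Progress`, and `collect` are hypothetical names, and the sketch assumes the last yielded value is treated as the final result.

```typescript
type Progress = { status: 'loading' | 'done'; text: string };

// Hypothetical tool body: yields a preliminary result, then the final one.
async function* lookupWeather(city: string): AsyncGenerator<Progress> {
  yield { status: 'loading', text: `Looking up ${city}...` };
  const temperature = await Promise.resolve(21); // stand-in for a real API call
  yield { status: 'done', text: `${city}: ${temperature} C` };
}

// Consumer: each yielded value replaces the previous one as the latest result.
async function collect(city: string): Promise<Progress[]> {
  const seen: Progress[] = [];
  for await (const update of lookupWeather(city)) seen.push(update);
  return seen;
}

collect('Paris').then((updates) => console.log(updates.map((u) => u.status)));
```

A UI would render each update as it arrives instead of collecting them into an array.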
5. Multi-Step/Agentic Loops
Default Loop:
- Maximum 20 steps by default
- Each step: generate -> evaluate -> execute tool or stop
- Full context window available for each generation
Loop Control Mechanisms:
A) Stop Conditions (stopWhen):
- stepCountIs(N) - Stop after N steps
- hasToolCall('toolName') - Stop after specific tool called
- Custom conditions - Define logic based on steps array
- Can combine multiple conditions (stops if ANY met)
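Stop conditions can be thought of as predicates over the steps array. The sketch below uses local stand-ins (`stopAfterSteps`, `stopAfterTool`, `shouldStop`, `StepRecord`) rather than the SDK helpers, and it assumes the tool check looks only at the most recent step.

```typescript
// Local stand-ins named after the concepts above; not imported from the SDK.
type StepRecord = { toolCalls: { toolName: string }[] };
type StopCondition = (steps: StepRecord[]) => boolean;

const stopAfterSteps = (n: number): StopCondition => (steps) => steps.length >= n;

const stopAfterTool = (toolName: string): StopCondition => (steps) => {
  const last = steps[steps.length - 1];
  return last !== undefined && last.toolCalls.some((call) => call.toolName === toolName);
};

// Combined conditions stop the loop as soon as ANY of them is met.
const shouldStop = (conditions: StopCondition[], steps: StepRecord[]): boolean =>
  conditions.some((condition) => condition(steps));

const stopConditions = [stopAfterSteps(5), stopAfterTool('finalAnswer')];
const stepsSoFar: StepRecord[] = [
  { toolCalls: [{ toolName: 'search' }] },
  { toolCalls: [{ toolName: 'finalAnswer' }] },
];
console.log(shouldStop(stopConditions, stepsSoFar.slice(0, 1))); // false
console.log(shouldStop(stopConditions, stepsSoFar)); // true
```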
B) Dynamic Control (prepareStep):
- Runs before each step
- Can modify:
  - Model selection (switch based on complexity)
  - Available tools (activeTools)
  - Tool choice enforcement
  - Messages (e.g., truncate for context limits)
  - Any agent setting
- Receives: stepNumber, steps array, messages, model config
- Enables phased workflows (search -> analyze -> summarize)
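A phased workflow can be expressed as a pure function from the step number to per-step overrides. This is a sketch under assumptions: `planStep` and `StepSettings` are hypothetical, the step number is taken to be zero-based, and the tool names are invented for illustration.

```typescript
// Hypothetical shape of the per-step overrides; field names follow the notes above.
type StepSettings = {
  activeTools: string[];
  toolChoice: 'auto' | 'required' | 'none';
};

function planStep(stepNumber: number): StepSettings {
  if (stepNumber < 2) {
    return { activeTools: ['search'], toolChoice: 'required' }; // search phase
  }
  if (stepNumber < 4) {
    return { activeTools: ['analyze'], toolChoice: 'auto' }; // analyze phase
  }
  return { activeTools: [], toolChoice: 'none' }; // summarize phase: text only
}

console.log(planStep(0), planStep(3), planStep(4));
```

A callback run before each step would return something like `planStep(stepNumber)` to narrow what the model may do in that phase.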
C) Message Management:
- System message + user message + alternating assistant/tool messages
- prepareStep can modify message history (e.g., keep recent only)
- All previous steps visible to stop conditions
6. Memory & Context Management
Message History Structure:
[system, user, assistant (or tool calls), tool, assistant (or tool calls), ...]
Context Issues:
- Long conversations can exceed model context limits
- prepareStep can implement sliding window (keep only recent N messages)
- Summarization possible: replace old messages with summaries
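A sliding window is simple to sketch on its own. `slidingWindow` and `ChatMessage` below are hypothetical names; the choice to always keep system messages is an assumption of this sketch.

```typescript
type ChatMessage = {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
};

// Keep every system message plus the most recent `keep` non-system messages.
function slidingWindow(messages: ChatMessage[], keep: number): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(Math.max(0, rest.length - keep))];
}

const conversation: ChatMessage[] = [
  { role: 'system', content: 'You are helpful.' },
  { role: 'user', content: 'u1' },
  { role: 'assistant', content: 'a1' },
  { role: 'user', content: 'u2' },
  { role: 'assistant', content: 'a2' },
];
console.log(slidingWindow(conversation, 2).map((m) => m.content));
// [ 'You are helpful.', 'u2', 'a2' ]
```

A cut like this may separate a tool result from the assistant message that requested it, so a production version would likely need to trim at pair boundaries.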
Subagent Context Isolation:
- Each subagent starts with fresh context window
- Critical benefit: heavy exploration doesn't bloat main agent
- Can pass main agent's message history to subagent if needed
State Persistence:
- All steps available in steps array
- Tool calls and results preserved
- Token usage tracked per step via onStepFinish callback
7. Human-in-the-Loop Support
Tool Approvals:
- Tools can require human approval before execution
- When needsApproval is set, execution pauses waiting for confirmation
- Stops the agent loop (counts as loop termination condition)
Callback Hooks:
- onStepFinish: Called after each step completes
- Receives: usage, finishReason, toolCalls
- Can be in constructor (agent-wide) or method (per-call)
- Constructor callback runs first, then method callback
User Interaction Points:
- Tool approvals for sensitive operations
- Custom UI via createAgentUIStreamResponse
- Type-safe message types via InferAgentUIMessage
8. Streaming in Agents
Streaming Text:
const result = await agent.stream({ prompt: '...' })
for await (const chunk of result.textStream) {
  console.log(chunk)
}
Streaming UI Messages:
- createAgentUIStreamResponse() - Create API response for client apps
- Works with Next.js routes, server actions
- Streams both text and tool calls with their results
Streaming Tool Results (Preliminary Results):
- Execute as async generator: async function* ({ /* inputs */ })
- Each yield sends partial result
- readUIMessageStream() accumulates chunks into complete UIMessage
- Frontend displays growing message as it arrives
Subagent Progress:
- Subagent invoked via tool's execute function
- Tool can stream subagent's progress to UI
- toModelOutput controls what main agent sees (full UI vs summary)
9. Error Handling in Agents
Loop Termination on Error:
- Tool execution failure stops the loop
- Incomplete tool calls handled via convertToModelMessages with ignoreIncompleteToolCalls
- Error can be logged and gracefully handled
Cancellation Support:
- abortSignal parameter in tool execute
- Pass through to async operations
- Triggers AbortError on cancellation
- Subagent cancellation propagates from parent
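The cancellation idea can be illustrated with the standard `AbortController` alone. In this sketch `processItems` is a hypothetical tool body that receives the signal as a plain parameter and checks it between units of work; constructing the error by hand with the name `AbortError` is an assumption made to keep the example synchronous.

```typescript
// Hypothetical tool body that checks the signal between units of work.
function processItems(
  items: string[],
  signal: AbortSignal,
  onItem: (item: string) => void,
): number {
  let processed = 0;
  for (const item of items) {
    if (signal.aborted) {
      const error = new Error('Operation aborted');
      error.name = 'AbortError';
      throw error;
    }
    onItem(item);
    processed += 1;
  }
  return processed;
}

const controller = new AbortController();
const handled: string[] = [];
let caughtName = '';
try {
  processItems(['a', 'b', 'c'], controller.signal, (item) => {
    handled.push(item);
    if (item === 'b') controller.abort(); // simulate a cancel arriving mid-run
  });
} catch (error) {
  caughtName = error instanceof Error ? error.name : 'unknown';
}
console.log(handled, caughtName); // [ 'a', 'b' ] AbortError
```

Passing the same signal down to nested calls (for example a subagent or a fetch) is what lets a parent cancellation reach them.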
Custom Error Recovery:
- prepareStep can implement retry logic
- Modify model/tools on error (switch to stronger model)
- Early termination via stop conditions
10. Best Practices & Patterns
A) 5 Core Workflow Patterns:
- Sequential Processing (Chains)
  - Steps execute in order, each output -> next input
  - Use for pipelines with clear sequence
  - Example: generate copy -> evaluate quality -> improve if needed
- Routing
  - Model classifies input -> determines processing path
  - First generation determines model/system prompt for next
  - Example: route query by complexity to different models
- Parallel Processing
  - Independent tasks run simultaneously (Promise.all)
  - Aggregate results afterward
  - Example: parallel code review (security, performance, maintainability)
- Orchestrator-Worker
  - Primary agent plans -> specialized workers execute
  - Each worker optimized for subtask type
  - Example: architect plans feature -> specialized agents implement files
- Evaluator-Optimizer
  - Dedicated evaluation step assesses results
  - Based on evaluation: proceed, retry, or take corrective action
  - Iterative improvement loop
  - Example: translate -> evaluate -> improve if quality < threshold
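As one concrete instance of these patterns, the evaluator-optimizer loop can be sketched with plain functions. `refine`, `score`, and `improve` are hypothetical; in practice the evaluator and the optimizer would each be a model call, and the round limit is an added safeguard assumed here.

```typescript
// Hypothetical stand-ins: `score` is the evaluator, `improve` is the optimizer.
function refine(
  draft: string,
  score: (text: string) => number,
  improve: (text: string) => string,
  threshold: number,
  maxRounds: number,
): { text: string; rounds: number } {
  let text = draft;
  let rounds = 0;
  // Retry while quality is below the threshold, up to maxRounds attempts.
  while (score(text) < threshold && rounds < maxRounds) {
    text = improve(text);
    rounds += 1;
  }
  return { text, rounds };
}

// Toy evaluator: quality is the text length. Toy optimizer: append a character.
const refined = refine('ab', (t) => t.length, (t) => t + '!', 4, 10);
console.log(refined); // { text: 'ab!!', rounds: 2 }
```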
B) Agent Design Principles:
- Start with simplest approach that works
- Add complexity only when needed
- Balance flexibility (LLM freedom) vs control (constraints)
- Consider error tolerance and cost implications
C) System Instructions:
- Define agent role and expertise
- Specify behavioral guidelines and rules
- Explain tool usage patterns
- Guide response format/style
- Set boundaries and constraints
D) Tool Design:
- Clear descriptions for model understanding
- Structured Zod schemas for type safety
- Explicit execute functions
- No-execute tools for termination signals (with toolChoice: 'required')
E) Cost Optimization:
- Monitor token usage via onStepFinish
- Use budget-based stop conditions
- Switch to smaller models for simple tasks
- Summarize tool results in prepareStep
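A budget-based stop condition can be sketched as a predicate that sums token usage across steps. `UsageStep`, `totalTokens`, and `budgetExceeded` are hypothetical names, and the `inputTokens`/`outputTokens` fields are an assumed usage shape.

```typescript
// Assumed per-step usage shape for this sketch.
type UsageStep = { usage: { inputTokens: number; outputTokens: number } };

const totalTokens = (steps: UsageStep[]): number =>
  steps.reduce((sum, s) => sum + s.usage.inputTokens + s.usage.outputTokens, 0);

// Custom stop condition: true once the run has spent more than `budget` tokens.
const budgetExceeded = (budget: number) => (steps: UsageStep[]): boolean =>
  totalTokens(steps) > budget;

const usageSteps: UsageStep[] = [
  { usage: { inputTokens: 100, outputTokens: 50 } },
  { usage: { inputTokens: 200, outputTokens: 80 } },
];
console.log(totalTokens(usageSteps)); // 430
console.log(budgetExceeded(400)(usageSteps)); // true
```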
F) Context Management:
- Track message count in prepareStep
- Implement sliding window for long conversations
- Use subagents for context-heavy work
- Extract summaries instead of full exploration
G) Structured Workflows:
- Use core functions (generateText, generateObject) for predictable flows
- Combine with agents where flexibility needed
- Agents excel at exploration/multi-step reasoning
- Core functions excel at deterministic workflows
H) Subagent Patterns:
- Use only when benefits exceed latency cost
- Context isolation is primary value proposition
- Streaming progress with readUIMessageStream
- toModelOutput keeps main agent focused (full UI != model input)
- Subagent instructions must explicitly produce summaries
I) Call Options for Dynamic Behavior:
- Define callOptionsSchema (Zod)
- Implement prepareCall to modify settings
- Enables RAG (fetch docs -> inject into instructions)
- Dynamic model/tool selection per request
- Provider-specific options (e.g., reasoning effort)
- Async prepareCall for data fetching
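The call-options idea amounts to a function that turns per-request options into agent settings. The sketch below is standalone: `prepareCallSketch`, `CallOptions`, `CallSettings`, the in-memory `docsByUser` store, and the model names are all hypothetical, and plain types stand in for the Zod schema.

```typescript
type CallOptions = { userId: string; complexity: 'simple' | 'complex' };
type CallSettings = { model: string; instructions: string };

// Hypothetical document store standing in for a real retrieval step.
const docsByUser: Record<string, string> = { u1: 'Prefers metric units.' };

function prepareCallSketch(options: CallOptions, base: CallSettings): CallSettings {
  const context = docsByUser[options.userId] ?? 'No stored context.';
  return {
    // Dynamic model selection per request.
    model: options.complexity === 'complex' ? 'large-model' : base.model,
    // Retrieved context is injected into the instructions.
    instructions: `${base.instructions}\n\nContext: ${context}`,
  };
}

const baseSettings: CallSettings = { model: 'small-model', instructions: 'Be brief.' };
console.log(prepareCallSketch({ userId: 'u1', complexity: 'complex' }, baseSettings));
```

An async version would await the retrieval step before returning the settings.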
J) End-to-End Type Safety:
- InferAgentUIMessage<typeof agent> for UI types
- Tool parts have states: input-streaming, input-available, output-available, output-error
- Detect streaming vs complete: part.preliminary flag
- Subagent output accessible in tool part
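The four tool part states map naturally onto a discriminated union. `ToolPart` and `describePart` below are hypothetical and hand-written rather than inferred from an agent; the `input`, `output`, and `errorText` field names are assumptions of this sketch.

```typescript
// Hand-written stand-in for an inferred tool part type.
type ToolPart =
  | { state: 'input-streaming' }
  | { state: 'input-available'; input: { city: string } }
  | { state: 'output-available'; input: { city: string }; output: string; preliminary?: boolean }
  | { state: 'output-error'; errorText: string };

// Exhaustive switch: the compiler flags any state that is not handled.
function describePart(part: ToolPart): string {
  switch (part.state) {
    case 'input-streaming':
      return 'Preparing tool call...';
    case 'input-available':
      return `Calling tool for ${part.input.city}`;
    case 'output-available':
      // The preliminary flag separates streamed partial output from the final one.
      return part.preliminary ? `Partial: ${part.output}` : `Done: ${part.output}`;
    case 'output-error':
      return `Failed: ${part.errorText}`;
  }
}

console.log(
  describePart({ state: 'output-available', input: { city: 'Paris' }, output: '21 C', preliminary: true }),
); // Partial: 21 C
```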
K) When NOT to Use Agents:
- Simple, one-shot tasks
- Workflows with fixed steps
- Deterministic processes with known paths
- Tasks requiring guaranteed specific behavior
- Use core functions (generateText) instead
L) Manual Loop Control:
- Direct use of generateText with custom loop
- Complete control over messages, stopping, tools
- When stopWhen/prepareStep insufficient
- Lower-level but more flexible
11. Key Configuration Options
ToolLoopAgent Constructor:
- model - LLM to use
- instructions - System prompt
- tools - Object of available tools
- stopWhen - Stop condition(s)
- prepareStep - Callback before each step
- toolChoice - How model uses tools
- output - Structured output schema
- onStepFinish - Per-step callback
- callOptionsSchema - Runtime configuration schema
- prepareCall - Transform call options to settings
Generation Methods:
- generate(options) - Non-streaming run; returns once the loop finishes
- stream(options) - Streaming response
- Both return: text, steps array, staticToolCalls, usage
12. Integration Points
With UI (React):
- createAgentUIStreamResponse() in API route
- useChat<AgentMessageType>() hook
- Full streaming support
- Type-safe messages
With Next.js:
- App Router: server actions with agent delegation
- API routes: POST /api/chat with createAgentUIStreamResponse
- Server components for data fetching into callOptions
With External APIs:
- Tools wrap API calls
- Errors caught and handled
- Context can be fetched and injected via prepareCall