Tool Use and Agents
Sources: Huyen (AI Engineering, ch. 6 & 10), Lanham (AI Agents in Action), Arsanjani & Bustos (Agentic Architectural Patterns), 2025–2026 production patterns
Covers: tool design, function calling mechanics, parallel execution, error recovery, agent architecture spectrum, multi-agent orchestration patterns, planning strategies, failure modes.
Tool Design Principles
A tool is any capability the model can invoke: API calls, database queries, calculations, file reads, web searches. Well-designed tools are the difference between a useful agent and an unreliable one.
Design Rules
| Rule | Rationale |
|---|---|
| One tool does one thing | Compound tools obscure failure attribution; hard to retry partial steps |
| Use verb_noun naming | search_documents, get_user, send_email — unambiguous to the model |
| Type all parameters explicitly | JSON schema prevents the model guessing argument types |
| Write descriptions as instructions | "Returns the top 5 documents matching the query" — tell the model what to expect |
| Return structured data, not prose | The model processes JSON better than free-text tool results |
| Include error states in return type | {success: bool, data: ..., error: str | null} — never raise exceptions that halt the loop |
| Make tools idempotent when possible | Safe to retry on failure without side effects |
Tool Definition Pattern
Tool definition:
name: "search_documents"
description: "Search the knowledge base for documents relevant to a query.
Returns up to k document chunks with their source metadata."
parameters:
query: {type: string, description: "Search terms or natural language question"}
k: {type: integer, description: "Number of results to return (1–20)", default: 5}
filters: {type: object, description: "Optional metadata filters (e.g., {doc_type: 'policy'})"}
returns:
results: [{content: string, source: string, score: float}]
error: string | null
The description field is critical — models select tools based on descriptions, not names. Write descriptions as if explaining to a smart colleague who has never seen your codebase.
Function Calling Mechanics
Request → Execute → Continue Loop
1. Send messages + tool definitions to model
2. Model responds with one of:
a. tool_call(s): model wants to execute tools
b. text response: model has enough information to answer
3. If tool_call(s):
a. Execute each tool (see parallel execution below)
b. Append tool_result(s) to messages
c. Send messages back to model (go to step 2)
4. If text response: return to user
This loop continues until the model produces a text response or a max_steps limit is hit.
Message Thread Structure
messages = [
{role: "system", content: "You are a helpful assistant with access to tools."},
{role: "user", content: "What are our refund policies?"},
{role: "assistant", tool_calls: [{id: "call_1", name: "search_documents", input: {query: "refund policy"}}]},
{role: "tool", tool_call_id: "call_1", content: {results: [...], error: null}},
{role: "assistant", content: "Based on the documents, our refund policy is..."},
]
Always maintain the full message thread. The model uses tool results to generate the final response.
Parallel Tool Calls
When the model returns multiple tool_calls in a single response, execute them concurrently unless there is an explicit dependency between them.
When to Parallelize
| Scenario | Parallelize? |
|---|---|
| Independent lookups (get_user + get_order) | Yes |
| Sequential dependency (search → then filter results) | No |
| Same tool with different arguments | Yes |
| Tool B uses output of Tool A | No |
parallel execution pattern:
tool_calls = [call_1, call_2, call_3]
results = await Promise.all([
execute(call_1),
execute(call_2),
execute(call_3),
])
# All three finish in max(latency_1, latency_2, latency_3) instead of sum
3 independent tools at 300ms each: 900ms sequential → 300ms parallel. Always parallelize independent calls.
Tool Error Recovery
Errors are inevitable. Design the recovery strategy at the tool level, not in the LLM loop.
Recovery Decision Table
| Error Type | Return to Model | Retry | Escalate |
|---|---|---|---|
| Validation error (bad arguments) | Yes — with schema hint | After model corrects args | Never |
| Not found (empty result) | Yes — return null with context | No | If critical |
| Permission denied | Yes — explain limitation | No | To user or admin |
| Rate limit (429) | No — retry silently | Yes, with backoff | After 3 fails |
| Timeout | Yes — return partial or error | Once | After 2 fails |
| Service unavailable | No — retry silently | Yes | After 3 fails |
| Tool logic error (bug) | Yes — return error message | No | Alert on-call |
Max Steps Guard
Always set a maximum number of tool call iterations. Without it, a confused model can loop indefinitely.
max_steps = 10
steps = 0
while steps < max_steps:
response = model.generate(messages, tools=tools)
if response.type == "text":
return response.content
execute_tools(response.tool_calls)
messages.append(tool_results)
steps += 1
return fallback_response("Maximum steps reached. Could not complete the task.")
Agent Architecture Spectrum
Not every task needs a fully autonomous agent. Match architecture to task complexity.
Architecture Options
| Architecture | Control Flow | Predictability | Use When |
|---|---|---|---|
| Single LLM call | Fixed | Highest | Simple Q&A, classification |
| LLM + tools (1 loop) | Semi-structured | High | Lookup + generate |
| ReAct agent | LLM-directed | Medium | Open-ended, multi-step |
| Multi-agent | LLM-orchestrated | Lower | Complex, parallelizable |
Prefer workflows (predefined code paths) over agents for production. Workflows are auditable, debuggable, and predictable. Agents excel at tasks where the path is genuinely unknown at design time.
ReAct Loop (Reason + Act)
The fundamental single-agent pattern. The model alternates between reasoning about the current state and taking an action (tool call).
Thought: I need to find the user's order history to answer this question.
Action: get_order_history(user_id="u_123", limit=10)
Observation: [order_1, order_2, order_3]
Thought: The user's most recent order is order_1. Now I need the tracking status.
Action: get_shipment_status(order_id="order_1")
Observation: {status: "shipped", eta: "2026-03-02"}
Thought: I have all the information needed.
Response: Your most recent order ships on March 2nd.
Multi-Agent Orchestration Patterns
1. Orchestrator-Worker
Orchestrator (central planner)
├─ Worker A (retrieval specialist)
├─ Worker B (code executor)
└─ Worker C (summarizer)
Topology: Hub and spoke. Orchestrator receives the task, decomposes it, delegates to specialized workers, aggregates results.
Use when: Task has distinct phases requiring different expertise. Orchestrator enforces sequencing and handles failures.
Failure mode: Orchestrator becomes a bottleneck; single point of failure. Fix: make orchestrator stateless; retry at orchestration level.
2. Sequential Pipeline
Agent A → output → Agent B → output → Agent C → final result
Topology: Linear chain. Each agent receives the previous agent's output.
Use when: Task is a defined sequence of transformations (extract → classify → enrich → format).
Failure mode: Cascading errors. A bad output from Agent A corrupts all downstream agents. Fix: validate output schema at each stage before passing forward.
3. Fan-Out / Gather (Parallel)
Orchestrator
├─ Worker 1 (subtask 1) ─┐
├─ Worker 2 (subtask 2) ─┼─ Aggregator → final result
└─ Worker 3 (subtask 3) ─┘
Use when: Task decomposes into independent subtasks (research 5 competitors simultaneously, process 100 documents in parallel).
Failure mode: Partial failure — some workers succeed, some fail. Fix: define quorum (e.g., 3/5 required), use partial results if acceptable.
4. Generator-Critic (Reflection)
Generator agent → draft
Critic agent → critique → Generator agent → revised draft → ...
Use when: Output quality matters more than speed. Code review, document editing, plan validation.
Failure mode: Infinite refinement loop. Fix: hard limit on iterations (3–5 max); accept-or-escalate after limit.
5. Human-in-the-Loop
Agent operates autonomously
→ Reaches decision gate (irreversible action, high stakes)
→ Pauses, surfaces to human
→ Human approves/rejects/modifies
→ Agent continues
Use when: Actions are irreversible (send email, make purchase, delete records) or high-stakes (financial, legal, medical).
Implementation: Define approval gates explicitly. Log every human decision with context for audit.
Planning Strategies
How agents decompose complex tasks before acting.
| Strategy | Approach | When to Use |
|---|---|---|
| Zero-shot | Model selects tools directly based on tool descriptions | Simple, well-defined tasks |
| Chain-of-thought | Model reasons step-by-step before each tool call | Complex, multi-step tasks |
| Plan-then-execute | Generate full plan upfront, execute sequentially | Tasks with known structure |
| Adaptive planning | Revise plan based on intermediate tool results | Tasks with uncertain paths |
For high-stakes tasks, use plan-then-execute and validate the plan before execution. For exploratory tasks, use adaptive planning.
Agent Failure Modes
Common Failures and Fixes
| Failure Mode | Detection | Fix |
|---|---|---|
| Wrong tool selected | Tool results irrelevant to task | Improve tool descriptions; reduce tool count |
| Bad tool arguments | Tool returns validation error | Stricter parameter schemas; add argument examples |
| Hallucinated tool name | Tool_call references non-existent tool | Validate tool name before execution; return error to model |
| Context overflow | Generation quality drops in long sessions | Summarize conversation history at regular intervals |
| Infinite loop | Same tool called repeatedly with same args | Track call history; break if (tool, args) pair repeats |
| Unnecessary tool calls | Retrieval for questions the model already knows | Teach the model when NOT to retrieve (self-RAG prompt) |
| Cascading error | Early tool failure corrupts later steps | Validate and sanitize each tool result before appending |
Loop Detection
call_history = {}
before executing tool_call(name, args):
key = hash(name + JSON(args))
if key in call_history and call_history[key] >= 2:
abort("Loop detected: same tool call repeated 2+ times")
call_history[key] += 1
Tool Count Guidelines
| Tool Count | Model Behavior | Strategy |
|---|---|---|
| 1–10 | Reliable selection | Include all tools in every request |
| 10–30 | Occasional confusion | Group tools by task; prefilter by intent |
| 30+ | Frequent tool selection errors | Dynamic tool loading: select 5–10 tools relevant to current task |
For large tool libraries, add a tool-routing step before the main agent loop: classify the user intent, load only the relevant subset of tools.