---
name: agent-cost-calculator
description: Use when estimating or optimising LLM costs for agent operations. Framework for calculating actual monthly spend including hidden multipliers.
author: Melisia Archimedes
url: https://hivedoctrine.com
mcp: https://hive-doctrine-mcp.vercel.app/mcp
---
# Agent Cost Calculator: Estimate Your Monthly LLM Spend
## The Real Question
You've built your first agent. It works. Now comes the question that keeps you up at night: What's this actually costing per month?
Most operators wing it. They slap down a credit card, build a rough mental model of "tokens in, tokens out," and hope the bill doesn't spike. That's how you end up shocked. An agent that seemed cheap in testing becomes a $3,000/month burn rate in production because nobody accounted for retries, context window overhead, tool calling overhead, and embedding costs.
I've seen operators scale from one agent to five and watch their monthly bill triple. Not because the agents got more capable—because they didn't understand what was really driving costs. The LLM usage itself is only part of the story.
This guide gives you the framework to calculate actual costs, compare models intelligently, and spot the hidden multipliers before they bite you.
## The Formula: Breaking Down What You'll Actually Pay
Start here. This is the foundation.
Base cost per task:
Cost = (Tasks per day x Days per month x Average tokens per task) x (Price per token)
But this is misleading. It doesn't include what actually happens in production. Add these multipliers:
Real cost per task (production-adjusted):
Cost = Base cost x (1 + retry_rate) x (1 + context_overhead) x (1 + tool_call_overhead)
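Here's a minimal sketch of both formulas in Python. The multiplier defaults are the example values used in the Hidden Multipliers section below; everything here is a placeholder you'd swap for your own measurements:

```python
def base_monthly_cost(tasks_per_day: int, days_per_month: int,
                      input_tokens: int, output_tokens: int,
                      input_price_per_m: float, output_price_per_m: float) -> float:
    """Naive monthly cost: volume x per-task token cost, no production overhead."""
    per_task = (input_tokens * input_price_per_m
                + output_tokens * output_price_per_m) / 1_000_000
    return tasks_per_day * days_per_month * per_task


def production_monthly_cost(base_cost: float,
                            retry_rate: float = 0.35,
                            context_overhead: float = 0.20,
                            tool_call_overhead: float = 0.40) -> float:
    """Production-adjusted cost: base cost compounded by the three multipliers."""
    return base_cost * (1 + retry_rate) * (1 + context_overhead) * (1 + tool_call_overhead)


# Sonnet pricing ($3/$15 per 1M), 50 tasks/day, 3,000 input + 1,000 output tokens:
base = base_monthly_cost(50, 30, 3_000, 1_000, 3.00, 15.00)  # $36.00
real = production_monthly_cost(base)                          # ~$81.65
```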
Let's walk through each component:
Tasks per day: How many times does your agent run? Is it continuous (100+ per day)? Scheduled (10-20 per day)? Event-driven (variable)? Be conservative—your production volume will probably be 20-50% higher than your testing estimate.
Tokens per task: This is where operators get sloppy. Most count only the output tokens. Real-world tasks include:
- System prompt tokens (resent with every request)
- User input tokens (variable)
- Retrieved context or tool results (usually ignored; they cost real money)
- Output tokens (the obvious part)
A "simple" task might be 2,000 tokens. A task with 3-4 tool calls and retrieved context could be 8,000 tokens.
Price per token: This varies wildly. Claude Opus costs $15 per 1M input tokens and $75 per 1M output tokens. Claude Haiku costs $0.25 per 1M input and $1.25 per 1M output. GPT-4o costs $5 per 1M input and $15 per 1M output. There's no "average"—you have to choose your model.
## Model Cost Comparison: The Numbers
Here's what you're actually spending per 1M tokens:
| Model | Input (per 1M) | Output (per 1M) | Use case | Notes |
|---|---|---|---|---|
| Claude Opus | $15 | $75 | Complex reasoning, agentic loops | Most expensive, best reasoning |
| Claude Sonnet | $3 | $15 | Balanced work, most agents | Sweet spot for cost/performance |
| Claude Haiku | $0.25 | $1.25 | Routing, classification, simple tasks | Fastest, cheapest, limited context |
| GPT-4o | $5 | $15 | Complex vision, reasoning | Mid-range, less token-efficient |
| GPT-4o mini | $0.15 | $0.60 | Lightweight tasks | Cheap but lower quality reasoning |
| Llama 3.1 (via Groq) | $0.002 | $0.002 | Very simple tasks, high volume | Near-free but limited capabilities |
Real example: A customer service agent running 50 tasks/day
Assumption: 3,000 input tokens + 1,000 output tokens per task.
- Claude Opus: 50 x 30 x (3,000 x $0.000015 + 1,000 x $0.000075) = $180/month
- Claude Sonnet: 50 x 30 x (3,000 x $0.000003 + 1,000 x $0.000015) = $36/month
- Claude Haiku: 50 x 30 x (3,000 x $0.00000025 + 1,000 x $0.00000125) = $3/month
- GPT-4o: 50 x 30 x (3,000 x $0.000005 + 1,000 x $0.000015) = $45/month
- GPT-4o mini: 50 x 30 x (3,000 x $0.00000015 + 1,000 x $0.0000006) ≈ $1.58/month
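You can reproduce these numbers (and extend them to other models) with a small pricing table; the prices mirror the comparison table above:

```python
PRICING = {  # (input, output) in USD per 1M tokens
    "claude-opus":   (15.00, 75.00),
    "claude-sonnet": (3.00, 15.00),
    "claude-haiku":  (0.25, 1.25),
    "gpt-4o":        (5.00, 15.00),
    "gpt-4o-mini":   (0.15, 0.60),
}


def monthly_cost(model: str, tasks_per_day: int = 50, days: int = 30,
                 input_tokens: int = 3_000, output_tokens: int = 1_000) -> float:
    input_price, output_price = PRICING[model]
    per_task = (input_tokens * input_price
                + output_tokens * output_price) / 1_000_000
    return tasks_per_day * days * per_task


for model in PRICING:
    print(f"{model}: ${monthly_cost(model):,.2f}/month")
# claude-opus $180.00, claude-sonnet $36.00, claude-haiku $3.00,
# gpt-4o $45.00, gpt-4o-mini ~$1.58
```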
The gap between Opus and Haiku is 60x. But here's the trap: if Haiku fails 30% of tasks and you have to retry, that cheap option just became expensive and slow.
## Hidden Multipliers: What Kills Your Budget
You've calculated your base cost. Now multiply by reality:
Retry rate (1.2x to 1.5x): Production isn't perfect. Rate limits, timeouts, model hallucinations—assume 20-50% of tasks need at least one retry. If your base model is too weak, this compounds.
Context window overhead (1.1x to 1.3x): You're not sending raw input. You're sending system prompts, retrieved documents, examples, tool definitions. This context wraps every request. For a 3,000-token task, the actual call might be 4,000 tokens. That's context overhead.
Tool call tokens (1.2x to 1.8x): Every tool call uses tokens. The function schema takes tokens. The tool result comes back and burns more tokens in context. If your agent makes 3-4 tool calls per task, you're burning 20-80% more tokens than the base calculation.
Embedding costs (often forgotten): Retrieval-augmented generation (RAG) isn't free. If you're vectorizing documents or live search results, you're paying per embedded token. At $0.02 per 1M tokens (OpenAI's text-embedding-3-small; Anthropic doesn't sell embeddings directly), this looks negligible: 10,000 documents a month at roughly 1,000 tokens each is only $0.20. But it scales with corpus size and re-indexing churn, and it never shows up in a naive token count.
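A quick sketch of that embedding math; the price assumes OpenAI's text-embedding-3-small, and the average document length is an assumption:

```python
# Embedding cost: price assumes OpenAI text-embedding-3-small ($0.02 per 1M tokens);
# document length is an assumed average.
docs_per_month = 10_000
tokens_per_doc = 1_000
price_per_m_tokens = 0.02
embedding_cost = docs_per_month * tokens_per_doc * price_per_m_tokens / 1_000_000
print(f"${embedding_cost:.2f}/month")  # $0.20 -- and it scales with your corpus
```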
Real multiplier in production:
Actual monthly cost = Base cost x 1.35 (retry) x 1.20 (context) x 1.40 (tools) = 2.27x the naive calculation
Most operators underestimate their costs by 2-3x.
## Scaling Math: From 1 Agent to 20
Cost doesn't scale linearly. It's not 20x at 20 agents because:
- You can use cheaper models for simple agents (routing, classification)
- Shared infrastructure and caching reduce redundant calls
- Batch API pricing (Anthropic, OpenAI) saves 50% on traffic that can run asynchronously
Realistic scaling (production-adjusted estimates):
| Scenario | Monthly cost | Model mix |
|---|---|---|
| 1 agent (50 tasks/day) | ~$80 | Sonnet only |
| 5 agents (mixed load) | $650 | 3x Sonnet, 2x Haiku |
| 20 agents (mixed load) | $2,100 | 10x Sonnet, 5x Haiku, 5x open-source |
| 100+ agents (production) | $5,000-$8,000 | Batch API, model routing, heavy caching |
At scale, batch processing halves per-token prices; if roughly 70-80% of your traffic tolerates async batching, that's 35-40% off the total bill. You also gain the discipline to choose the right model per task instead of defaulting to the expensive one.
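As a sanity check on batch savings, here's the arithmetic under assumed values; the batchable fraction is a guess you'd replace with your own traffic profile:

```python
# Batch savings sketch: the 50% discount matches Anthropic/OpenAI batch pricing;
# the batchable fraction is an assumption about your traffic.
monthly_on_demand = 2_100.00  # e.g. the 20-agent scenario above
batchable_fraction = 0.75     # share of traffic that tolerates async batching
batch_discount = 0.50

savings = monthly_on_demand * batchable_fraction * batch_discount
print(f"Savings: ${savings:,.0f}/month")  # ~$788 -- roughly 35-40% off the bill
```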
## What's Next
You now have the framework. Here's how to use it:
1. Map your actual token usage: Log a week of production requests. Count tokens (tiktoken for GPT models; Anthropic's token-counting API for Claude). Get real numbers, not estimates.
2. Identify your cost drivers: Which agent eats the most tokens? Is it the model choice, retry rate, or context overhead? Fix the top three.
3. Model your growth: Use the formula with your actual numbers. Project 3, 6, and 12 months ahead. Where does cost become a problem?
4. Implement routing: Don't run everything on Opus. Route simple tasks (classification, extraction) to Haiku. Keep Opus for reasoning. (See the sketch after this list.)
5. Read "Cost Optimisation for Agent Operations" (honey tier) for tactical moves: caching strategies, batch processing, context compression, model fine-tuning triggers.
6. Explore "LLM Routing & Model Selection Guide" for deeper decision frameworks on when to use each model.
The difference between an operator who understands their costs and one who doesn't is usually $500-$2,000 per month in unnecessary spend. That's money that could go into product, R&D, or margin.
Know your numbers. They'll change the way you build.
From The Hive Doctrine -- hivedoctrine.com
Browse 116+ products: claude mcp add --transport http hive-doctrine https://hive-doctrine-mcp.vercel.app/mcp
The field, not the flower.