name: agent-rate-limit-strategy
description: "Control LLM spend and Apex governor exposure for high-traffic Agentforce agents via per-user token budgets and graceful fallback. NOT for API rate-limiting of REST endpoints."
category: agentforce
salesforce-version: "Spring '25+"
well-architected-pillars:
- Scalability
- Operational Excellence
triggers:
- "agent token budget blown"
- "one user drives the whole agent quota"
- "graceful fallback when agent is over limit"
- "rate-limit agentforce per user"
tags:
- agentforce
- rate-limiting
- cost
- platform-events
inputs:
- "Traffic forecast"
- "per-tenant/per-user quotas"
- "fallback UX"
outputs:
- "Rate-limit policy CMDT"
- "fallback messaging"
- "observability dashboard"
dependencies: []
version: 1.0.0
author: Pranav Nagrecha
updated: 2026-04-28
Agent Rate Limit Strategy
Agentforce exposes its internal LLM quotas only indirectly: you hit them as platform 503s with no advance warning. This skill builds a client-side budget gate in front of the agent: a per-user token ledger, CMDT-driven thresholds, and a graceful fallback when the budget is exhausted.
Recommended Workflow
- Define the budget: tokens/user/hour and tokens/tenant/day. Persist it in Agent_Rate_Limit__mdt, per persona.
- In the channel entry point (LWC wrapper or Connect API), call BudgetService.checkAndConsume(userId, estTokens) before dispatching to the agent.
- Log consumption via the Platform Event Agent_Token_Consumed__e; the subscriber rolls it into a User_Token_Ledger__c aggregate.
- When the budget is exhausted, render the fallback UX (human handoff or retry-after message) instead of calling the agent.
- Dashboard: p50/p95 consumption per persona, budget exhaustion count, tokens/turn. Page SRE when exhaustion exceeds 1%.
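
The gate described in the workflow above can be modelled in a few lines. The following is a minimal sketch, not the skill's actual implementation: it uses Python purely as a platform-neutral way to show the logic (an org implementation would live in Apex), and it assumes a hypothetical Policy record standing in for Agent_Rate_Limit__mdt, an in-memory dict standing in for User_Token_Ledger__c, and a plain list standing in for published Agent_Token_Consumed__e events. The hour_bucket string and the fallback text are also illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Stand-in for one Agent_Rate_Limit__mdt record (field name is hypothetical)."""
    tokens_per_user_hour: int


@dataclass
class BudgetService:
    policies: dict                                # persona -> Policy
    ledger: dict = field(default_factory=dict)    # (user_id, hour_bucket) -> tokens used
    events: list = field(default_factory=list)    # stand-in for Agent_Token_Consumed__e

    def check_and_consume(self, user_id, persona, est_tokens, hour_bucket):
        """Return True and record usage if the request fits the hourly budget."""
        limit = self.policies[persona].tokens_per_user_hour
        key = (user_id, hour_bucket)
        used = self.ledger.get(key, 0)
        if used + est_tokens > limit:
            return False                          # caller renders the fallback UX
        # Assumption: the gate's view is updated immediately here; in the
        # workflow above the durable ledger is updated by the event subscriber.
        self.ledger[key] = used + est_tokens
        self.events.append(
            {"user": user_id, "tokens": est_tokens, "bucket": hour_bucket}
        )
        return True


def handle_turn(service, user_id, persona, est_tokens, hour_bucket, call_agent):
    """Gate one agent turn: dispatch if within budget, else return the fallback."""
    if service.check_and_consume(user_id, persona, est_tokens, hour_bucket):
        return call_agent()
    return "FALLBACK: budget exhausted, offer human handoff or retry-after"
```

Keeping the limit in a per-persona policy lookup rather than a constant is what lets thresholds change without a code deploy. The check happens before the agent call, so an over-budget request never reaches the agent at all.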
Key Considerations
- Estimate tokens conservatively from input length (chars/4) before the call; reconcile after.
- Ledger is eventually consistent: use the Platform Event pattern, not synchronous DML.
- Fallback UX must be pre-approved; a generic 'try again later' erodes trust.
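
The estimate-then-reconcile step above might look roughly like this. It is a sketch under stated assumptions: Python is used only to show the arithmetic, the safety_factor padding is a hypothetical knob for making the chars/4 estimate conservative, and the actual token count is assumed to be available from whatever usage figure the channel can observe after the call.

```python
import math


def estimate_tokens(prompt: str, safety_factor: float = 1.5) -> int:
    """Rough pre-call estimate: chars/4, padded by a hypothetical safety factor."""
    return math.ceil(len(prompt) / 4 * safety_factor)


def reconcile(ledger: dict, key, estimated: int, actual: int) -> int:
    """Swap the pre-call estimate for the actual usage; return the correction."""
    delta = actual - estimated
    # Never let a large downward correction push the ledger below zero.
    ledger[key] = max(0, ledger.get(key, 0) + delta)
    return delta
```

For example, a 400-character prompt is estimated at 150 tokens with the default padding. If the call actually used 5000 tokens after grounding, reconcile adds the missing 4850 so the next check sees the true consumption.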
Worked Examples (see references/examples.md)
- Budget service sketch — 100 Service reps use agent summarization; one rep loops a bad input 500x.
- Graceful fallback — Budget exhausted mid-conversation.
Common Gotchas (see references/gotchas.md)
- Estimating tokens from chars misses long RAG context — Estimate says 500 tokens; reality is 5000 after grounding.
- Platform Event volume limits — High-traffic agent saturates your 24h PE quota.
- Ledger bucket skew — Timezone boundary resets at midnight UTC, users see it mid-afternoon locally.
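
One possible way to reason about the bucket-skew gotcha is to key the daily bucket on the user's local date instead of the UTC date. The sketch below is an illustration only, in Python rather than Apex, and it assumes a hypothetical fixed UTC offset per user; a real implementation would need proper timezone handling, including daylight saving time, which this deliberately ignores.

```python
from datetime import datetime, timedelta, timezone


def daily_bucket(now_utc: datetime, utc_offset_hours: float) -> str:
    """Return the user's local calendar date as the ledger bucket key.

    now_utc must be timezone-aware; utc_offset_hours is a simplified,
    hypothetical stand-in for the user's timezone (no DST handling).
    """
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.date().isoformat()
```

With this keying, a user at UTC-8 stays in the same bucket when the clock passes midnight UTC (mid-afternoon locally), and the reset lands at their own midnight instead.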
Top LLM Anti-Patterns (full list in references/llm-anti-patterns.md)
- Trusting Agentforce to rate-limit for you — it only fails loudly on hard limits.
- Hard-coded tokens/minute without per-persona CMDT — cannot respond to traffic shifts.
- Synchronous ledger DML per turn — blows DML limits.
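
To illustrate the alternative to per-turn ledger writes, here is a minimal sketch of a subscriber that collapses a batch of consumption events into one update per ledger row. It is written in Python only to show the aggregation logic; the event dictionary keys (user, bucket, tokens) and the dict-based ledger are hypothetical stand-ins for Agent_Token_Consumed__e payloads and User_Token_Ledger__c rows.

```python
from collections import defaultdict


def aggregate_events(events):
    """Collapse a batch of consumption events into one total per (user, bucket)."""
    totals = defaultdict(int)
    for ev in events:
        totals[(ev["user"], ev["bucket"])] += ev["tokens"]
    return dict(totals)


def apply_batch(ledger, events):
    """Apply one write per ledger row instead of one per turn; return write count."""
    totals = aggregate_events(events)
    for key, tokens in totals.items():
        ledger[key] = ledger.get(key, 0) + tokens
    return len(totals)
```

If one rep produces 500 events in a batch and another produces one, the batch results in two ledger writes rather than 501, which is the point of moving the rollup behind the event subscriber.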
Official Sources Used
- Agentforce Developer Guide — https://developer.salesforce.com/docs/einstein/genai/guide/agentforce.html
- Einstein Trust Layer — https://help.salesforce.com/s/articleView?id=sf.generative_ai_trust_layer.htm
- Invocable Actions (Apex) — https://developer.salesforce.com/docs/atlas.en-us.apexref.meta/apexref/apex_classes_invocable_action.htm
- Agentforce Testing Center — https://help.salesforce.com/s/articleView?id=sf.agentforce_testing_center.htm