name: agent-rate-limit-strategy description: "Control LLM spend and Apex governor exposure for high-traffic Agentforce agents via per-user token budgets and graceful fallback. NOT for API rate-limiting of REST endpoints." category: agentforce salesforce-version: "Spring '25+" well-architected-pillars:

Scalability
Operational Excellence triggers:
"agent token budget blown"
"one user drives the whole agent quota"
"graceful fallback when agent is over limit"
"rate-limit agentforce per user" tags:
agentforce
rate-limiting
cost
platform-events inputs:
"Traffic forecast"
"per-tenant/per-user quotas"
"fallback UX" outputs:
"Rate-limit policy CMDT"
"fallback messaging"
"observability dashboard" dependencies: [] version: 1.0.0 author: Pranav Nagrecha updated: 2026-04-28

Agent Rate Limit Strategy

Agentforce exposes internal LLM quotas indirectly — you hit them as platform 503s with no forward signal. This skill builds a client-side budget gate in front of the agent: per-user token ledger, CMDT-driven thresholds, and a graceful fallback when the budget is exhausted.

Recommended Workflow

Define budget: tokens/user/hour, tokens/tenant/day. Persist in Agent_Rate_Limit__mdt per-persona.
In the channel entry point (LWC wrapper or Connect API), call a BudgetService.checkAndConsume(userId, estTokens) before dispatching to the agent.
Log consumption via Platform Event Agent_Token_Consumed__e — the subscriber rolls into a User_Token_Ledger__c aggregate.
When the budget is exhausted, render the fallback UX (human handoff / retry-after message) instead of calling the agent.
Dashboard: p50/p95 consumption per persona, budget exhaustion count, tokens/turn — page SRE when exhaustion >1%.

Key Considerations

Estimate tokens conservatively from input length (chars/4) before the call; reconcile after.
Ledger is eventually consistent — use the Platform Event pattern, not synchronous DML.
Fallback UX must be pre-approved; a generic 'try again later' erodes trust.

Worked Examples (see `references/examples.md`)

Budget service sketch — 100 Service reps use agent summarization; one rep loops a bad input 500x.
Graceful fallback — Budget exhausted mid-conversation.

Common Gotchas (see `references/gotchas.md`)

Estimating tokens from chars misses long RAG context — Estimate says 500 tokens; reality is 5000 after grounding.
Platform Event volume limits — High-traffic agent saturates your 24h PE quota.
Ledger bucket skew — Timezone boundary resets at midnight UTC, users see it mid-afternoon locally.

Top LLM Anti-Patterns (full list in `references/llm-anti-patterns.md`)

Trusting Agentforce to rate-limit for you — it only fails loudly on hard limits.
Hard-coded tokens/minute without per-persona CMDT — cannot respond to traffic shifts.
Synchronous ledger DML per turn — blows DML limits.

Official Sources Used

Agentforce Developer Guide — https://developer.salesforce.com/docs/einstein/genai/guide/agentforce.html
Einstein Trust Layer — https://help.salesforce.com/s/articleView?id=sf.generative_ai_trust_layer.htm
Invocable Actions (Apex) — https://developer.salesforce.com/docs/atlas.en-us.apexref.meta/apexref/apex_classes_invocable_action.htm
Agentforce Testing Center — https://help.salesforce.com/s/articleView?id=sf.agentforce_testing_center.htm

ナビゲーション

Skillsとは？

リンク

agent-rate-limit-strategy

Agent Rate Limit Strategy

Recommended Workflow

Key Considerations

Worked Examples (see `references/examples.md`)

Common Gotchas (see `references/gotchas.md`)

Top LLM Anti-Patterns (full list in `references/llm-anti-patterns.md`)

Official Sources Used

関連スキル(🌐 Web開発)

ナビゲーション

Skillsとは？

リンク

agent-rate-limit-strategy

Agent Rate Limit Strategy

Recommended Workflow

Key Considerations

Worked Examples (see references/examples.md)

Common Gotchas (see references/gotchas.md)

Top LLM Anti-Patterns (full list in references/llm-anti-patterns.md)

Official Sources Used

関連スキル(🌐 Web開発)

Worked Examples (see `references/examples.md`)

Common Gotchas (see `references/gotchas.md`)

Top LLM Anti-Patterns (full list in `references/llm-anti-patterns.md`)