---
name: agent-metric-dashboards
description: "Observability for Agentforce: adoption, deflection, latency, cost, quality. NOT for agent evaluation/testing (see agentforce-eval-harness) or raw platform-event monitoring."
category: agentforce
salesforce-version: "Spring '25+"
well-architected-pillars:
  - Operational Excellence
  - Performance
triggers:
  - "what is my agent deflection rate"
  - "how much does each agent conversation cost"
  - "agent latency p95"
  - "agentforce roi dashboard"
tags:
  - agentforce
  - observability
  - dashboards
  - metrics
inputs:
  - "Conversation log access"
  - "CSAT or quality signal"
outputs:
  - "Einstein Analytics / CRM Analytics dashboard"
  - "weekly rollup email"
dependencies: []
version: 1.0.0
author: Pranav Nagrecha
updated: 2026-04-28
---
# Agent Metric Dashboards
The five agent KPIs: turns per conversation, deflection rate, mean latency, tokens per conversation (a cost proxy), and quality score. This skill wires each KPI to a data source and lays out the single-pane dashboard an executive reviewer needs.
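As a sketch of how these five KPIs reduce to arithmetic over conversation records (the flat dict schema here is illustrative only, not the actual `Conversation__c` shape):

```python
from statistics import mean

# Hypothetical flattened conversation records; field names are
# assumptions for illustration, not the real org schema.
conversations = [
    {"turns": 4, "escalated": False, "latency_ms": [800, 1200, 950, 1100],
     "tokens": 2100, "quality": 4.0},
    {"turns": 2, "escalated": True, "latency_ms": [700, 2400],
     "tokens": 900, "quality": 2.5},
    {"turns": 3, "escalated": False, "latency_ms": [650, 900, 1050],
     "tokens": 1500, "quality": 4.5},
]

# KPI 1: average turns per conversation
turns_per_conv = mean(c["turns"] for c in conversations)
# KPI 2: deflection = share of conversations with no Case escalation
deflection = sum(not c["escalated"] for c in conversations) / len(conversations)
# KPI 3: mean turn latency across all turns
mean_latency = mean(ms for c in conversations for ms in c["latency_ms"])
# KPI 4: tokens per conversation (cost proxy)
tokens_per_conv = mean(c["tokens"] for c in conversations)
# KPI 5: average quality score (CSAT or judge score, 1-5)
quality = mean(c["quality"] for c in conversations)
```

In production these aggregates would come from CRM Analytics queries rather than in-memory lists; the point is that each KPI is a single aggregate over one source field.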
## Adoption Signals
Applies to every production agent after its first week; feeds the monthly executive review.
## Recommended Workflow
- Source adoption and turns from `Conversation__c` (or equivalent). Deflection = conversations ending without a `Case` escalation, divided by total conversations.
- Source latency from `Conversation_Turn__c.duration_ms__c`. Source tokens from the platform-event (PE) ledger.
- Source quality from a post-conversation survey (CSAT) or an LLM-as-judge score over a sampled cohort.
- Build a CRM Analytics lens per KPI covering the prior eight weeks; assemble the lenses into a single dashboard.
- Weekly email digest: current vs. prior week for each KPI; page on >10% deflection drop or >20% latency spike.
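The paging rule in the last step can be sketched as a plain threshold check over the weekly rollup (field names are assumptions; wire this to whatever the rollup job actually emits):

```python
def should_page(prior: dict, current: dict) -> list:
    """Return alert reasons per the thresholds above: page on a >10%
    relative deflection drop or a >20% relative latency spike."""
    alerts = []
    if current["deflection"] < prior["deflection"] * 0.90:
        alerts.append("deflection dropped >10% week-over-week")
    if current["latency_p95_ms"] > prior["latency_p95_ms"] * 1.20:
        alerts.append("p95 latency spiked >20% week-over-week")
    return alerts
```

Relative (not absolute) thresholds keep the rule stable across agents with very different baselines; an empty return list means the digest goes out without a page.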
## Key Considerations
- Deflection is only meaningful vs. a baseline from before agent deployment.
- LLM-as-judge must be calibrated against human labels quarterly.
- Cost proxy (tokens) drifts when the model changes; track separately from raw latency.
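One way to run the quarterly LLM-as-judge calibration is a simple agreement rate between human labels and judge scores on the sampled cohort (a minimal sketch; the 1-5 scale, sample size, and tolerance are assumptions):

```python
def judge_agreement(human: list, judge: list, tolerance: int = 0) -> float:
    """Fraction of sampled conversations where the LLM judge's score
    matches the human label within `tolerance` points (1-5 scale)."""
    assert len(human) == len(judge)
    hits = sum(abs(h - j) <= tolerance for h, j in zip(human, judge))
    return hits / len(human)

# Quarterly calibration: re-label a sample by hand and compare.
human = [5, 4, 2, 3, 5, 1, 4, 3]
judge = [5, 4, 3, 3, 4, 1, 4, 2]
agreement = judge_agreement(human, judge, tolerance=1)
```

If agreement falls below an agreed floor, recalibrate the judge prompt before trusting the quality KPI for another quarter.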
## Worked Examples (see references/examples.md)
- Deflection with baseline — Service org with 40% pre-agent escalation rate.
- Tokens/conversation trend — Costs spike after a topic-instruction rewrite.
## Common Gotchas (see references/gotchas.md)
- CSAT response bias — only frustrated users answer the survey, so CSAT looks worse than it is.
- Deflection = 'user gave up' — no escalation because the user closed the browser in frustration.
- Cost metric without model version — Cost/conversation changes overnight due to model upgrade.
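The model-version gotcha has a straightforward fix: tag every token ledger row with the serving model and aggregate per version, so an upgrade shows up as a new series rather than a phantom cost regression (a sketch over hypothetical rows and model names):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cost ledger rows; the "model" tag is the fix for the
# gotcha above. Model names are invented for illustration.
rows = [
    {"model": "m-2025-01", "tokens": 1800},
    {"model": "m-2025-01", "tokens": 2200},
    {"model": "m-2025-04", "tokens": 3100},
]

by_model = defaultdict(list)
for r in rows:
    by_model[r["model"]].append(r["tokens"])

# Average tokens/conversation, broken out by model version.
avg_tokens = {m: mean(t) for m, t in by_model.items()}
```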
## Top LLM Anti-Patterns (full list in references/llm-anti-patterns.md)
- Single-number CSAT with no context.
- Deflection without a baseline — reports vanity metrics.
- LLM-as-judge never calibrated — grades itself.
## Official Sources Used
- Agentforce Developer Guide — https://developer.salesforce.com/docs/einstein/genai/guide/agentforce.html
- Einstein Trust Layer — https://help.salesforce.com/s/articleView?id=sf.generative_ai_trust_layer.htm
- Invocable Actions (Apex) — https://developer.salesforce.com/docs/atlas.en-us.apexref.meta/apexref/apex_classes_invocable_action.htm
- Agentforce Testing Center — https://help.salesforce.com/s/articleView?id=sf.agentforce_testing_center.htm