name: ops-detection-incident-routing description: Detect agent runtime anomalies and route incidents through approval-safe guardrails. Use when you need deterministic checks for cron failures, context pressure, dangling sessions, token spikes, and a controlled incident workflow (detect -> route -> investigate -> remediate). homepage: https://github.com/your-org/openclaw-public-skills metadata: {"clawdbot":{"emoji":"🛟","requires":{"bins":["bash","jq","python3"]}}}
ops-detection-incident-routing
Run deterministic operations checks and route incidents with guardrails.
This skill ships a small toolkit for:
- detecting runtime anomalies from local state/log files
- applying in-flight + cooldown guards
- emitting structured incident actions for investigator/remediator flows
Use This Skill
Use this skill when you need a production-safe ops loop for agent systems and do not want ad-hoc prompt-only monitoring.
Files
scripts/ops-threshold-detector.shreads session/cron/snapshot state and appends detector JSONL eventsscripts/incident-guard-check.shchecks in-flight/cooldown guard status for a check idscripts/incident-state-update.shupdates guard state for start/complete/fail transitionsscripts/ops-incident-router.shconverts detector alerts into structured actionsscripts/ops-detector-cycle.shdetector + router cycle runnerscripts/setup.shdependency checks + local example scaffoldscripts/clean-generated.shremoves generated.jsonland lock artifacts before republishing from a used folder
Setup
bash scripts/setup.sh
Quick Start
Run one full dry-run cycle:
bash scripts/ops-detector-cycle.sh \
--workspace "$(pwd)/examples/workspace" \
--state-file "$(pwd)/examples/incident-state.json" \
--detector-out "$(pwd)/examples/ops-detector.jsonl" \
--router-out "$(pwd)/examples/router-actions.jsonl"
Run live mode (router also acquires in-flight locks):
bash scripts/ops-detector-cycle.sh \
--workspace "$(pwd)/examples/workspace" \
--state-file "$(pwd)/examples/incident-state.json" \
--detector-out "$(pwd)/examples/ops-detector.jsonl" \
--router-out "$(pwd)/examples/router-actions.jsonl" \
--live
Output Contract
Detector writes one JSON line per run:
{
"ts": "2026-02-24T02:30:00Z",
"status": "ALERT",
"checks": 5,
"alerts": [{"sev":"Sev-2","trigger":"cron_failure","value":2,"threshold":0}],
"gaps": []
}
Router emits one JSON action per alert decision:
{"action":"spawn","check_id":"cron_failure","severity":"Sev-2","mode":"dry-run","task":"Investigate incident: cron_failure"}
Operational Pattern
- schedule
ops-threshold-detector.sh(every 5-15 min) - feed the latest detector line to
ops-incident-router.sh - spawn investigator/remediator only from router output
- keep remediation behind explicit owner approval
For details, read references/architecture.md.