Agent Fleet Status

Check the health of multiple machines via SSH. Get a unified dashboard for distributed systems.

Description

Agent Fleet Status monitors the health of a fleet of machines — servers, Mac Minis, cloud instances, Raspberry Pis, or any SSH-accessible host. It checks uptime, disk, memory, CPU, running processes, cron jobs, and service health, then outputs a unified dashboard. Built for operators running distributed agent swarms, home labs, or multi-server deployments.

Activation

This skill activates when:

The user asks about the status of their servers or machines
The user wants to check if a service is running on a remote host
The user mentions fleet health, server status, or system monitoring
The user asks "are my agents running?" or "what's the status of my machines?"

Trigger phrases: "fleet status", "check my servers", "machine health", "are my agents running", "server dashboard", "system status", "check uptime", "what's running on"

Health Check Dimensions

1. Connectivity

Can we reach the host via SSH?
Latency (ms)
Last successful connection

2. System Resources

Check	Warning	Critical
Disk usage	>80%	>95%
Memory usage	>85%	>95%
CPU load (5-min avg, as % of core count)	>70%	>90%
Uptime	<1 day (recent reboot)	N/A
Swap usage	>50%	>80%
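
The thresholds in this table can be applied mechanically. A minimal sketch, assuming each metric has already been reduced to an integer percentage; the helper name `classify_pct` is hypothetical and not part of the skill itself:

```shell
#!/usr/bin/env bash
# classify_pct VALUE WARN CRIT
# Prints OK, WARN, or CRIT for an integer percentage, using strict
# "greater than" comparisons as in the table (e.g. disk >80% is WARN).
# classify_pct is a hypothetical helper name used for illustration.
classify_pct() {
  local value=$1 warn=$2 crit=$3
  if [ "$value" -gt "$crit" ]; then
    echo CRIT
  elif [ "$value" -gt "$warn" ]; then
    echo WARN
  else
    echo OK
  fi
}

classify_pct 45 80 95   # disk at 45%   -> OK
classify_pct 82 80 95   # disk at 82%   -> WARN
classify_pct 96 85 95   # memory at 96% -> CRIT
```

Because the comparisons are strict, a value exactly on a threshold (disk at 80%) still reports OK.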

3. Process Health

Are expected processes running? (by name or PID file)
Zombie processes count
Top 5 processes by CPU/memory

4. Service Health

HTTP endpoints responding? (status code + latency)
Port checks (is the port open and accepting connections?)
Cron job status (last run time, exit codes from logs)

5. Agent-Specific

OpenClaw gateway status
Ollama model loaded
Custom agent processes
Log file freshness (is the agent producing output?)
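
One way to test log freshness for a single file is to ask `find` whether the file was modified within a time window. The sketch below assumes a 60-minute window and a hypothetical helper name `log_is_fresh`; the real window and log paths depend on the agent.

```shell
#!/usr/bin/env bash
# log_is_fresh FILE MINUTES
# Succeeds (exit 0) if FILE exists and was modified less than MINUTES
# minutes ago. A missing or unreadable file counts as stale.
# log_is_fresh is a hypothetical helper name used for illustration.
log_is_fresh() {
  [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}

# Demonstration with a temporary file standing in for an agent log.
tmp=$(mktemp)
log_is_fresh "$tmp" 60 && echo "fresh"
touch -t 200001010000 "$tmp"          # backdate the file to the year 2000
log_is_fresh "$tmp" 60 || echo "stale"
rm -f "$tmp"
```

A stale log on a host whose agent process is still running is worth flagging separately, since the process may be hung rather than stopped.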

Instructions

When asked to check fleet status, generate commands and/or output in this format:

# Fleet Status Dashboard
## Checked at: [timestamp]

---

### [HOSTNAME] — [IP] — [STATUS: OK / WARN / CRITICAL / OFFLINE]

| Check | Value | Status |
|-------|-------|--------|
| SSH | Connected (XXms) | OK |
| Uptime | XX days | OK |
| Disk | XX% of XXG used | [OK/WARN/CRIT] |
| Memory | XX% of XXG used | [OK/WARN/CRIT] |
| CPU Load | X.XX (5m avg) | [OK/WARN/CRIT] |
| Swap | XX% used | [OK/WARN/CRIT] |

**Running Services**:
- [service]: PID XXXX, up Xh, CPU X%, MEM X%
- [service]: PID XXXX, up Xh, CPU X%, MEM X%

**Cron Jobs**: XX active, last run [time], [X failures in 24h]

**Alerts**:
- [Any warnings or critical issues]

---

### Fleet Summary

| Host | Status | Disk | Mem | CPU | Services | Alerts |
|------|--------|------|-----|-----|----------|--------|
| [host1] | OK | 45% | 62% | 0.8 | 5/5 | 0 |
| [host2] | WARN | 82% | 71% | 1.2 | 4/5 | 1 |
| [host3] | OFFLINE | — | — | — | — | SSH FAIL |

SSH Commands Used

The skill generates and executes these commands per host:

# Connectivity
ssh -o ConnectTimeout=5 -o BatchMode=yes user@host "echo ok"

# System resources
ssh user@host "
  uptime;
  df -h / | tail -1;
  free -m | grep Mem;
  cat /proc/loadavg;
  swapon --show --bytes 2>/dev/null
"

# Process health
ssh user@host "
  ps aux --sort=-%mem | head -6;
  ps -eo stat= | grep -c '^Z'  # zombie count: only processes whose state starts with Z
"

# Service checks
ssh user@host "
  pgrep -la ollama;
  pgrep -la openclaw;
  pgrep -la node;
  systemctl is-active [service] 2>/dev/null
"

# Cron status
ssh user@host "crontab -l 2>/dev/null | grep -Ev '^[[:space:]]*(#|\$)' | wc -l"

# Log freshness
ssh user@host "find /var/log -name '*.log' -mmin -60 | head -5"
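
The raw output of `cat /proc/loadavg` is a load average, not a percentage, so it has to be divided by the core count before it can be compared against percentage thresholds. A minimal sketch, assuming the core count is collected separately (for example with `nproc` on the remote host); the helper name `load_pct` and the sample line are illustrative only:

```shell
#!/usr/bin/env bash
# load_pct LOADAVG_LINE CORES
# Takes one line in /proc/loadavg format (1-min, 5-min, 15-min, ...)
# and prints the 5-minute value as a rounded integer percentage of CORES.
# load_pct is a hypothetical helper name used for illustration.
load_pct() {
  echo "$1" | awk -v cores="$2" '{ printf "%d\n", ($2 / cores) * 100 + 0.5 }'
}

# Made-up sample line: a 5-minute load of 1.40 on a 2-core host.
load_pct "0.52 1.40 1.10 2/345 12345" 2   # prints 70
```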

For macOS hosts, adjust commands:

# macOS disk
ssh user@host "df -h / | tail -1"

# macOS memory (no free command)
ssh user@host "vm_stat | head -5; sysctl hw.memsize"

# macOS processes
ssh user@host "ps aux -r | head -6"
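
The `vm_stat` output is in pages, so it needs converting before it can be compared with a memory threshold. The sketch below is one rough approximation, not the skill's own method: it assumes free, inactive, and speculative pages all count as available memory, and that the input is the combined output of `vm_stat | head -5; sysctl hw.memsize`. The helper name `mac_mem_used_pct` and the sample figures are made up for illustration.

```shell
#!/usr/bin/env bash
# mac_mem_used_pct
# Reads `vm_stat | head -5; sysctl hw.memsize` output on stdin and prints
# an approximate "memory used" percentage (truncated to an integer).
# Assumption: free + inactive + speculative pages are treated as available.
mac_mem_used_pct() {
  awk '
    /page size of/       { for (i = 1; i < NF; i++) if ($i == "of") page = $(i + 1) }
    /^Pages free/        { free = $NF }       # values end in "."; awk reads the number
    /^Pages inactive/    { inactive = $NF }
    /^Pages speculative/ { spec = $NF }
    /^hw\.memsize/       { total = $NF }
    END {
      if (total == 0 || page == 0) exit 1     # input was incomplete
      avail = (free + inactive + spec) * page
      printf "%d\n", 100 - (avail * 100 / total)
    }
  '
}

# Made-up sample for a 16 GiB machine with 16 KiB pages.
mac_mem_used_pct <<'EOF'
Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free:                              100000.
Pages active:                            500000.
Pages inactive:                          150000.
Pages speculative:                        12144.
hw.memsize: 17179869184
EOF
```

With these sample figures, a quarter of the pages count as available, so the function prints 75. Tools such as Activity Monitor define "used" differently, so treat the result as a trend indicator rather than an exact figure.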

Fleet Configuration

Define your fleet as a simple list:

fleet:
  - name: Omni
    host: localhost
    type: linux
    expected_services: [ollama, node]

  - name: BMO
    host: 192.168.1.98
    user: operator
    type: macos
    expected_services: [openclaw, ollama, node, n8n]

  - name: OCI
    host: 192.168.1.92
    user: macmini
    type: macos
    expected_services: [openclaw]

  - name: SailorsBot1
    host: 192.168.1.99
    user: operator
    type: macos
    expected_services: [repflow]

Alerting Logic

Severity escalation:

INFO: All systems green. No action needed.
WARN: One or more hosts have warnings (disk >80%, high memory, service restart detected). Review within 24 hours.
CRITICAL: A host has critical resource usage or a key service is down. Act within 1 hour.
OFFLINE: A host is unreachable via SSH. Investigate immediately — could be network, crash, or power.
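
The escalation above amounts to reporting the worst status found across all hosts. A minimal sketch, assuming per-host statuses of OK, WARN, CRITICAL, or OFFLINE and treating OFFLINE as the most severe because it calls for immediate investigation; the helper name `fleet_severity` is hypothetical:

```shell
#!/usr/bin/env bash
# fleet_severity STATUS...
# Prints the fleet-wide severity: INFO when every host is OK, otherwise
# the worst per-host status. Ordering assumed: OK < WARN < CRITICAL < OFFLINE.
# fleet_severity is a hypothetical helper name used for illustration.
fleet_severity() {
  local worst=0 rank status
  for status in "$@"; do
    case $status in
      OK)       rank=0 ;;
      WARN)     rank=1 ;;
      CRITICAL) rank=2 ;;
      OFFLINE)  rank=3 ;;
      *)        rank=1 ;;   # assumption: surface unknown statuses as WARN
    esac
    if [ "$rank" -gt "$worst" ]; then
      worst=$rank
    fi
  done
  case $worst in
    0) echo INFO ;;
    1) echo WARN ;;
    2) echo CRITICAL ;;
    3) echo OFFLINE ;;
  esac
}

fleet_severity OK OK OK          # prints INFO
fleet_severity OK WARN OFFLINE   # prints OFFLINE
```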

Example

Input

Check status of BMO (192.168.1.98, operator, macOS) and OCI (192.168.1.92, macmini, macOS).

Output

(Generates SSH commands, executes them, and produces the dashboard table showing both machines' disk, memory, CPU, running services, cron job count, and any alerts.)

Built by KOINO Capital.
