Gateway Watchdog
v1.4.0
Monitor OpenClaw Gateway health by detecting abnormal error rates in logs. Use when: (1) setting up Gateway error monitoring, (2) diagnosing repeated API fai...
MIT-0
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name/description match the behavior. The script reads OpenClaw logs (systemd journal or ~/.openclaw/logs) and computes error rates/spikes. Required binaries (systemctl, journalctl, grep, find, awk, etc.) are appropriate for log analysis. No unrelated credentials, network access, or surprising binaries are requested.
Instruction Scope
SKILL.md instructs running the included shell script and integrating it into heartbeats/cron; the runtime instructions stay within monitoring scope. One minor mismatch: SKILL.md emphasizes "read-only" but the script writes a state.json and history.log under ~/.local/state/gateway-watchdog (local state/history). The script also relies on optional environment variables (WATCHDOG_*) that are documented in SKILL.md but not listed as required in the registry metadata.
Install Mechanism
No install spec or remote downloads — it's instruction-only with an included script. Nothing is pulled from external URLs or written to system-wide locations beyond the user's XDG state directory.
Credentials
The skill declares no required credentials (and none are needed). It does read several optional environment variables (WATCHDOG_THRESHOLD, WATCHDOG_WINDOW, WATCHDOG_SPIKE_RATIO, WATCHDOG_EXTRA_PATTERNS, WATCHDOG_DIR/STATE/LOG) which are reasonable for configurability but are not listed as required in the registry metadata — this is a documentation/metadata gap rather than a direct risk. WATCHDOG_EXTRA_PATTERNS is sanitized against some shell metacharacters, but user-supplied regexes could still be complex or expensive to evaluate.
Persistence & Privilege
The skill does not request permanent "always" inclusion and does not modify other skills or system config. It writes its own state/history under the user's XDG state directory (no elevated privileges). Autonomous invocation is allowed by default (normal for skills) but nothing here amplifies that privilege.
Assessment
This skill appears to do what it says: local, read-oriented log analysis with local state/history. Before installing, consider: (1) review and approve the included script if you don't trust the source; (2) the script writes state/history to ~/.local/state/gateway-watchdog, so if that data is sensitive, pick a different path or rotate access; (3) it reads OpenClaw logs (journalctl or ~/.openclaw/logs) which may contain sensitive information, so ensure the environment running it is trusted; (4) WATCHDOG_EXTRA_PATTERNS accepts user regexes (the script filters some unsafe shell chars but not all regex constructs), so avoid supplying untrusted patterns; (5) integrate notifications (cron/heartbeat) carefully so reports are only sent to intended recipients. Overall the package is internally consistent and proportionate to its stated purpose.
Like a lobster shell, security has layers: review code before you run it.
latest
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
SKILL.md
Gateway Watchdog
Detect abnormal error patterns in the OpenClaw Gateway before they cause damage. Works with all channels: Telegram, WhatsApp, Discord, Slack, Signal, iMessage, Feishu, and more.
Born from a real incident: a silent try-catch caused 76,744 failed retries in 8 hours — undetected until the API quota was exhausted.
What It Detects
| Category | Patterns |
|---|---|
| Rate limiting | HTTP 429, rate.limit, too many requests |
| Server errors | HTTP 5xx status codes |
| Auth/permission | HTTP 401/403, unauthorized, forbidden, token expired |
| Network errors | ETIMEDOUT, ECONNREFUSED, ECONNRESET, ENOTFOUND, socket hang up |
| Delivery failures | sendMessage failed, deliver failed, fetch failed |
| Custom | User-defined via WATCHDOG_EXTRA_PATTERNS env var |
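The categories above can be approximated with extended regular expressions. The following is a minimal sketch, assuming one log entry per line; the `count_category` helper, the sample log lines, and the exact regexes are illustrative assumptions modeled on the table, not the patterns the script itself uses.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: count log lines per error category with grep -E.
# The regexes are assumptions modeled on the table above, not the
# patterns used by gateway-watchdog.sh itself.

count_category() {
  # $1 = extended regex; log text is read from stdin.
  # grep -c prints the number of matching lines; "|| true" keeps a
  # zero count (grep exit status 1) from aborting under "set -e".
  grep -ciE -- "$1" || true
}

# Made-up sample lines for illustration only.
sample_log='POST /send failed: HTTP 429 too many requests
upstream error: connect ETIMEDOUT 10.0.0.5:443
sendMessage failed: socket hang up
request ok in 120ms'

rate_limit=$(printf '%s\n' "$sample_log" | count_category '429|rate.limit|too many requests')
network=$(printf '%s\n' "$sample_log" | count_category 'ETIMEDOUT|ECONNREFUSED|ECONNRESET|ENOTFOUND|socket hang up')
delivery=$(printf '%s\n' "$sample_log" | count_category 'sendMessage failed|deliver failed|fetch failed')

echo "rate_limit=$rate_limit network=$network delivery=$delivery"
```

In this sketch a single line can fall into more than one category (the third sample line counts as both a network error and a delivery failure), so per-category counts may add up to more than the number of error lines.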
Smart Analysis
- Error rate (errors/min): more meaningful than a raw count
- Spike detection: alerts when errors jump 3x or more compared with the last check
- Error concentration: flags when 80% or more of the errors are one type (likely a single fault source)
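The three checks reduce to simple integer arithmetic. Below is a minimal sketch; the function names are hypothetical, and the handling of a previous count of zero is an assumption, since the source does not say how the script treats that case.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the three checks using integer arithmetic.
# Function names are illustrative; they are not taken from the script.

# Errors per minute, rounded down: error_rate <count> <window_minutes>
error_rate() { echo $(( $1 / $2 )); }

# Succeeds when current >= ratio * previous. A previous count of 0 is
# treated as "no spike" here, which is an assumption.
# is_spike <previous> <current> <ratio>
is_spike() { [ "$1" -gt 0 ] && [ "$2" -ge $(( $1 * $3 )) ]; }

# Succeeds when one category holds at least 80% of all errors:
# is_concentrated <category_count> <total>
is_concentrated() { [ "$2" -gt 0 ] && [ $(( $1 * 100 )) -ge $(( $2 * 80 )) ]; }

echo "rate: $(error_rate 150 30)/min"
is_spike 12 150 3 && echo "spike: 12 -> 150"
is_concentrated 120 150 && echo "concentrated: 120 of 150"
```

With 150 errors in a 30-minute window, a previous count of 12, and 120 errors in one category, all three lines print: the rate is 5/min, 150 exceeds 3 x 12 = 36, and 120 of 150 is exactly 80%.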
Quick Start
bash scripts/gateway-watchdog.sh check # silent unless errors exceed threshold
bash scripts/gateway-watchdog.sh verbose # always outputs full report
bash scripts/gateway-watchdog.sh history # show monitoring history
bash scripts/gateway-watchdog.sh trend # last 24h error trend
Heartbeat integration
Add to HEARTBEAT.md:
## Gateway Error Monitoring (every heartbeat)
- Run `~/.openclaw/workspace/skills/gateway-watchdog/scripts/gateway-watchdog.sh check`
- If output is non-empty, report to user immediately
- No output = healthy, skip reporting
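The heartbeat rule (empty output means healthy, anything else gets reported) can be sketched as a small wrapper. This is an illustration only: `heartbeat_check`, the `watchdog_script` variable, and the `ALERT` prefix are hypothetical names, and the stand-in commands exist so the sketch runs even where the skill is not installed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the heartbeat rule: empty output = healthy.
# heartbeat_check, watchdog_script and the "ALERT" prefix are
# assumptions made for illustration.

watchdog_script="$HOME/.openclaw/workspace/skills/gateway-watchdog/scripts/gateway-watchdog.sh"

heartbeat_check() {
  local output
  # Capture stdout and stderr so a crash in the command is reported too.
  output=$("$@" 2>&1) || true
  if [ -n "$output" ]; then
    printf 'ALERT\n%s\n' "$output"   # non-empty: pass the report on
    return 1
  fi
  return 0                            # empty: healthy, stay silent
}

if [ -f "$watchdog_script" ]; then
  heartbeat_check bash "$watchdog_script" check || echo "reported to user"
else
  # Stand-in commands so the sketch runs without the skill installed.
  heartbeat_check true && echo "healthy: nothing to report"
  heartbeat_check echo "150 errors in the last 30 min" || echo "reported to user"
fi
```

Capturing stderr as well as stdout is a design choice in this sketch: a missing or failing script then surfaces as an alert instead of being mistaken for a healthy run.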
Cron (optional)
openclaw cron add \
  --name "gateway-watchdog" \
  --schedule "*/30 * * * *" \
  --task "Run gateway-watchdog.sh verbose. If errors detected, notify user with the report." \
  --channel last
Configuration
All via environment variables:
| Variable | Default | Description |
|---|---|---|
| WATCHDOG_THRESHOLD | 30 | Error count that triggers alert |
| WATCHDOG_WINDOW | 30 | Monitoring window in minutes |
| WATCHDOG_SPIKE_RATIO | 3 | Alert when errors jump Nx vs last check |
| WATCHDOG_EXTRA_PATTERNS | (empty) | Custom regex patterns (e.g., 99991400\|99991403) |
| WATCHDOG_STATE | ~/.local/state/gateway-watchdog/state.json | State file |
| WATCHDOG_LOG | ~/.local/state/gateway-watchdog/history.log | History log |
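A common way to implement such defaults in shell is `${VAR:-default}` expansion. The sketch below uses the variable names and defaults from the table; `show_config` is a hypothetical helper, and whether the script resolves its settings in exactly this way is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of default resolution with ${VAR:-default}.
# Variable names and defaults come from the table; show_config and
# the resolution mechanism itself are assumptions.

show_config() {
  echo "threshold=${WATCHDOG_THRESHOLD:-30}"
  echo "window=${WATCHDOG_WINDOW:-30}"
  echo "spike_ratio=${WATCHDOG_SPIKE_RATIO:-3}"
  echo "state=${WATCHDOG_STATE:-$HOME/.local/state/gateway-watchdog/state.json}"
}

# Defaults apply when nothing is set (run in a subshell so the
# caller's environment is left untouched).
( unset WATCHDOG_THRESHOLD WATCHDOG_WINDOW WATCHDOG_SPIKE_RATIO WATCHDOG_STATE
  show_config )

# Overrides set inside a subshell last only for that subshell.
( WATCHDOG_THRESHOLD=100; WATCHDOG_WINDOW=60; show_config )
```

With the real script, the equivalent one-off override would be a prefix assignment such as `WATCHDOG_THRESHOLD=100 bash scripts/gateway-watchdog.sh check`, which leaves the surrounding shell's environment unchanged.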
Adding channel-specific patterns
# Feishu-specific error codes
export WATCHDOG_EXTRA_PATTERNS='99991400|99991403|99991404|99991429'
# Telegram-specific
export WATCHDOG_EXTRA_PATTERNS='Too Many Requests|FLOOD_WAIT|bot was blocked'
# Discord-specific
export WATCHDOG_EXTRA_PATTERNS='DiscordAPIError|Missing Permissions|Unknown Channel'
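Each `export` above replaces the previous value, so monitoring several channels at once means joining the alternations into one pattern. A minimal sketch follows; `join_patterns` is a hypothetical helper that only builds the string, and the sample log line is made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join per-channel alternations into one regex.
# join_patterns is not part of the skill; it only builds the string.

join_patterns() {
  # With IFS set to "|", "$*" expands to all arguments joined by "|".
  local IFS='|'
  echo "$*"
}

feishu='99991400|99991403|99991404|99991429'
telegram='Too Many Requests|FLOOD_WAIT|bot was blocked'

WATCHDOG_EXTRA_PATTERNS=$(join_patterns "$feishu" "$telegram")
export WATCHDOG_EXTRA_PATTERNS
echo "$WATCHDOG_EXTRA_PATTERNS"

# Quick local check that the combined regex matches a made-up log line.
echo 'telegram: FLOOD_WAIT_30' | grep -qE -- "$WATCHDOG_EXTRA_PATTERNS" && echo "pattern matches"
```

Testing the combined pattern with `grep -E` before exporting it is a cheap way to catch a malformed regex early.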
Interpreting Results
🔴 Alert (Chinese locale)
🔴 Gateway 最近 30 分钟出现 150 条异常错误(阈值: 30,5/min)
📈 错误突增: 12 → 150(3倍阈值触发)
错误分类:
429/限流: 120
5xx服务端错误: 5
认证/权限: 0
网络错误: 5
消息投递失败: 20
⚠️ 单一错误类型「429/限流」占比 80%,可能是单一故障源
🔴 Alert (English equivalent)
🔴 Gateway detected 150 errors in the last 30 min (threshold: 30, 5/min)
📈 Error spike: 12 → 150 (3x threshold triggered)
Error breakdown:
429/Rate-limit: 120
5xx Server errors: 5
Auth/Permission: 0
Network errors: 5
Delivery failures: 20
⚠️ Single error type "429/Rate-limit" accounts for 80%+ — likely a single fault source
💚 Healthy
No output from check mode.
Limitations
- Requires systemd + journalctl (falls back to ~/.openclaw/logs/ on macOS)
- Reactive, not preventive
- Cannot pinpoint which extension is failing — check error details for clues
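The journal-or-files fallback in the first limitation can be pictured as a simple availability check. This is a sketch under stated assumptions: `pick_log_source` and its return labels are hypothetical, and the real script may detect its log source differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of choosing a log source. The function name and
# the returned labels are assumptions; the real script may differ.

pick_log_source() {
  if command -v journalctl >/dev/null 2>&1; then
    echo "journal"                      # systemd journal is available
  elif [ -d "$HOME/.openclaw/logs" ]; then
    echo "files"                        # fall back to the log directory
  else
    echo "none"                         # nothing to analyze
  fi
}

case "$(pick_log_source)" in
  journal) echo "would read from journalctl" ;;
  files)   echo "would read from $HOME/.openclaw/logs" ;;
  none)    echo "no log source found" ;;
esac
```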
Security
- Read-only: Reads logs without modifying them; the only files it writes are its own state and history
- No credentials: No API keys accessed
- No network: No outbound requests
- User state only: State in ~/.local/state/gateway-watchdog/ (XDG standard, no elevated permissions needed)
