{"skill":{"slug":"job-execution-monitor","displayName":"Job Execution Monitor","summary":"Monitor scheduled cron jobs hourly, alerting you only if jobs fail or miss their expected run time with tracked alert state to avoid spam.","description":"# Job Execution Monitor\n\nMonitor scheduled jobs (cron) and alert when they fail or miss their schedule.\n\n## Install / Update (ClawHub)\n\nInstall:\n```bash\nclawhub install job-execution-monitor\n```\n\nUpdate:\n```bash\nclawhub update job-execution-monitor\n```\n\n---\n\n## When to Use\nUse when the user asks to:\n- \"monitor cron jobs\"\n- \"alert on job failures\"\n- \"check if jobs ran\"\n- \"job healthcheck\"\n- \"task surveillance\"\n- \"why didn't my scheduled job run?\"\n\n## What It Does\n\n**Heartbeat-based monitoring:**\n- Agent checks jobs during periodic heartbeat polls (~every 60min)\n- Uses cheapest LLM (Gemini Flash or Haiku) to minimize cost\n- Only sends alerts when problems detected\n- Tracks alert state to avoid spam\n\n**Cost:** ~100k tokens/day (~48 checks × 2k tokens)\n\n## How It Works\n\n1. **Configuration:** Jobs to monitor are listed in `job-execution-monitor.json`\n2. **Heartbeat check:** Agent wakes every 15-30min, checks jobs every ~4th wake\n3. **Detection:** Compares last run time vs. expected schedule + tolerance\n4. **Alert:** Wake event with context (expected schedule vs last run) → you decide next action\n5. **Recovery:** Clears alert when job runs successfully again\n\n## Disable / Uninstall\n\n### If installed via systemd *user* timer\n```bash\nsystemctl --user stop openclaw-job-execution-monitor.timer\nsystemctl --user disable openclaw-job-execution-monitor.timer\nsystemctl --user daemon-reload\n\n# Optional: remove unit files\nrm -f ~/.config/systemd/user/openclaw-job-execution-monitor.service \\\n      ~/.config/systemd/user/openclaw-job-execution-monitor.timer\n```\n\n### If installed via cron\n```bash\ncrontab -l | sed '/job-execution-monitor\\/scripts\\/healthcheck\\.sh/d' | crontab -\n```\n\n### Optional cleanup (config/state/log)\n```bash\nrm -f ~/.openclaw/workspace/job-execution-monitor.json\nrm -f ~/.openclaw/workspace/.job-execution-monitor-state.json\nrm -f ~/.openclaw/workspace/job-execution-monitor.log\n```\n\n---\n\n## Configuration\n\n**File:** `~/.openclaw/workspace/job-execution-monitor.json`\n\n```json\n{\n  \"checkIntervalMin\": 60,\n  \"jobs\": {\n    \"Daily 21:00 journaling (projects + accomplishments + next day plan)\": {\n      \"schedule\": \"0 22 * * *\",\n      \"tolerance\": 600,\n      \"critical\": true,\n      \"expectedMinLength\": 200,\n      \"errorPatterns\": [\"error\", \"failed\", \"Pong\", \"token overflow\"]\n    }\n  }\n}\n```\n\n**Fields:**\n- `schedule`: cron expression for expected run time\n- `tolerance`: grace period in seconds (default 600 = 10min)\n- `critical`: if true, alerts immediately (future: could escalate)\n- `expectedMinLength`: minimum response length (phase 2)\n- `errorPatterns`: text patterns that indicate failure (phase 2)\n\n## State Tracking\n\n**File:** `~/.openclaw/workspace/.job-execution-monitor-state.json`\n\n```json\n{\n  \"lastCheck\": 1771025000,\n  \"alerts\": {\n    \"daily-wrap-up_missed\": 1771024500\n  }\n}\n```\n\n- `lastCheck`: unix timestamp of last check\n- `alerts`: map of alert_key → timestamp (prevents spam)\n\n## Instructions (for Agent)\n\nIn `HEARTBEAT.md`:\n\n```markdown\n## Job Execution Monitor (every ~60min, rotate)\n\nCheck cron jobs for missed schedules. Only alert if problem found.\n\n**Instructions:**\n1. Load `job-execution-monitor.json` config\n2. Call `cron list` \n3. For each job in config:\n   - Extract `state.lastRunAtMs` and `state.lastStatus`\n   - Parse schedule (e.g., \"0 22 * * *\" = 22:00 daily)\n   - If last run > (expected time + tolerance): **ALERT**\n   - If last run recent: **SILENT**\n4. On alert: send wake event with job name, expected time, last run time\n5. On recovery (was alerting before, now OK): send recovery wake event\n\n**State tracking:** `~/.openclaw/workspace/.job-execution-monitor-state.json`\n- Track which jobs already alerted (don't spam)\n- Clear alert flag when job recovers\n\n**Rotate check:** Only run every ~4th heartbeat (once/hour if heartbeat is 15min)\n```\n\n## Examples\n\n**Scenario 1: Job missed**\n```\n🔴 Job Execution Monitor: \"daily-wrap-up\" missed schedule\nExpected: 22:00 ±10min\nLast run: 5h 32m ago\nChecking logs...\n```\n\n**Scenario 2: Job recovered**\n```\n✅ Job Execution Monitor: \"daily-wrap-up\" recovered\nLast run: 22:02 (2min ago)\n```\n\n**Scenario 3: All OK**\n```\n(silent - no wake event, no alert)\n```\n\n## Cost Analysis\n\n**Per check (~2k tokens):**\n- Load config: ~200 tokens\n- Call cron list: ~500 tokens\n- Parse + compare: ~500 tokens\n- Decision + response: ~800 tokens\n\n**Daily (~48 checks):**\n- 48 × 2k = **~100k tokens/day**\n- Using Gemini Flash: **~$0.01/day** ✅\n\n**Compared to alternatives:**\n- Bash script every 10min: 0 tokens, but complex + fragile\n- Cron job every 10min: 144 × 2k = ~300k tokens/day\n- Heartbeat every 60min: **~100k tokens/day** ← chosen ✅\n\n## Files\n\n- `SKILL.md` - This documentation\n- `README.md` - Quick start\n- `config/job-execution-monitor.example.json` - Config template\n- `scripts/patterns.json` - Error patterns (phase 2)\n- `~/workspace/job-execution-monitor.json` - User config\n- `~/workspace/.job-execution-monitor-state.json` - Alert state\n\n## Philosophy\n\n**Smart monitoring, minimal cost.**\n\n- Check frequency: ~1/hour (good enough for daily jobs)\n- Use cheapest LLM: Gemini Flash or Haiku\n- Only wake main session when real problems occur\n- State tracking prevents alert spam\n- Agent-based = leverages existing tools, no auth/API hassles\n\n---\n\n**Phase 1 complete.** ✅  \n**Phase 2 (pattern matching):** Coming soon.  \n**Phase 3 (structured validation):** Future.\n","tags":{"latest":"1.0.3"},"stats":{"comments":0,"downloads":288,"installsAllTime":0,"installsCurrent":0,"stars":0,"versions":4},"createdAt":1771334698063,"updatedAt":1778491563795},"latestVersion":{"version":"1.0.3","createdAt":1771363339988,"changelog":"Docs: add disable/uninstall instructions; clarify monitoring-only wording; clean up install script wording.","license":null},"metadata":null,"owner":{"handle":"tradmangh","userId":"s17aaegq3rafytyraeyc72bkjx83twpg","displayName":"Thomas J. Radman","image":"https://avatars.githubusercontent.com/u/4415781?v=4"},"moderation":null}