Site Watch

Lightweight CLI-based web page change monitor with AI summaries and multi-channel notifications

Install

openclaw skills install site-watch

Site Watch (site-watch)

Version: 1.0.0 | Category: Utilities / Automation | Slug: site-watch | Runtime: Node.js 18+ | Dependencies: cheerio, better-sqlite3, node-cron, node-notifier

Description

A lightweight CLI-based web page change monitor that periodically fetches pages, detects content changes, generates AI summaries (DeepSeek by default), and pushes notifications through multiple channels. All monitoring data is stored locally, and no cloud service is required: the AI summary is optional and falls back to a built-in template when no API key is configured.

Core workflow: Add a URL → Scheduler fetches periodically → Detects changes → AI summarizes → Multi-channel notification → You act on it.


First-Success Path

Goal: First monitoring target added and one change detected within 60 seconds.

Step 1: clawhub install site-watch
Step 2: site-watch add --url "https://jsonplaceholder.typicode.com/posts/1" --name "Test Page"
Step 3: Internal pipeline:
  a. index.js: CLI parses the --url and --name arguments
  b. target-manager.js: creates the target entry in the SQLite DB
  c. fetcher.js: HTTP GET with automatic retry
  d. content-extractor.js: HTML → text extraction via cheerio
  e. change-detector.js: stores a SHA-256 hash as the baseline snapshot
  f. ai-summarizer.js: generates an AI summary (if DEEPSEEK_API_KEY is set)
  g. notifier.js: prints confirmation with the initial snapshot
Step 4: User sees "✅ Monitor Added" with preview
Step 5: Value achieved — first target is being watched

Key Metrics: Install→First target < 10s, Change detection < 5s, Startup < 1s.
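
The baseline-and-compare idea behind step 3e can be sketched in a few lines. This is a minimal illustration, not the code in change-detector.js: the function names and the whitespace normalization are assumptions made for the example; only the use of SHA-256 comes from the pipeline description.

```javascript
const { createHash } = require('node:crypto');

// Collapse whitespace so cosmetic reflows do not register as changes.
// (Assumed normalization step; the source does not describe one.)
function normalizeText(text) {
  return text.replace(/\s+/g, ' ').trim();
}

// SHA-256 hex digest of the normalized text.
function hashContent(text) {
  return createHash('sha256').update(normalizeText(text), 'utf8').digest('hex');
}

// Compare a fresh extraction against the stored baseline hash.
function detectChange(baselineHash, currentText) {
  const currentHash = hashContent(currentText);
  return { changed: currentHash !== baselineHash, currentHash };
}

const baseline = hashContent('Price: 7999\n  In stock');
console.log(detectChange(baseline, 'Price: 7999 In stock').changed); // false
console.log(detectChange(baseline, 'Price: 7499 In stock').changed); // true
```

Comparing hashes keeps the check cheap; a text diff is only needed once the hashes differ.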


Installation

clawhub install site-watch

Dependencies are auto-installed. For headless browser support (JS-rendered pages):

cd ~/.openclaw/skills/site-watch && npm install playwright-core

Usage

All commands follow the pattern:

site-watch <action> [options]

Add a monitoring target

site-watch add --url "https://example.com/page" [--name "My Page"] [--selector ".content"] [--frequency 1h] [--tags tag1,tag2]

List all targets

site-watch list [--status active|paused|error|all]

Start/stop the scheduler

site-watch start
site-watch stop

Manual check

site-watch check --name "My Target"

View history

site-watch history --name "My Target" [--since 2026-01-01]

Export data

site-watch export --name "My Target" --format csv

Full status overview

site-watch status

Actions Reference

Action    Description
add       Add a new URL to monitor and fetch an initial snapshot
remove    Remove a monitored target and its history
list      List all monitored targets with status
status    Scheduler health + target summary
start     Start the background scheduler process
stop      Gracefully stop the scheduler
check     Manually check a single target for changes
history   View change history for a target
export    Export change history as JSON/CSV/Markdown

Options

Option          Type     Default      Description
--url           string   (required)   Target page URL
--name          string   page title   Display name for the target
--selector      string   full page    CSS selector for area monitoring
--frequency     string   1h           Check frequency (1m, 5m, 15m, 30m, 1h, 6h, 12h, 24h)
--sensitivity   string   normal       Change sensitivity (low, normal, high)
--preset        string   auto         Platform preset (jd, taobao, pdd, dewu, bilibili, zhihu, xiaohongshu, github, auto, none)
--tags          list     -            Comma-separated tags for categorization
--render-js     flag     off          Use headless browser for JS-rendered pages
--no-summary    flag     off          Disable AI-generated change summaries
--timeout       number   30000        Request timeout in milliseconds
--format        string   json         Export format (json, csv, markdown)
--data-dir      path     default      Custom data storage directory
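
To make the --frequency values concrete, here is one way they could be validated and turned into a schedule. The allowed values come from the table above; the function names and the mapping to cron expressions are assumptions for illustration, since the source only says the scheduler is cron-based.

```javascript
// Allowed values copied from the --frequency row of the options table.
const ALLOWED_FREQUENCIES = ['1m', '5m', '15m', '30m', '1h', '6h', '12h', '24h'];

// Convert a frequency string such as "6h" into minutes.
function frequencyToMinutes(value) {
  if (!ALLOWED_FREQUENCIES.includes(value)) {
    throw new Error(`Unsupported frequency: ${value}`);
  }
  const amount = Number.parseInt(value, 10);
  return value.endsWith('h') ? amount * 60 : amount;
}

// Hypothetical mapping to a standard 5-field cron expression.
function frequencyToCron(value) {
  const minutes = frequencyToMinutes(value);
  if (minutes < 60) return `*/${minutes} * * * *`;
  const hours = minutes / 60;
  return hours === 24 ? '0 0 * * *' : `0 */${hours} * * *`;
}

console.log(frequencyToMinutes('1h')); // 60
console.log(frequencyToCron('15m'));   // */15 * * * *
console.log(frequencyToCron('6h'));    // 0 */6 * * *
```

Rejecting anything outside the documented list keeps typos such as 2h from silently producing an unexpected schedule.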

Sample Prompts

1. Price monitoring for JD.com product

site-watch add \
  --url "https://item.jd.com/100012345.html" \
  --name "京东 iPhone 15 Pro Price" \
  --preset jd \
  --frequency 1h \
  --tags shopping,apple

Expected output:

✅ Monitor Added

  📋 Target: 京东 iPhone 15 Pro Price
  🔗 URL: https://item.jd.com/100012345.html
  🏷️  Tags: shopping, apple
  🎯 Selector: .summary-price .price
  🌐 Platform: jd
  ⏱️  Frequency: every 60 minute(s)

  📸 Initial Snapshot Preview:
  "iPhone 15 Pro 256GB 暗紫色 ¥7,999 现货"

  💡 Use `site-watch start` to begin periodic monitoring
  💡 Use `site-watch check --name "京东 iPhone 15 Pro Price"` to check for changes

2. Content tracking for a blog/article page

site-watch add \
  --url "https://example.com/blog/latest-post" \
  --name "Tech Blog Updates" \
  --selector "article.main-content" \
  --frequency 6h \
  --tags blog,tracking

3. Job listing monitoring on BOSS Zhipin (BOSS直聘)

site-watch add \
  --url "https://www.zhipin.com/web/geek/job?query=前端工程师" \
  --name "Frontend Jobs" \
  --frequency 12h \
  --tags job,frontend

4. Check status of all monitored targets

site-watch list

5. Start the background scheduler

site-watch start

Data Storage

All data is stored locally under ~/.openclaw/data/site-watch/:

File/Directory    Purpose
config.json       Global configuration (notifications, etc.)
site-watch.db     SQLite database (targets, snapshots, changes)
targets/          Per-target config (cookies/headers encrypted)
.encryption-key   AES-256-GCM key for credential encryption
scheduler.pid     PID file for scheduler process tracking

Security: Sensitive fields (cookies, custom headers, webhook URLs) are encrypted using AES-256-GCM before being written to disk. Config files are created with 0600 permissions.
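
For readers unfamiliar with AES-256-GCM, the sketch below shows field-level encryption with Node's built-in crypto module. It is not the code in config.js: the function names and the iv + tag + ciphertext layout are assumptions chosen for the example.

```javascript
const crypto = require('node:crypto');

// Encrypt one sensitive field. Output layout (assumed for this sketch):
// base64( 12-byte IV | 16-byte auth tag | ciphertext ).
function encryptField(plaintext, key) {
  const iv = crypto.randomBytes(12); // fresh 96-bit nonce per message
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ciphertext]).toString('base64');
}

// Decrypt and verify; throws if the data or tag was tampered with.
function decryptField(encoded, key) {
  const raw = Buffer.from(encoded, 'base64');
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const ciphertext = raw.subarray(28);
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

const key = crypto.randomBytes(32); // 256-bit key
const sealed = encryptField('session=abc123', key);
console.log(decryptField(sealed, key)); // session=abc123
```

When persisting the result, fs.writeFileSync(path, data, { mode: 0o600 }) creates the file readable and writable by the owner only; the mode applies when the file is first created.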


AI Summary

When a change is detected, an optional AI summary can be generated to explain what changed in natural language. The tool supports:

  1. DeepSeek (preferred) — set DEEPSEEK_API_KEY environment variable
  2. Custom LLM endpoint — set SITEWATCH_LLM_ENDPOINT, SITEWATCH_LLM_API_KEY, SITEWATCH_LLM_MODEL
  3. Template-based fallback — built-in, no API needed
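
The three options above suggest a simple precedence check on environment variables. The sketch below is an assumption about how that selection might look; the function name, the returned object shape, and the rule that the custom endpoint needs all three variables are hypothetical.

```javascript
// Pick a summary backend in the order listed above.
// `env` defaults to process.env but can be injected for testing.
function selectSummaryProvider(env = process.env) {
  if (env.DEEPSEEK_API_KEY) {
    return { kind: 'deepseek', apiKey: env.DEEPSEEK_API_KEY };
  }
  // Assumed: the custom endpoint is used only when all three variables are set.
  if (env.SITEWATCH_LLM_ENDPOINT && env.SITEWATCH_LLM_API_KEY && env.SITEWATCH_LLM_MODEL) {
    return {
      kind: 'custom',
      endpoint: env.SITEWATCH_LLM_ENDPOINT,
      apiKey: env.SITEWATCH_LLM_API_KEY,
      model: env.SITEWATCH_LLM_MODEL,
    };
  }
  // No credentials available: fall back to the built-in template.
  return { kind: 'template' };
}

console.log(selectSummaryProvider({}).kind);                          // template
console.log(selectSummaryProvider({ DEEPSEEK_API_KEY: 'sk-x' }).kind); // deepseek
```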

Only the change diff (max 3000 chars) is sent to the LLM — never the full page content. PII (phone numbers, emails, ID numbers) is automatically masked before sending.
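
A rough sketch of that preparation step follows. The 3000-character limit comes from the text above; the regular expressions are illustrative assumptions (an 11-digit mobile number shape, an 18-character ID shape, a simple email pattern) and are not the tool's actual masking rules.

```javascript
const MAX_DIFF_CHARS = 3000; // limit stated in the text above

// Illustrative patterns only; the real masking rules are not documented here.
const PII_PATTERNS = [
  { label: '[EMAIL]', regex: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g },
  { label: '[ID]', regex: /\b\d{17}[\dXx]\b/g },   // assumed 18-character ID number shape
  { label: '[PHONE]', regex: /\b1[3-9]\d{9}\b/g }, // assumed 11-digit mobile number shape
];

// Replace every match of every pattern with its placeholder label.
function maskPii(text) {
  return PII_PATTERNS.reduce((out, { label, regex }) => out.replace(regex, label), text);
}

// Mask first, then truncate, so a value cut in half by the limit cannot slip through.
function prepareDiffForLlm(diff) {
  return maskPii(diff).slice(0, MAX_DIFF_CHARS);
}

console.log(maskPii('Contact: alice@example.com or 13812345678'));
// Contact: [EMAIL] or [PHONE]
```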


Platform Presets

Preset        Platform       Key Selectors
jd            JD.com         .summary-price .price
taobao        Taobao/Tmall   .tm-price
pdd           Pinduoduo      .goods-price
dewu          Dewu/Poizon    .price-text
bilibili      Bilibili       .video-info-desc
zhihu         Zhihu          .Post-RichText
xiaohongshu   Xiaohongshu    .note-content
github        GitHub         .release-body

Use --preset auto (default) to auto-detect the platform from the URL, or --preset none for manual selector specification.
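
Auto-detection can be pictured as a hostname lookup. In the sketch below the selectors are copied from the preset table, while the hostname suffixes and the function name are assumptions made for illustration; the tool's real detection rules may differ.

```javascript
// Selectors copied from the preset table; hostname suffixes are assumed.
const PRESETS = [
  { name: 'jd', hosts: ['jd.com'], selector: '.summary-price .price' },
  { name: 'taobao', hosts: ['taobao.com', 'tmall.com'], selector: '.tm-price' },
  { name: 'pdd', hosts: ['pinduoduo.com'], selector: '.goods-price' },
  { name: 'dewu', hosts: ['dewu.com'], selector: '.price-text' },
  { name: 'bilibili', hosts: ['bilibili.com'], selector: '.video-info-desc' },
  { name: 'zhihu', hosts: ['zhihu.com'], selector: '.Post-RichText' },
  { name: 'xiaohongshu', hosts: ['xiaohongshu.com'], selector: '.note-content' },
  { name: 'github', hosts: ['github.com'], selector: '.release-body' },
];

// Return the matching preset for a URL, or null when none applies.
function detectPreset(url) {
  const { hostname } = new URL(url);
  const match = PRESETS.find((preset) =>
    preset.hosts.some((host) => hostname === host || hostname.endsWith(`.${host}`)));
  return match ? { name: match.name, selector: match.selector } : null;
}

console.log(detectPreset('https://item.jd.com/100012345.html'));
// { name: 'jd', selector: '.summary-price .price' }
console.log(detectPreset('https://example.com/page')); // null
```

Matching on the exact host or a dot-prefixed suffix avoids false positives such as notjd.com.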


Architecture

index.js (CLI entry)
  ├── target-manager.js    — CRUD for monitored targets
  ├── fetcher.js           — HTTP requests with retry & anti-bot measures
  ├── content-extractor.js — HTML → text via cheerio + CSS selectors
  ├── noise-filter.js      — Timestamp/ads/counter removal
  ├── change-detector.js   — SHA-256 hash + text diff + sensitivity
  ├── ai-summarizer.js     — LLM-powered change summary (DeepSeek)
  ├── scheduler.js         — Cron-based periodic checker
  ├── notifier.js          — Multi-channel dispatch (terminal, system, webhook)
  ├── history-store.js     — SQLite persistence with zlib compression
  ├── config.js            — Config manager + AES-256-GCM encryption
  └── security.js          — SSRF protection, PII masking, robots.txt
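
As one example from the module list, the zlib compression mentioned for history-store.js could look roughly like this. The helper names and the choice of deflate are assumptions; the source does not say which zlib format is used.

```javascript
const zlib = require('node:zlib');

// Compress extracted page text before storing it (for example as a BLOB column).
function packSnapshot(text) {
  return zlib.deflateSync(Buffer.from(text, 'utf8'));
}

// Restore the original text from a stored blob.
function unpackSnapshot(blob) {
  return zlib.inflateSync(blob).toString('utf8');
}

const snapshotText = 'Price: 7999 In stock\n'.repeat(200);
const packed = packSnapshot(snapshotText);
console.log(packed.length < Buffer.byteLength(snapshotText)); // true
console.log(unpackSnapshot(packed) === snapshotText);          // true
```

Page text tends to be repetitive, so compressing snapshots keeps the local database small when many versions are retained.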

Error Codes

Code   Meaning                      Recovery
E001   Invalid URL format           Provide a valid URL
E002   DNS/network error            Check connectivity / DNS
E003   Request timeout              Increase --timeout
E004   HTTP error (4xx/5xx)         Check URL accessibility
E005   CSS selector no match        Verify selector / use full page
E008   Rate limited (429)           Reduce frequency
E010   AI summary failed            Falls back to text diff
E013   Scheduler conflict           Stop first, then start
E014   Target not found/duplicate   Check target name/id
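
The fetch-related codes above can be illustrated with a small classifier. The codes and recovery hints are taken from the table; the shape of the failure object (status, code, name) and the function name are hypothetical, chosen to resemble common Node.js error fields.

```javascript
// Recovery hints copied from the error table above.
const RECOVERY = {
  E002: 'Check connectivity / DNS',
  E003: 'Increase --timeout',
  E004: 'Check URL accessibility',
  E008: 'Reduce frequency',
};

// Map a failure description onto an error code, or null if unrecognized.
// `failure` is a hypothetical object: { status?, code?, name? }.
function classifyFetchFailure(failure) {
  if (failure.status === 429) return 'E008'; // check 429 before the generic 4xx/5xx case
  if (failure.status >= 400) return 'E004';
  if (failure.code === 'ENOTFOUND' || failure.code === 'ECONNREFUSED') return 'E002';
  if (failure.code === 'ETIMEDOUT' || failure.name === 'AbortError') return 'E003';
  return null;
}

const errorCode = classifyFetchFailure({ status: 429 });
console.log(errorCode, RECOVERY[errorCode]); // E008 Reduce frequency
```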

License

MIT — Part of the Golden Bean skill collection.