Install
openclaw skills install smart-scraper-web
Extract structured data from websites. Tables, lists, prices, articles, metadata. HTML parsing with caching. Zero external dependencies.
⚠️ Security Note: This skill sends user-provided URLs over the network and stores fetched page contents locally in a cache (memory/scraper-cache/cache.json). Do not use with sensitive, authenticated, internal, or attacker-controlled URLs until redirect targets are revalidated. Clear the cache (rm memory/scraper-cache/cache.json) after scraping if page contents or URLs may be sensitive.
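One way to act on the security note is to screen URLs before they reach the scraper. The following is a minimal sketch, not part of Smart Scraper: `isPublicHttpUrl` and `isPrivateIPv4` are hypothetical helpers, and the check covers only the literal URL. It does not resolve DNS or follow redirects, so it does not replace revalidating redirect targets.

```javascript
// Hypothetical pre-check (not shipped with Smart Scraper): reject URLs that
// are obviously local, private, or non-HTTP before passing them to --extract.
const net = require('node:net');

function isPrivateIPv4(host) {
  const [a, b] = host.split('.').map(Number);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||          // link-local
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168)
  );
}

function isPublicHttpUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  if (url.username || url.password) return false; // embedded credentials
  const host = url.hostname.replace(/^\[|\]$/g, ''); // strip IPv6 brackets
  if (host === 'localhost' || host.endsWith('.localhost')) return false;
  if (net.isIPv4(host)) return !isPrivateIPv4(host);
  if (net.isIPv6(host)) {
    const h = host.toLowerCase();
    // loopback, unspecified, unique-local (fc00::/7), link-local (fe80::/10);
    // this list is not exhaustive (e.g. IPv4-mapped addresses are not handled)
    return !(h === '::1' || h === '::' || /^f[cd]/.test(h) || /^fe[89ab]/.test(h));
  }
  return true; // plain hostname: DNS results and redirects are NOT checked here
}
```

A caller would run this check first and only invoke the scraper when it returns true.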
Stop copying data by hand. Start extracting it automatically.
Web content is everywhere but inaccessible to agents. web_fetch gets raw HTML, but you need structure — tables, prices, lists, article text — to make it useful.
Smart Scraper turns raw HTML into structured data with one command.
node skills/smart-scraper/smart-scraper.js --extract https://example.com
Returns title, headings, paragraphs, links, tables, lists, prices, images, and metadata.
Tables only:
node skills/smart-scraper/smart-scraper.js --extract --table https://example.com/pricing
Lists only:
node skills/smart-scraper/smart-scraper.js --extract --list https://example.com/blog
Prices only:
node skills/smart-scraper/smart-scraper.js --extract --price https://example.com/products
Article text only:
node skills/smart-scraper/smart-scraper.js --extract --article https://example.com/blog/post
Parse HTML you already have:
node skills/smart-scraper/smart-scraper.js --parse "<html>...</html>"
Cache status:
node skills/smart-scraper/smart-scraper.js --status
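To make focused extraction such as --price more concrete, here is a minimal sketch of regex-based price extraction. It is an illustration only: Smart Scraper's actual patterns and output shape are not shown in this document, and `PRICE_RE` and `extractPrices` are hypothetical names. Every quantifier is bounded, in the spirit of the {0,N} limits this document mentions as a ReDoS guard.

```javascript
// Illustrative only: not Smart Scraper's real implementation.
// All quantifiers are bounded so backtracking on hostile input stays limited.
const PRICE_RE = /([$€£])\s{0,2}(\d{1,3}(?:,\d{3}){1,4}|\d{1,12})(\.\d{1,2})?/g;

function extractPrices(html) {
  // Replace tags with spaces so markup does not glue neighbouring text together.
  const text = html.replace(/<[^>]{0,2000}>/g, ' ');
  const prices = [];
  for (const m of text.matchAll(PRICE_RE)) {
    prices.push({
      currency: m[1],
      // Drop thousands separators, then append the optional decimal part.
      amount: Number(m[2].replace(/,/g, '') + (m[3] || '')),
      raw: m[0],
    });
  }
  return prices;
}
```

The comma-grouped alternative comes first and requires at least one group, so a plain run of digits such as 1234 falls through to the second alternative instead of being cut off after three digits.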
Caching is opt-in (off by default). To enable caching, explicitly add the --cache flag to your command.
- Use --cache to enable caching of scraped content locally.
- Use --no-cache to extract without storing any data locally (privacy mode).
- Use --status to monitor cache usage.
- Clear the cache with rm memory/scraper-cache/cache.json.
Cache stored in: memory/scraper-cache/cache.json (or --dir to override)
Override data directory:
--dir /path/to/data
Privacy mode (no local storage):
--extract --no-cache https://example.com
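If you do enable caching, the cleanup step (rm memory/scraper-cache/cache.json) can be scripted so it runs after every sensitive scrape. This is a minimal sketch with a hypothetical helper name, `clearScraperCache`. It assumes that when --dir is used, the cache file is still named cache.json directly inside that directory, which this document does not confirm.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Default location is the one given in this document.
// Assumption: a --dir override keeps the file name cache.json inside that directory.
function clearScraperCache(dir = path.join('memory', 'scraper-cache')) {
  const file = path.join(dir, 'cache.json');
  const existed = fs.existsSync(file);
  fs.rmSync(file, { force: true }); // force: no error if the file is already gone
  return existed; // true if a cache file was actually removed
}
```

Calling `clearScraperCache()` with no argument targets the default path relative to the current working directory.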
Regex patterns use {0,N} limits to prevent ReDoS.
When extracting web content:
- Use --extract <url> for a full overview
- Use --extract --table/list/price/article for focused extraction
- Use --parse when you already have HTML from another tool
- Use --status to monitor cache usage

| Tool | Structure | Tables | Prices | Articles | Caching |
|---|---|---|---|---|---|
| web_fetch | Raw HTML | ❌ | ❌ | ❌ | ❌ |
| Puppeteer | ✅ | ✅ | ✅ | ✅ | ❌ |
| Smart Scraper | ✅ | ✅ | ✅ | ✅ | ✅ |
Smart Scraper gives you structured extraction + caching with zero dependencies.