## Install

```bash
openclaw skills install the-time-masheen
```

> ClawHub Security found sensitive or high-impact capabilities. Review the scan results before using.

Step in. Go back. Scrape the dead. Automate the living.

Combine live scraping, historical snapshots via the Wayback Machine, and interactive browser automation to extract, compare, and automate web data from any site.

A three-layer web intelligence stack. Pick the right layer, or combine all three.
| Layer | Tool | Job |
|---|---|---|
| Live | Scrapling | Extract content from any live URL |
| Historical | Wayback Machine CDX API | Travel back in time to any archived snapshot |
| Interactive | playwright-cli | Drive a real browser — login, click, scroll, fill forms |
```
Need web data?
│
├─ Historical / "what did it look like before"?
│   └─ Wayback CDX API → scrape snapshot via Scrapling or web_fetch
│
├─ Need to click / log in / fill forms first?
│   └─ playwright-cli → authenticate → hand off to Scrapling
│
└─ Just current content?
    ├─ Static / simple → scrapling get
    ├─ JS-heavy / React → scrapling fetch --network-idle
    └─ Heavily protected sites → scrapling stealthy-fetch --solve-cloudflare
```
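The "current content" branch of the tree above can be sketched as a small shell dispatcher. The `fetch_mode` function and its mode names are illustrative, not part of any tool; commands are printed rather than executed so the sketch stays offline.

```shell
# Sketch: map a page type to the right Scrapling command (printed, not run).
fetch_mode() {
  case "$1" in
    static)    echo "scrapling extract get \"$2\" output.md" ;;
    dynamic)   echo "scrapling extract fetch \"$2\" output.md --network-idle" ;;
    protected) echo "scrapling extract stealthy-fetch \"$2\" output.md --solve-cloudflare" ;;
    *)         echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

fetch_mode static "https://example.com"
# → scrapling extract get "https://example.com" output.md
```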
## Live: Scrapling

```bash
# 1. Static sites, blogs, docs
scrapling extract get "https://example.com" output.md

# 2. JS-heavy / React / Next.js / dynamic content
scrapling extract fetch "https://example.com" output.md --network-idle --wait 3000

# 3. Cloudflare / rendering-protected sites
scrapling extract stealthy-fetch "https://example.com" output.md --solve-cloudflare

# Target specific elements with --css-selector
scrapling extract fetch "https://example.com" output.md --css-selector "main article"
scrapling extract get "https://example.com" output.md --css-selector ".pricing-table"
```
Rules:

- Prefer `.md` output for readable text; use `.html` only for structure parsing
- Use `--css-selector` to avoid giant HTML blobs

See references/scrapling.md for full CLI flags, the spider framework, and the Python API.
## Historical: Wayback Machine

```bash
# Recent successful snapshots (HTTP 200 only)
curl -s "https://web.archive.org/cdx/search/cdx?url=example.com&output=json&fl=timestamp,statuscode&filter=statuscode:200&limit=20"

# One snapshot per year (collapse on the 4-digit year prefix of the timestamp)
curl -s "https://web.archive.org/cdx/search/cdx?url=example.com&output=json&collapse=timestamp:4&fl=timestamp,statuscode&filter=statuscode:200"
```
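A CDX row's 14-digit timestamp turns into a replay URL by splicing it into the `https://web.archive.org/web/<timestamp>/<url>` path. A minimal sketch (the `snapshot_url` helper name is ours, not part of any tool):

```shell
# Sketch: build a Wayback replay URL from a CDX timestamp and the original URL.
snapshot_url() {
  printf 'https://web.archive.org/web/%s/%s\n' "$1" "$2"
}

snapshot_url 20230601000000 "https://example.com/"
# → https://web.archive.org/web/20230601000000/https://example.com/
```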
```bash
# Scrapling for clean extraction:
scrapling extract get "https://web.archive.org/web/20230601000000/https://example.com/" archive.md

# Or read via web_fetch:
# web_fetch: https://web.archive.org/web/20230601000000/https://example.com/
```
```bash
# Quick check: the closest available snapshot for a URL
curl -s "https://archive.org/wayback/available?url=example.com" | python3 -m json.tool
```
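The availability endpoint returns JSON with an `archived_snapshots.closest` object holding the replay URL. A sketch of extracting that URL offline; the sample response below is hand-written to match the response shape, not live data:

```shell
# Offline sketch: pull the closest-snapshot URL out of a sample availability response.
sample='{"archived_snapshots":{"closest":{"available":true,"url":"https://web.archive.org/web/20230601000000/https://example.com/","timestamp":"20230601000000","status":"200"}}}'
echo "$sample" | python3 -c 'import json,sys; print(json.load(sys.stdin)["archived_snapshots"]["closest"]["url"])'
# → https://web.archive.org/web/20230601000000/https://example.com/
```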
See references/wayback.md for full CDX API reference and ia CLI usage.
## Interactive: playwright-cli

Use when the page requires login, clicking, or dynamic interaction before content is accessible.

```bash
# Open browser
playwright-cli open https://app.example.com

# Snapshot to get element refs
playwright-cli snapshot

# Interact
playwright-cli click e12
playwright-cli fill e5 "username@example.com"
playwright-cli press Tab
playwright-cli fill e6 "password"
playwright-cli press Enter

# Capture state
playwright-cli screenshot
playwright-cli eval "document.title"

# Close
playwright-cli close
```
Hand-off pattern: authenticate with playwright-cli, then scrape with Scrapling.

```bash
# 1. playwright-cli open → log in → navigate to target
# 2. playwright-cli screenshot              # verify you're authenticated
# 3. scrapling extract get <url> output.md  # scrape while session is active
```
Example: track how a competitor's pricing page has changed over time.

```bash
# 1. Scrape current state
scrapling extract get "https://competitor.com/pricing" current.md

# 2. Find yearly snapshots
curl -s "https://web.archive.org/cdx/search/cdx?url=competitor.com/pricing&output=json&collapse=timestamp:4&fl=timestamp&filter=statuscode:200"

# 3. Scrape the archived version from any year
scrapling extract get "https://web.archive.org/web/20230101000000/https://competitor.com/pricing" archive.md

# 4. Diff
diff archive.md current.md
```
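To compare more than one year, the same replay-URL pattern can be looped. A sketch that only constructs URLs; the scrapling call is commented out because it hits the network:

```shell
# Sketch: one snapshot URL per year for batch scrape-and-diff.
for year in 2021 2022 2023; do
  url="https://web.archive.org/web/${year}0101000000/https://competitor.com/pricing"
  echo "$url"
  # scrapling extract get "$url" "pricing-${year}.md"
done
```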
Combined auth-then-scrape example:

```bash
# playwright handles auth → Scrapling does the bulk lift
playwright-cli open https://example.com/login
playwright-cli fill e5 "your@email.com"
playwright-cli fill e6 "password"
playwright-cli press Enter
playwright-cli screenshot   # verify you're in

scrapling extract get "https://example.com/members/content" output.md
```
This skill opens real browser sessions and can scrape login-protected pages. Review ClawHub's security scan results and make sure you understand these capabilities before using it.
## Related: CLI-Anything

When you need a full CLI harness for any desktop or web application:

```
# Install once in Claude Code
/plugin marketplace add HKUDS/CLI-Anything
/plugin install cli-anything

# Build a complete CLI for any software (7-phase pipeline)
/cli-anything:cli-anything ./target-app

# Iteratively refine
/cli-anything:refine ./target-app "focus on data export workflows"
```