Install

openclaw skills install web-scraper-as-a-service

Build client-ready web scrapers with clean data output. Use when creating scrapers for clients, extracting data from websites, or delivering scraping projects.

Turn scraping briefs into deliverable scraping projects. Generates the scraper, runs it, cleans the data, and packages everything for the client.
/web-scraper-as-a-service "Scrape all products from example-store.com — need name, price, description, images. CSV output."
/web-scraper-as-a-service https://example.com --fields "title,price,rating,url" --format csv
/web-scraper-as-a-service brief.txt
Before writing any code, choose the fetching tool: requests + BeautifulSoup for static pages, playwright for JavaScript-rendered sites.

Generate a complete Python script in the scraper/ directory:
scraper/
scrape.py # Main scraper script
requirements.txt # Dependencies
config.json # Target URLs, fields, settings
README.md # Setup and usage instructions for client
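A config.json along these lines would fit the layout above; every key name here is an assumption for illustration, not a fixed schema:

```json
{
  "start_urls": ["https://example-store.com/products"],
  "fields": ["name", "price", "description", "images"],
  "output_format": "csv",
  "delay_between_requests": 2,
  "max_retries": 3,
  "retry_delay": 5
}
```

Keeping delays and retry counts in config.json lets the client tune politeness settings without touching scrape.py.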
scrape.py must include:
# Required features in every scraper:
# 1. Configuration
import json
with open('config.json') as f:
    config = json.load(f)
# 2. Rate limiting (ALWAYS — be respectful)
import time
DELAY_BETWEEN_REQUESTS = 2 # seconds, adjustable in config
# 3. Retry logic
MAX_RETRIES = 3
RETRY_DELAY = 5
# 4. User-Agent rotation
USER_AGENTS = [
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
# ... at least 5 user agents
]
# 5. Progress tracking
print(f"Scraping page {current}/{total} — {items_collected} items collected")
# 6. Error handling
# - Log errors but don't crash on individual page failures
# - Save progress incrementally (don't lose data on crash)
# - Write errors to error_log.txt
# 7. Output
# - Save data incrementally (append to file, don't hold in memory)
# - Support CSV and JSON output
# - Clean and normalize data before saving
# 8. Resume capability
# - Track last successfully scraped page/URL
# - Can resume from where it left off if interrupted
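One way the rate-limiting, retry, and User-Agent requirements above might fit together is a single polite fetch helper. This is a sketch, not the skill's canonical implementation: `fetch_with_retry` and its injectable `fetch`/`sleep` parameters are assumptions chosen so the logic can be exercised without network access.

```python
import random
import time

# Placeholder user agents; a real scraper should rotate at least five.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]

def fetch_with_retry(url, fetch, max_retries=3, retry_delay=5,
                     request_delay=2, sleep=time.sleep):
    """Fetch one URL politely: delay before each request, retry on failure,
    and rotate the User-Agent header on every attempt."""
    last_error = None
    for attempt in range(max_retries):
        # Fixed delay before the first try, longer back-off on retries.
        sleep(request_delay if attempt == 0 else retry_delay)
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            return fetch(url, headers)
        except Exception as exc:
            # Log-and-retry; one bad page must not crash the whole run.
            last_error = exc
    raise RuntimeError(f"gave up on {url} after {max_retries} attempts") from last_error
```

In scrape.py, `fetch` would be a thin wrapper around `requests.get(url, headers=headers)`; injecting it (and `sleep`) keeps the retry logic testable.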
After scraping, clean the data:
Data Quality Report
───────────────────
Total records: 2,487
Duplicates removed: 13
Empty fields filled: 0
Fields with issues: price (3 records had non-numeric values — cleaned)
Completeness: 99.5%
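A minimal cleaning pass that could back a report like the one above, assuming records arrive as dicts; `clean_records` and its `numeric_fields` parameter are illustrative names, not part of the skill:

```python
def clean_records(records, numeric_fields=("price",)):
    """Deduplicate scraped records and normalize numeric fields.

    Returns (clean, report), where report carries the metrics used in the
    data quality report: totals, duplicates removed, fields with issues.
    """
    seen = set()
    clean = []
    issues = {f: 0 for f in numeric_fields}
    for rec in records:
        key = tuple(sorted(rec.items()))  # dedupe on the full raw record
        if key in seen:
            continue
        seen.add(key)
        for field in numeric_fields:
            raw = rec.get(field, "")
            # Strip currency symbols/commas so "$1,299.00" becomes "1299.00".
            cleaned = "".join(ch for ch in raw if ch.isdigit() or ch == ".")
            if cleaned != raw:
                issues[field] += 1
            rec[field] = cleaned
        clean.append(rec)
    report = {
        "total_records": len(clean),
        "duplicates_removed": len(records) - len(clean),
        "fields_with_issues": {f: n for f, n in issues.items() if n},
    }
    return clean, report
```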
Generate a complete deliverable:
delivery/
data.csv # Clean data in requested format
data.json # JSON alternative
data-quality-report.md # Quality metrics
scraper-documentation.md # How the scraper works
README.md # Quick start guide
scraper-documentation.md includes:
Present the finished deliverable to the client:
Based on the target type, use the appropriate template:
E-commerce products
Fields: name, price, original_price, discount, description, images, category, sku, rating, review_count, availability, url

Real estate listings
Fields: address, price, bedrooms, bathrooms, sqft, lot_size, listing_type, agent, description, images, url

Job postings
Fields: title, company, location, salary, job_type, description, requirements, posted_date, url

Local businesses / directories
Fields: business_name, address, phone, website, category, rating, review_count, hours, description

Articles / blog posts
Fields: title, author, date, content, tags, url, image
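Each template's fields can be wired to a page with a field-to-selector map. The CSS selectors below are hypothetical examples that must be adjusted per target site, and `extract_fields` is an illustrative helper, not part of the skill:

```python
# Hypothetical selector map for the product template; adjust per site.
PRODUCT_SELECTORS = {
    "name": "h1.product-title",
    "price": "span.price",
    "rating": "div.rating",
}

def extract_fields(select_one, selectors):
    """Build one record by resolving each field's selector to page text.

    select_one is any callable mapping a selector to text or None (e.g. a
    wrapper around BeautifulSoup's soup.select_one); missing fields become ''.
    """
    return {field: (select_one(sel) or "").strip()
            for field, sel in selectors.items()}
```

Passing the lookup in as a callable keeps the field mapping independent of whether the page was fetched with requests or rendered with playwright.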