ecommerce-market-analyzer-skill

v1.0.0

Scrape e-commerce homepages from multiple websites in a target market, handle popups automatically, capture screenshots and HTML, extract product data, and g...

by nepp @nepp-an

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for nepp-an/ecommerce-market-analyzer-skill.

Install the skill "ecommerce-market-analyzer-skill" (nepp-an/ecommerce-market-analyzer-skill) from ClawHub.
Skill page: https://clawhub.ai/nepp-an/ecommerce-market-analyzer-skill
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install ecommerce-market-analyzer-skill

ClawHub CLI


npx clawhub@latest install ecommerce-market-analyzer-skill

Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)

Purpose & Capability
The name/description (e‑commerce market scraping and analysis) aligns with the included artifacts: SKILL.md, reference pattern files, report template, and a Playwright scraper script that captures screenshots and HTML and contains parsing patterns for targeted e‑commerce sites. No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
SKILL.md instructs the agent to run the included Playwright scraper, save screenshots and HTML, then analyze files locally (image reading, grep, regex parsing). All of that is within the stated workflow. Note: the instructions tell the agent to 'immediately run the scraper' when given a website list — this will cause network crawling and file writes; users should be aware the skill performs autonomous web requests and local I/O.
Install Mechanism
There is no external install spec or remote download; the skill is instruction + bundled script. Dependencies (Playwright, Python) are documented in README. No suspicious download URLs or archive extraction steps are present.
Credentials
The skill requests no environment variables or credentials. The script operates with local filesystem writes only (screenshots_clean/*.png and .html). No evidence it accesses or requests unrelated secrets or system configs.
Persistence & Privilege
Skill flags are default (always:false) and do not request permanent elevated privileges. It does write files to a local output directory (expected for scraping) but does not modify other skills or global agent configuration.
Assessment
The skill appears coherent with its stated purpose, but take these precautions before running:

  1. Review the included scripts manually; only run code you trust.
  2. Run the scraper in an isolated environment (VM/container) to limit risk and disk usage.
  3. Install Playwright and dependencies according to the README in a controlled venv.
  4. Respect robots.txt and target sites' Terms of Service; avoid scraping behind authentication or paid/content-restricted areas.
  5. Start with a small site list and low request rates (the script already has delays) to reduce anti-bot triggers.
  6. Be aware the tool saves full HTML/screenshots locally; don't include sites containing sensitive personal data.

If you want higher assurance, ask the maintainer for provenance (homepage, repo) or run the code through your static analysis toolchain before use.

Like a lobster shell, security has layers — review code before you run it.

latest: vk9767dxept6szevt4m86yxdzp1837mgg
189 downloads
0 stars
1 version
Updated 1mo ago
v1.0.0
MIT-0

E-commerce Market Analyzer

Automated workflow for scraping e-commerce websites, handling popups, extracting product data, and generating comprehensive market analysis reports.

Workflow Overview

This skill follows a 4-step workflow:

  1. Setup & Scraping - Run Playwright scraper to capture homepages
  2. Visual Analysis - Analyze screenshots to identify product categories
  3. Data Extraction - Parse HTML to extract specific products and prices
  4. Report Generation - Create comprehensive market analysis report

User provides website list
         ↓
Step 1: Run scraper (handles popups automatically)
         ↓
Step 2: Analyze screenshots visually
         ↓
Step 3: Extract structured data from HTML
         ↓
Step 4: Generate final report

Step 1: Setup & Scraping

Quick Start

When the user provides a list of e-commerce websites, immediately run the scraper:

# Create output directory
mkdir -p screenshots_clean

# Run the scraper
uv run python scripts/scrape_websites.py

Customizing the Website List

Edit scripts/scrape_websites.py and update the WEBSITES list:

WEBSITES = [
    "amazon.de",
    "ebay.de",
    "otto.de",
    # Add more websites...
]

Key Features

The scraper automatically:

  • Handles cookie consent popups (German, English, universal selectors)
  • Handles region/language selection dialogs
  • Captures full-page screenshots (1920x1080)
  • Saves HTML source code
  • Uses German locale settings (or customize for other markets)
  • Waits for page stabilization

Important: The script uses popup patterns from references/popup_patterns.md. Consult this file if dealing with new popup types.
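
As a rough illustration of the popup handling described above, the sketch below tries a list of selectors in order and clicks the first visible match. It is not the bundled script's code: the function name dismiss_popup and the two example selectors are hypothetical, and the page argument is assumed to be a Playwright async Page object.

```python
# Hypothetical example selectors; the real list lives in references/popup_patterns.md.
EXAMPLE_SELECTORS = [
    'button:has-text("Alle akzeptieren")',
    'button:has-text("Accept all")',
]


async def dismiss_popup(page, selectors, timeout_ms=2000):
    """Click the first selector whose element is visible; return it, or None."""
    for selector in selectors:
        button = page.locator(selector).first
        try:
            if await button.is_visible():
                await button.click(timeout=timeout_ms)
                return selector
        except Exception:
            # A selector that errors out should not abort the whole run.
            continue
    return None
```

Returning the matched selector makes it easy to log which pattern worked for each site, which helps when deciding what to add to the pattern list.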

Expected Output

After running, you'll have:

  • screenshots_clean/*.png - Full-page screenshots
  • screenshots_clean/*.html - HTML source files
  • Console output with success/failure summary

Success rate target: 85-95%

Common failures:

  • Anti-bot protection (requires manual intervention)
  • HTTP/2 protocol errors (some sites block automation)
  • Timeout on slow-loading sites
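
To compare a run against the success rate target, one option is to check which sites produced both output files. This is a minimal sketch, assuming files are named after the domain (for example rewe.de.png and rewe.de.html, as in the REWE example later in this document); the helper name scrape_summary is hypothetical.

```python
from pathlib import Path


def scrape_summary(websites, output_dir="screenshots_clean"):
    """Classify each site as succeeded or failed based on its saved files."""
    out = Path(output_dir)
    succeeded, failed = [], []
    for site in websites:
        # Assumed naming scheme: <domain>.png and <domain>.html
        files = (out / f"{site}.png", out / f"{site}.html")
        # A site counts as scraped only if both files exist and are non-empty.
        ok = all(f.is_file() and f.stat().st_size > 0 for f in files)
        (succeeded if ok else failed).append(site)
    rate = 100 * len(succeeded) / len(websites) if websites else 0.0
    return {"succeeded": succeeded, "failed": failed, "success_rate": rate}


if __name__ == "__main__":
    summary = scrape_summary(["amazon.de", "ebay.de", "otto.de"])
    print(f"Success rate: {summary['success_rate']:.0f}%")
    print("Failed:", ", ".join(summary["failed"]) or "none")
```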

Step 2: Visual Analysis

Read Screenshots

After scraping, read the screenshot files to visually identify:

  • Product categories
  • Featured products
  • Promotional items
  • Visual design patterns

Example approach:

from pathlib import Path

screenshot_dir = Path("screenshots_clean")
screenshots = sorted(screenshot_dir.glob("*.png"))

# Read screenshots using the Read tool
for screenshot in screenshots[:5]:  # Start with 5 sites
    # Use the Read tool to view the image at this path,
    # then note product categories and featured items
    print(screenshot)

What to Look For

Product Categories:

  • Clothing & Fashion (Bekleidung)
  • Electronics (Elektronik)
  • Home & Furniture (Möbel & Wohnen)
  • Food & Groceries (Lebensmittel)
  • Books & Media (Bücher)
  • Beauty & Personal Care (Beauty & Pflege)
  • Sports & Outdoor (Sport)
  • Toys & Baby (Spielzeug & Baby)

Featured Products:

  • Homepage banners
  • Promotional sections
  • "Deal of the day" items
  • New arrivals

Take notes on recurring patterns across multiple sites - these indicate market trends.
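
One way to turn those notes into the ranked category list the report asks for is a simple tally. The sketch below assumes the notes are kept as a mapping from site to the categories seen on its homepage; the function name and the definition of "percentage" (share of all category sightings) are assumptions, since the source does not say how percentages should be computed.

```python
from collections import Counter


def category_shares(notes):
    """notes maps site -> list of categories seen on that site's homepage."""
    counts = Counter()
    for categories in notes.values():
        counts.update(set(categories))  # count each category once per site
    total = sum(counts.values())
    if total == 0:
        return []
    # Rounded share of all category sightings, most common first.
    return [(cat, round(100 * n / total)) for cat, n in counts.most_common()]
```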


Step 3: Data Extraction

Strategy Selection

Choose extraction strategy based on site structure. See references/html_parsing_patterns.md for complete patterns.

Quick decision tree:

  1. Try JSON-LD schema extraction (best for structured data)
  2. Fall back to data attribute extraction
  3. Fall back to class-based extraction
  4. Last resort: keyword matching
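
The first branch of the decision tree can be sketched with the standard library alone. This is an illustrative example, not the patterns from references/html_parsing_patterns.md: it assumes products are published as top-level schema.org Product objects in JSON-LD script tags, and it ignores nested @graph structures and list-valued @type fields.

```python
import json
import re

JSON_LD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def extract_jsonld_products(html):
    """Return (name, price) pairs from schema.org Product entries in JSON-LD."""
    products = []
    for block in JSON_LD_RE.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole page
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or item.get("@type") != "Product":
                continue
            offers = item.get("offers")
            if isinstance(offers, list):
                offers = offers[0] if offers else None
            price = offers.get("price") if isinstance(offers, dict) else None
            products.append((item.get("name"), price))
    return products
```

If this returns an empty list, move on to the next fallback in the decision tree.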

Example: Extract from REWE.de

import re
from pathlib import Path

html_file = Path("screenshots_clean/rewe.de.html")
content = html_file.read_text(encoding='utf-8')

# REWE-specific patterns
title_pattern = r'data-offer-title="([^"]+)"'
price_pattern = r'<div class="cor-offer-price__tag-price">([^<]+)</div>'

titles = re.findall(title_pattern, content)
prices = re.findall(price_pattern, content)

for i, title in enumerate(titles[:10]):
    # Titles and prices are paired by position; a missing price is shown as N/A
    price = f"{prices[i].strip()}€" if i < len(prices) else "N/A"
    print(f"{title}: {price}")

Platform-Specific Parsing

Each e-commerce platform has unique HTML structure. Consult references/html_parsing_patterns.md for:

  • Amazon.de patterns
  • eBay.de patterns
  • Otto.de patterns
  • Zalando/AboutYou patterns
  • REWE/Lidl supermarket patterns
  • And more...

Price Normalization

Always normalize prices:

def normalize_price(price_str):
    """Convert German format (1.234,56€) to float; return None if unparseable"""
    price_str = price_str.replace('€', '').replace('EUR', '').strip()
    if ',' in price_str and '.' in price_str:
        # Dot is the thousands separator, comma the decimal separator
        price_str = price_str.replace('.', '').replace(',', '.')
    elif ',' in price_str:
        price_str = price_str.replace(',', '.')
    try:
        return float(price_str)
    except ValueError:
        return None

Handling Large Files

For HTML files >25k tokens:

# Use grep to search for specific patterns
grep -o 'data-product-name="[^"]*"' screenshots_clean/amazon.de.html | head -20

# Or print matching lines with 5 lines of trailing context
grep -A 5 'product-title' screenshots_clean/ebay.de.html

Extraction Best Practices

  1. Try multiple patterns - Start with JSON-LD, fall back as needed
  2. Validate extractions - Check for reasonable length (10-100 chars)
  3. Remove duplicates - Use sets to track seen products
  4. Limit results - Cap at 10-20 products per site
  5. Handle encoding - Always use encoding='utf-8'
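
Practices 2 to 4 can be combined in one small helper. This is a minimal sketch; the function name clean_products and the case-insensitive duplicate check are assumptions, while the length bounds and the cap follow the numbers in the list above.

```python
def clean_products(names, max_items=20, min_len=10, max_len=100):
    """Length-check, de-duplicate (case-insensitively) and cap product names."""
    seen = set()
    cleaned = []
    for raw in names:
        name = " ".join(raw.split())  # collapse runs of whitespace and newlines
        if not (min_len <= len(name) <= max_len):
            continue
        key = name.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(name)
        if len(cleaned) == max_items:
            break
    return cleaned
```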

Step 4: Report Generation

Use the Report Template

Copy and customize assets/report_template.md:

cp assets/report_template.md final_report.md

Report Structure

The template includes these sections:

  1. Executive Summary - Key findings
  2. Top Product Categories - Ranked list with percentages
  3. Verified Product Prices - Extracted data with exact prices
  4. Platform-Specific Analysis - Per-site breakdown
  5. Market Trends - Growth trends and consumer behavior
  6. Seasonal Characteristics - Current and predicted
  7. Technical Implementation - Success metrics and limitations
  8. Business Insights - Opportunities and recommendations
  9. Data Sources - Success/failure breakdown
  10. Conclusions - Actionable takeaways

Filling the Template

Replace placeholder tokens:

  • {MARKET} → German, UK, US, etc.
  • {NUM_SITES} → 23, 25, etc.
  • {DATE} → 2026-03-19
  • {SUCCESS_RATE} → 92
  • {CATEGORY_1} → Clothing & Fashion
  • {PERCENTAGE_1} → 28
  • And so on...
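
The replacement can be scripted. The sketch below assumes every placeholder has the {UPPER_CASE} form shown above; the function names are hypothetical. It uses a regular expression rather than str.format so that other braces in the template are left alone and missing values do not raise an error.

```python
import re

TOKEN_RE = re.compile(r"\{([A-Z0-9_]+)\}")


def fill_template(template, values):
    """Replace {TOKEN} placeholders; leave tokens without a value untouched."""
    def substitute(match):
        token = match.group(1)
        return str(values[token]) if token in values else match.group(0)

    return TOKEN_RE.sub(substitute, template)


def unfilled_tokens(text):
    """List placeholders that still need a value."""
    return sorted(set(TOKEN_RE.findall(text)))
```

Running unfilled_tokens on the filled report before publishing is a quick check that no placeholder was missed.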

Data Quality Indicators

Include these metrics:

  • Success rate: % of successfully scraped sites
  • Popup handling: # of sites with popups handled
  • Price accuracy: % of verified prices
  • Screenshot quality: Resolution and file size
  • HTML completeness: Average file size

Writing Tips

Be bilingual (for German market):

  • Product names: German + Chinese/English translation
  • Categories: "Bekleidung / Clothing"
  • Maintain both languages throughout

Be specific:

  • ❌ "Electronics are popular"
  • ✅ "AirPods 4 (89,90€ on eBay), PlayStation 5, and Samsung smartphones are top electronics"

Include evidence:

  • Reference screenshot file names
  • Quote exact prices with sources
  • Link specific platforms to products

Troubleshooting

Issue: Popup Not Closed

Solution: Check references/popup_patterns.md for the specific site. Add custom selector if needed:

# In scripts/scrape_websites.py, add to popup_selectors list:
popup_selectors = [
    # ... existing selectors ...
    'button:has-text("Neue Popup Text")',  # Add custom
]

Issue: HTML Parsing Returns Empty

Diagnose:

  1. Check if HTML file exists and has content
  2. Verify the pattern with grep: grep -o "your-pattern" file.html
  3. Try alternative patterns from references/html_parsing_patterns.md
  4. Use keyword matching as fallback
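
The first three checks can be run in one pass. This is a sketch under the assumption that candidate patterns are supplied as a name-to-regex mapping; the helper name diagnose is hypothetical.

```python
import re
from pathlib import Path


def diagnose(html_path, patterns):
    """Report whether the file exists, its length, and match counts per pattern."""
    path = Path(html_path)
    if not path.is_file():
        return {"exists": False, "chars": 0, "matches": {}}
    content = path.read_text(encoding="utf-8", errors="replace")
    matches = {name: len(re.findall(rx, content)) for name, rx in patterns.items()}
    return {"exists": True, "chars": len(content), "matches": matches}
```

A pattern with zero matches in a large file suggests the site changed its markup; a very small file suggests the scrape itself failed.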

Issue: Anti-Bot Detection

Symptoms: CAPTCHA, "Verify you are human", IP blocking

Solutions:

  1. Add delays between requests (already in script)
  2. Customize user agent string
  3. Use browser fingerprinting evasion
  4. For production: consider proxy rotation (not included)

Issue: Timeout Errors

Solution: Adjust the timeout in the script:

await page.goto(url, wait_until="domcontentloaded", timeout=120000)  # 2min

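Or wait for the full load event with a shorter timeout (note that "load" fires later than "domcontentloaded", so it is the stricter of the two conditions):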

await page.goto(url, wait_until="load", timeout=90000)

Market-Specific Configuration

German Market (Default)

context = await browser.new_context(
    locale="de-DE",
    timezone_id="Europe/Berlin",
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)..."
)

Popup patterns: See references/popup_patterns.md → German Market section

UK Market

context = await browser.new_context(
    locale="en-GB",
    timezone_id="Europe/London",
)

Popup patterns: Use English/International selectors

US Market

context = await browser.new_context(
    locale="en-US",
    timezone_id="America/New_York",
)

Other Markets

Adjust locale and timezone_id accordingly. Update popup selectors in script based on language.
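
One way to keep these settings in one place is a small lookup table. The sketch below reuses the three configurations above and adds a French entry purely as an example of a new market; the names MARKET_SETTINGS and context_options are hypothetical, and the user agent from the German default is omitted.

```python
MARKET_SETTINGS = {
    "de": {"locale": "de-DE", "timezone_id": "Europe/Berlin"},
    "uk": {"locale": "en-GB", "timezone_id": "Europe/London"},
    "us": {"locale": "en-US", "timezone_id": "America/New_York"},
    # Example of an added market; not part of the original skill.
    "fr": {"locale": "fr-FR", "timezone_id": "Europe/Paris"},
}


def context_options(market):
    """Return keyword arguments for browser.new_context() for a market code."""
    try:
        return dict(MARKET_SETTINGS[market.lower()])
    except KeyError:
        raise ValueError(f"No settings for market {market!r}") from None
```

Inside the scraper this would be used as browser.new_context(**context_options("fr")).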


Advanced Usage

Parallel Scraping

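For large website lists, modify the script to use concurrent scraping:

import asyncio
from playwright.async_api import async_playwright

async def scrape_all(websites, output_dir, max_concurrent=3):
    # capture_homepage is assumed to be the per-site coroutine defined in scripts/scrape_websites.py
    semaphore = asyncio.Semaphore(max_concurrent)  # limit simultaneous page loads

    async def bounded(browser, url):
        async with semaphore:
            return await capture_homepage(browser, url, output_dir)

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        tasks = [bounded(browser, url) for url in websites]
        results = await asyncio.gather(*tasks)
        await browser.close()
    return results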

Note: Be respectful of rate limits. Use delays.

Custom Analysis

Beyond the standard workflow, you can:

  • Compare prices across platforms
  • Track price changes over time (run periodically)
  • Identify pricing patterns (premium vs discount)
  • Analyze promotional strategies
  • Monitor competitor activity
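
As an illustration of the first item, the sketch below finds the cheapest platform for each product. It assumes product records are dicts with 'platform', 'name' and 'price' keys (the same shape used in the CSV export example in this document) and that prices have already been normalized to floats. Matching by exact name is a simplification; real listings usually need fuzzier matching.

```python
from collections import defaultdict


def cheapest_by_product(products):
    """Map each product name to the (platform, price) pair with the lowest price."""
    offers = defaultdict(list)
    for p in products:
        if p["price"] is not None:  # skip prices that failed normalization
            offers[p["name"]].append((p["platform"], p["price"]))
    return {name: min(found, key=lambda o: o[1]) for name, found in offers.items()}
```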

Exporting Data

Consider exporting to structured formats:

  • CSV: For spreadsheet analysis
  • JSON: For programmatic access
  • Database: For long-term tracking

Example CSV export:

import csv

with open('products.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Platform', 'Product', 'Price', 'Category'])
    for product in products:
        writer.writerow([product['platform'], product['name'],
                        product['price'], product['category']])
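
A JSON export can be sketched the same way, assuming products is the same list of dicts used in the CSV example. Setting ensure_ascii=False keeps German characters such as ö and € readable in the output file.

```python
import json


def export_json(products, path="products.json"):
    """Write the product list as indented UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(products, f, ensure_ascii=False, indent=2)
```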

Best Practices

Ethical Scraping

  1. Respect robots.txt - Check before scraping
  2. Rate limiting - Don't overwhelm servers (script includes delays)
  3. Terms of Service - Review site ToS
  4. Personal use - This skill is for market research, not commercial resale
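
The robots.txt check in item 1 can be done with Python's standard library. This is a minimal sketch that evaluates robots.txt text you have already downloaded; the function name homepage_allowed is hypothetical, and the default "*" user agent is an assumption (use the agent string your scraper actually sends).

```python
from urllib.robotparser import RobotFileParser


def homepage_allowed(robots_txt, site, user_agent="*"):
    """Check whether the given robots.txt text permits fetching the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, f"https://{site}/")
```

To fetch the file directly instead, RobotFileParser also offers set_url() followed by read().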

Data Quality

  1. Verify prices - Cross-check suspicious values
  2. Update regularly - E-commerce changes fast
  3. Document assumptions - Note any manual adjustments
  4. Keep raw data - Save screenshots and HTML for reference

Report Quality

  1. Be objective - Base conclusions on data
  2. Show your work - Reference sources
  3. Contextualize - Explain market-specific factors
  4. Actionable - Provide specific recommendations

Resources Reference

scripts/scrape_websites.py

Main scraper with automatic popup handling. Uses Playwright to capture homepages.

Usage: uv run python scripts/scrape_websites.py

references/popup_patterns.md

Comprehensive collection of popup selectors for different markets and platforms.

When to read: When encountering new popup types or troubleshooting popup handling.

references/html_parsing_patterns.md

Platform-specific HTML parsing patterns and extraction strategies.

When to read: When extracting product data from HTML files. Contains patterns for Amazon, eBay, REWE, Otto, Zalando, and generic strategies.

assets/report_template.md

Structured template for the final market analysis report.

Usage: Copy and fill in with analysis results.
