Skill flagged: ClawHub Security detected suspicious patterns in this skill. Review the scan results before using.

Playwright Scraper Skill 1.2.0

Playwright-based web scraping OpenClaw Skill with anti-bot protection. Successfully tested on complex sites like Discuss.com.hk.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
8 current installs · 9 all-time installs
Security Scan
VirusTotal: Suspicious
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the included files and usage: two Playwright scripts (simple + stealth), docs, examples, and npm dependency on Playwright. No credentials or unrelated binaries are requested, which is proportionate for a scraper.
Instruction Scope
SKILL.md and scripts instruct the agent to install dependencies (npm / npx playwright install) and run local JS scrapers. The stealth script modifies navigator properties and injects init scripts to hide automation markers (expected for anti-bot evasion). The docs mention future use of proxies and CAPTCHA solvers (2captcha) but those are not implemented in code. Instructions do not read unrelated system files or exfiltrate data to external servers.
Install Mechanism
No formal install spec in registry; documentation tells users to run npm install and npx playwright install chromium. Dependencies come from the public npm registry (package-lock references registry.npmjs.org). This is expected but means npm will fetch and install packages and browser binaries to disk—run in an environment you control.
Credentials
No required environment variables or credentials are declared. Optional env vars (HEADLESS, WAIT_TIME, SCREENSHOT_PATH, SAVE_HTML, USER_AGENT) are reasonable and directly related to scraper behavior.
Persistence & Privilege
Skill does not request always:true and does not attempt to modify other skills or system-wide configs. It runs on demand and writes only local artifacts (screenshots, saved HTML) when instructed.
Assessment
This skill appears to do what it claims: local Playwright scripts for normal and 'stealth' scraping. Before installing or running it:

1. Review the included scripts yourself. npm install will fetch Playwright and its dependencies, and npx playwright install will download browser binaries; run these in an isolated environment if you are cautious.
2. Be aware that the stealth script intentionally modifies browser fingerprints to evade bot detection. This is the feature, but it may be legally or ethically questionable depending on the target site.
3. The docs mention proxy rotation and CAPTCHA services (2captcha) as planned; those would require third-party credentials and introduce additional risk if added later.
4. No credentials are required now and no hidden network endpoints are present, but do not run code from unknown sources unless you trust the author or have audited it manually.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0


SKILL.md

Playwright Scraper Skill

A Playwright-based web scraping OpenClaw Skill with anti-bot protection. Choose the best approach based on the target website's anti-bot level.


🎯 Use Case Matrix

| Target Website | Anti-Bot Level | Recommended Method | Script |
|---|---|---|---|
| Regular sites | Low | web_fetch tool | N/A (built-in) |
| Dynamic sites | Medium | Playwright Simple | scripts/playwright-simple.js |
| Cloudflare protected | High | Playwright Stealth | scripts/playwright-stealth.js |
| YouTube | Special | deep-scraper | Install separately |
| Reddit | Special | reddit-scraper | Install separately |

📦 Installation

cd playwright-scraper-skill
npm install
npx playwright install chromium

🚀 Quick Start

1️⃣ Simple Sites (No Anti-Bot)

Use OpenClaw's built-in web_fetch tool:

# Invoke directly in OpenClaw
Hey, fetch me the content from https://example.com

2️⃣ Dynamic Sites (Requires JavaScript)

Use Playwright Simple:

node scripts/playwright-simple.js "https://example.com"

Example output:

{
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "...",
  "elapsedSeconds": "3.45"
}
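For orientation, a minimal sketch of the same approach is shown below. This is an assumed shape, not the actual scripts/playwright-simple.js; only `formatResult` mirrors the example output above, and the Playwright calls are standard API.

```javascript
// Sketch of a "simple" Playwright scraper (assumed shape; the skill's
// shipped scripts/playwright-simple.js is the authoritative version).
// Usage: node playwright-simple-sketch.js "https://example.com"

// Pure helper: shape the JSON result as in the example output above.
function formatResult(url, title, content, startMs) {
  return {
    url,
    title,
    content,
    elapsedSeconds: ((Date.now() - startMs) / 1000).toFixed(2),
  };
}

async function scrape(url) {
  // Playwright is required lazily so the helper above works even
  // before `npm install` has run.
  const { chromium } = require('playwright');
  const start = Date.now();
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    const title = await page.title();
    const content = await page.evaluate(() => document.body.innerText);
    return formatResult(url, title, content, start);
  } finally {
    await browser.close();
  }
}
```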

3️⃣ Anti-Bot Protected Sites (Cloudflare etc.)

Use Playwright Stealth:

node scripts/playwright-stealth.js "https://m.discuss.com.hk/#hot"

Features:

  • Hide automation markers (navigator.webdriver = false)
  • Realistic User-Agent (iPhone, Android)
  • Random delays to mimic human behavior
  • Screenshot and HTML saving support
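A rough sketch of how these features might be wired up follows. This is an assumption about the implementation, not the shipped scripts/playwright-stealth.js; the UA string and `stealthPage` helper name are illustrative, while `devices` and `addInitScript` are real Playwright APIs.

```javascript
// Sketch of the stealth features listed above (assumed implementation).

// Example of a realistic mobile User-Agent (not guaranteed current).
const IPHONE_UA =
  'Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) ' +
  'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Mobile/15E148 Safari/604.1';

// Random delay within [minMs, maxMs] to mimic human pacing.
function humanDelay(minMs = 500, maxMs = 2000) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

async function stealthPage(browser) {
  const { devices } = require('playwright'); // lazy: helpers above stay dependency-free
  const context = await browser.newContext({
    ...devices['iPhone 13'],                       // realistic viewport + touch support
    userAgent: process.env.USER_AGENT || IPHONE_UA,
  });
  const page = await context.newPage();
  // Hide the most common automation marker before any site script runs.
  await page.addInitScript(() => {
    Object.defineProperty(navigator, 'webdriver', { get: () => false });
  });
  return page;
}
```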

4️⃣ YouTube Video Transcripts

Use deep-scraper (install separately):

# Install deep-scraper skill
npx clawhub install deep-scraper

# Use it
cd skills/deep-scraper
node assets/youtube_handler.js "https://www.youtube.com/watch?v=VIDEO_ID"

📖 Script Descriptions

scripts/playwright-simple.js

  • Use Case: Regular dynamic websites
  • Speed: Fast (3-5 seconds)
  • Anti-Bot: None
  • Output: JSON (title, content, URL)

scripts/playwright-stealth.js

  • Use Case: Sites with Cloudflare or anti-bot protection
  • Speed: Medium (5-20 seconds)
  • Anti-Bot: Medium-High (hides automation, realistic UA)
  • Output: JSON + Screenshot + HTML file
  • Verified: 100% success on Discuss.com.hk

🎓 Best Practices

1. Try web_fetch First

If the site doesn't have dynamic loading, use OpenClaw's web_fetch tool—it's fastest.

2. Need JavaScript? Use Playwright Simple

If you need to wait for JavaScript rendering, use playwright-simple.js.

3. Getting Blocked? Use Stealth

If you encounter 403 or Cloudflare challenges, use playwright-stealth.js.

4. Special Sites Need Specialized Skills

  • YouTube → deep-scraper
  • Reddit → reddit-scraper
  • Twitter → bird skill

🔧 Customization

All scripts support environment variables:

# Set screenshot path
SCREENSHOT_PATH=/path/to/screenshot.png node scripts/playwright-stealth.js URL

# Set wait time (milliseconds)
WAIT_TIME=10000 node scripts/playwright-simple.js URL

# Enable headful mode (show browser)
HEADLESS=false node scripts/playwright-stealth.js URL

# Save HTML
SAVE_HTML=true node scripts/playwright-stealth.js URL

# Custom User-Agent
USER_AGENT="Mozilla/5.0 ..." node scripts/playwright-stealth.js URL
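A script might parse these variables roughly as follows. This is a sketch; the defaults shown here are assumptions and the shipped scripts may choose different ones.

```javascript
// Read the documented environment variables into a config object
// (assumed defaults; the actual scripts may differ).
function readConfig(env = process.env) {
  return {
    headless: env.HEADLESS !== 'false',              // default: headless on
    waitTime: parseInt(env.WAIT_TIME || '5000', 10), // ms to wait after load
    screenshotPath: env.SCREENSHOT_PATH || null,     // no screenshot unless set
    saveHtml: env.SAVE_HTML === 'true',              // opt-in HTML dump
    userAgent: env.USER_AGENT || undefined,          // fall back to built-in UA
  };
}
```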

📊 Performance Comparison

| Method | Speed | Anti-Bot | Success Rate (Discuss.com.hk) |
|---|---|---|---|
| web_fetch | ⚡ Fastest | ❌ None | 0% |
| Playwright Simple | 🚀 Fast | ⚠️ Low | 20% |
| Playwright Stealth | ⏱️ Medium | ✅ Medium | 100% |
| Puppeteer Stealth | ⏱️ Medium | ✅ Medium-High | ~80% |
| Crawlee (deep-scraper) | 🐢 Slow | ❌ Detected | 0% |
| Chaser (Rust) | ⏱️ Medium | ❌ Detected | 0% |

🛡️ Anti-Bot Techniques Summary

Lessons learned from our testing:

✅ Effective Anti-Bot Measures

  1. Hide navigator.webdriver — Essential
  2. Realistic User-Agent — Use real devices (iPhone, Android)
  3. Mimic Human Behavior — Random delays, scrolling
  4. Avoid Framework Signatures — Crawlee, Selenium are easily detected
  5. Use addInitScript (Playwright) — Inject before page load

❌ Ineffective Anti-Bot Measures

  1. Only changing User-Agent — Not enough
  2. Using high-level frameworks (Crawlee) — More easily detected
  3. Docker isolation — Doesn't help with Cloudflare

🔍 Troubleshooting

Issue: 403 Forbidden

Solution: Use playwright-stealth.js

Issue: Cloudflare Challenge Page

Solution:

  1. Increase wait time (10-15 seconds)
  2. Try headless: false (headful mode sometimes has higher success rate)
  3. Consider using proxy IPs

Issue: Blank Page

Solution:

  1. Increase waitForTimeout
  2. Use waitUntil: 'networkidle' or 'domcontentloaded'
  3. Check if login is required
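The first two fixes above can be combined in one helper. This is a sketch; `loadPatiently` is a hypothetical name, not part of the skill, but `page.goto` with `waitUntil` and `page.waitForTimeout` are standard Playwright APIs.

```javascript
// Sketch of the "blank page" fixes: longer waits and a stricter load
// condition. `page` is an already-open Playwright page.
async function loadPatiently(page, url, waitMs = 10000) {
  // 'networkidle' waits until no network requests for 500 ms;
  // fall back to 'domcontentloaded' if the site never goes idle.
  try {
    await page.goto(url, { waitUntil: 'networkidle', timeout: 30000 });
  } catch (err) {
    await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 30000 });
  }
  await page.waitForTimeout(waitMs); // extra settle time for late-running JS
}
```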

📝 Memory & Experience

2026-02-07 Discuss.com.hk Test Conclusions

  • ✅ Pure Playwright + Stealth succeeded (5s, 200 OK)
  • ❌ Crawlee (deep-scraper) failed (403)
  • ❌ Chaser (Rust) failed (Cloudflare)
  • ❌ Puppeteer standard failed (403)

Best Solution: Pure Playwright + anti-bot techniques (framework-independent)


🚧 Future Improvements

  • Add proxy IP rotation
  • Implement cookie management (maintain login state)
  • Add CAPTCHA handling (2captcha / Anti-Captcha)
  • Batch scraping (parallel URLs)
  • Integration with OpenClaw's browser tool
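None of these exist in the skill yet. Batch scraping, for instance, could be layered on top of either script with a small concurrency limiter; the sketch below is hypothetical, and `scrapeFn` stands in for whatever scrape function a script exposes.

```javascript
// Run `scrapeFn` over many URLs with at most `limit` in flight at once.
async function scrapeBatch(urls, scrapeFn, limit = 3) {
  const results = new Array(urls.length);
  let next = 0;
  async function worker() {
    // Each worker pulls the next unclaimed index until the list is done.
    while (next < urls.length) {
      const i = next++;
      try {
        results[i] = await scrapeFn(urls[i]);
      } catch (err) {
        results[i] = { url: urls[i], error: String(err) };
      }
    }
  }
  const workers = Array.from({ length: Math.min(limit, urls.length) }, worker);
  await Promise.all(workers);
  return results;
}
```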


Files

14 total
