Korean Scraper

Suspicious. Audited by ClawScan on May 10, 2026.

Overview

This is a functional scraper, but it intentionally evades website bot defenses, weakens browser safety settings, and overstates compliance safeguards.

Install only if you are comfortable running a stealth web scraper. Use it only on sites you are authorized to scrape, verify robots.txt and terms yourself, avoid untrusted URLs, and run it in a sandboxed environment.

Findings (5)

This is an artifact-based, informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

The agent could be used to scrape sites in ways those sites are actively trying to block, creating legal, terms-of-service, and abuse-risk exposure for the user.

Why it was flagged

The skill explicitly advertises evading automation detection and Cloudflare protections as part of the scraper workflow.

Skill content
- **navigator.webdriver hidden** — evades automation detection
- **Stealth Plugin** — Playwright extra stealth
- **Cloudflare bypass** — automatically adjusts wait times
Recommendation

Use only for authorized collection, prefer official APIs where available, remove stealth/bypass defaults, and require explicit user approval for scraping protected sites.
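One way to require that explicit approval is to gate every navigation behind a user-maintained allowlist rather than shipping stealth behavior as a default. A minimal sketch (the allowlist contents and function name here are hypothetical, not part of the skill):

```javascript
// Hypothetical allowlist gate: scraping proceeds only for hosts the
// user has explicitly authorized; everything else fails loudly.
const AUTHORIZED_HOSTS = new Set(['example.com', 'news.example.org']);

function assertAuthorized(url) {
  const host = new URL(url).hostname;
  if (!AUTHORIZED_HOSTS.has(host)) {
    throw new Error(`Not authorized to scrape ${host}; add it to the allowlist explicitly.`);
  }
  return true;
}
```

Calling `assertAuthorized(url)` before each `page.goto(url)` would turn silent stealth scraping into an opt-in, per-site decision.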

What this means

If the scraper visits a malicious or compromised page, the browser has fewer protections than a normal browser session.

Why it was flagged

The browser launcher weakens sandboxing, same-origin protections, and site isolation for every page the scraper opens.

Skill content
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process'
Recommendation

Avoid disabling browser security unless strictly required, validate target domains, and run this skill in an isolated container or disposable environment.
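For comparison, a launch configuration that keeps Chromium's defenses on would simply omit the flags quoted above; headless mode is an assumption here, not taken from the skill:

```javascript
// Sketch of a safer launch configuration: no '--no-sandbox',
// '--disable-setuid-sandbox', '--disable-web-security', or
// '--disable-features=IsolateOrigins,site-per-process', so the sandbox,
// same-origin policy, and site isolation all stay enabled.
const launchOptions = {
  headless: true,
  args: [
    '--disable-dev-shm-usage', // common CI workaround; does not weaken security
  ],
};
```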

What this means

Users may incorrectly believe the scraper automatically respects site crawling policies.

Why it was flagged

The skill claims robots.txt compliance by default, but the provided source does not include robots.txt fetching or enforcement before page.goto calls.

Skill content
- ✅ robots.txt compliance (default)
Recommendation

Add real robots.txt enforcement or remove this claim; users should manually verify site terms before scraping.
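A minimal sketch of what such enforcement could look like before each page.goto call. This simplified check only honors Disallow rules in the `User-agent: *` group; a real implementation should use a tested parser (e.g. the robots-parser npm package) instead:

```javascript
// Simplified robots.txt check: returns false if any "Disallow" rule in
// the "User-agent: *" group is a prefix of the requested path.
function isAllowed(robotsTxt, path) {
  let inStarGroup = false;
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.split('#')[0].trim(); // strip comments and whitespace
    if (!line) continue;
    const [field, ...rest] = line.split(':');
    const value = rest.join(':').trim();
    if (field.trim().toLowerCase() === 'user-agent') {
      inStarGroup = value === '*';
    } else if (inStarGroup && field.trim().toLowerCase() === 'disallow') {
      if (value && path.startsWith(value)) return false;
    }
  }
  return true;
}
```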

What this means

Users could receive article content with rights notices stripped, making reuse feel safer than it actually is.

Why it was flagged

The scraper removes no-redistribution and copyright notices from article text before returning the extracted content.

Skill content
content = content.replace(/무단\s*전재\s*및?\s*재배포\s*금지/gi, '');
content = content.replace(/ⓒ\s*.*?무단\s*전재.*?\n/gi, '');
Recommendation

Preserve copyright and attribution notices in output, or clearly warn users when content may be copyrighted.
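One hedged alternative to the stripping shown above: detect the rights notice and surface it alongside the content instead of deleting it. The regex mirrors the pattern the skill removes ("무단 전재 및 재배포 금지", roughly "unauthorized reproduction and redistribution prohibited"); the function name is illustrative:

```javascript
// Flag rights notices rather than stripping them, so downstream users
// see that the content carries a no-redistribution notice.
const RIGHTS_NOTICE = /무단\s*전재\s*및?\s*재배포\s*금지/;

function extractWithNotice(content) {
  return {
    content, // notice left intact in the returned text
    hasRightsNotice: RIGHTS_NOTICE.test(content),
  };
}
```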

What this means

Installing the skill depends on external npm packages and a browser download, so supply-chain changes could affect what runs locally.

Why it was flagged

Installation can run an npm lifecycle script and download a browser binary, and dependency versions are ranged rather than pinned.

Skill content
"install": "npx playwright install chromium",
"dependencies": {
  "playwright": "^1.41.0",
  "playwright-extra": "^4.3.6",
  "puppeteer-extra-plugin-stealth": "^2.11.2"
}
Recommendation

Pin dependencies with a lockfile, declare the install steps in metadata, and install in an isolated environment.
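As a sketch of what pinning means here, the quoted versions with their range prefixes removed (a lockfile should still be the source of truth; verify the exact versions yourself):

```javascript
// Exact versions in place of the ranged ones ("^1.41.0" -> "1.41.0").
const pinnedDependencies = {
  playwright: '1.41.0',
  'playwright-extra': '4.3.6',
  'puppeteer-extra-plugin-stealth': '2.11.2',
};

// A pinned version has no range operators like ^ or ~.
const allPinned = Object.values(pinnedDependencies)
  .every((v) => /^\d+\.\d+\.\d+$/.test(v));
```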