Korean Scraper
**Suspicious** — Audited by ClawScan on May 10, 2026.
Overview
This is a functional scraper, but it intentionally evades website bot defenses, weakens browser safety settings, and overstates compliance safeguards.
Install only if you are comfortable running a stealth web scraper. Use it only on sites you are authorized to scrape, verify robots.txt and terms yourself, avoid untrusted URLs, and run it in a sandboxed environment.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
The agent could be used to scrape sites in ways those sites are actively trying to block, creating legal, terms-of-service, and abuse-risk exposure for the user.
The skill explicitly advertises evading automation detection and Cloudflare protections as part of the scraper workflow.
- **Hides `navigator.webdriver`** — evades automation detection
- **Stealth Plugin** — Playwright extra stealth
- **Cloudflare bypass** — automatically adjusts wait times
Use only for authorized collection, prefer official APIs where available, remove stealth/bypass defaults, and require explicit user approval for scraping protected sites.
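As a sketch of the "explicit user approval" recommendation, a guard along these lines could refuse any host the user has not explicitly authorized. `assertAuthorized` and `authorizedHosts` are hypothetical names, not part of the skill:

```javascript
// Sketch: refuse to scrape a URL unless its host is on a user-maintained
// allowlist. `authorizedHosts` would be populated by an explicit user action.
function assertAuthorized(url, authorizedHosts) {
  const host = new URL(url).hostname;
  if (!authorizedHosts.includes(host)) {
    throw new Error(`Host not authorized for scraping: ${host}`);
  }
  return host;
}
```

A real implementation would prompt the user interactively rather than read a static list, but the failure mode is the same: no approval, no navigation.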
If the scraper visits a malicious or compromised page, the browser has fewer protections than a normal browser session.
The browser launcher weakens sandboxing, same-origin protections, and site isolation for every page the scraper opens.
```javascript
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-web-security',
'--disable-features=IsolateOrigins,site-per-process'
```
Avoid disabling browser security unless strictly required, validate target domains, and run this skill in an isolated container or disposable environment.
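One way to harden the launcher is to strip the quoted flags before they reach the browser. This is a sketch under the assumption that the args are passed as a plain array; `sanitizeLaunchArgs` is a hypothetical helper:

```javascript
// The security-weakening Chromium flags quoted in the finding above.
const DANGEROUS_FLAGS = [
  '--no-sandbox',
  '--disable-setuid-sandbox',
  '--disable-web-security',
  '--disable-features=IsolateOrigins,site-per-process',
];

// Sketch: drop any dangerous flag, keeping benign args untouched.
function sanitizeLaunchArgs(args) {
  return args.filter((arg) => !DANGEROUS_FLAGS.includes(arg));
}
```

Filtering flags does not substitute for running the scraper in an isolated container, but it restores the browser's default sandboxing and site isolation.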
Users may incorrectly believe the scraper automatically respects site crawling policies.
The skill claims robots.txt compliance by default, but the provided source does not include robots.txt fetching or enforcement before page.goto calls.
- ✅ robots.txt compliance (default)
Add real robots.txt enforcement or remove this claim; users should manually verify site terms before scraping.
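Real enforcement would fetch robots.txt and check the target path before any `page.goto` call. A minimal sketch of the path check, simplified to `User-agent: *` groups and `Disallow` prefixes (no `Allow` rules, wildcards, or per-agent matching):

```javascript
// Sketch of a robots.txt gate: returns false if the path falls under a
// Disallow prefix in the wildcard (`User-agent: *`) group.
function isPathAllowed(robotsTxt, path) {
  let inWildcardGroup = false;
  const disallows = [];
  for (const raw of robotsTxt.split('\n')) {
    const line = raw.split('#')[0].trim(); // strip comments
    const sep = line.indexOf(':');
    if (sep === -1) continue; // skip blank or malformed lines
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === 'user-agent') inWildcardGroup = value === '*';
    else if (inWildcardGroup && field === 'disallow' && value) disallows.push(value);
  }
  return !disallows.some((prefix) => path.startsWith(prefix));
}
```

Even with a check like this in place, users should still verify site terms manually, since robots.txt does not express licensing or terms-of-service restrictions.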
Users could receive article content with rights notices stripped, making reuse feel safer than it actually is.
The scraper removes no-redistribution and copyright notices from article text before returning the extracted content.
```javascript
content = content.replace(/무단\s*전재\s*및?\s*재배포\s*금지/gi, '');
content = content.replace(/ⓒ\s*.*?무단\s*전재.*?\n/gi, '');
```
Preserve copyright and attribution notices in output, or clearly warn users when content may be copyrighted.
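Instead of deleting notices, the extractor could surface them. A sketch that reuses the skill's own pattern to flag, rather than strip, rights notices (`extractWithRightsNotice` is a hypothetical replacement, not the skill's API):

```javascript
// Patterns for Korean "reproduction and redistribution prohibited" notices,
// taken from the stripping code quoted above, plus the ⓒ copyright mark.
const RIGHTS_PATTERNS = [
  /무단\s*전재\s*및?\s*재배포\s*금지/i,
  /ⓒ/,
];

// Sketch: return the content unmodified and flag whether it carries a notice.
function extractWithRightsNotice(content) {
  const hasNotice = RIGHTS_PATTERNS.some((re) => re.test(content));
  return { content, hasNotice };
}
```

Callers can then show a warning when `hasNotice` is true, so reuse decisions stay with the user instead of being made invisible.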
Installing the skill depends on external npm packages and a browser download, so supply-chain changes could affect what runs locally.
Installation can run an npm lifecycle script and download a browser binary, and dependency versions are ranged rather than pinned.
```json
"install": "npx playwright install chromium",
"dependencies": {
  "playwright": "^1.41.0",
  "playwright-extra": "^4.3.6",
  "puppeteer-extra-plugin-stealth": "^2.11.2"
}
```

Pin dependencies with a lockfile, declare the install steps in metadata, and install in an isolated environment.
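A pinned version of the dependency block might look like the following (exact versions copied from the ranged specifiers above). Combined with a committed `package-lock.json` and `npm ci --ignore-scripts`, installs become reproducible and lifecycle scripts are skipped; the browser download can then be run as an explicit, audited step:

```json
{
  "dependencies": {
    "playwright": "1.41.0",
    "playwright-extra": "4.3.6",
    "puppeteer-extra-plugin-stealth": "2.11.2"
  }
}
```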
