Huo15 Js Scraper

v1.1.0

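A scraper for JavaScript-rendered websites. Use this skill when you need to scrape JS-rendered pages (such as WeCom docs or Vue/React SPAs), get past anti-scraping measures, or fetch sites whose content plain curl/wget/web_fetch cannot retrieve. Supports automatic switching between two engines, Playwright and scrapling.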

by Job Zhao @zhaobod1
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description match the included scripts: both scrape.py and wecom_docs_scraper.py implement Playwright/scrapling-based scraping and build a local '企业微信文档' (WeCom documentation) knowledge base. Minor note: the skill's source and homepage are unknown (no external provenance), which reduces external trust but does not contradict the stated functionality.
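The dual-engine behavior described above can be pictured as a simple "try the first engine, fall back to the next" loop. The sketch below is illustrative only and is not taken from scrape.py: `fetch_with_fallback` and the `Engine` type are hypothetical names, and the second engine is left as any callable because the scripts' actual scrapling calls are not shown on this page. The Playwright function uses the library's sync API and is imported lazily, so a missing install simply triggers the fallback.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical type: an "engine" is a (name, callable) pair where the
# callable takes a URL and returns the rendered HTML as a string.
Engine = Tuple[str, Callable[[str], str]]


def fetch_with_playwright(url: str) -> str:
    """Render a page with Playwright's sync API (imported lazily)."""
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait for network activity to settle so JS-rendered content is present.
            page.goto(url, wait_until="networkidle")
            return page.content()
        finally:
            browser.close()


def fetch_with_fallback(url: str, engines: Sequence[Engine]) -> Tuple[str, str]:
    """Try each engine in order; return (engine_name, html) from the first success."""
    errors = []
    for name, fetch in engines:
        try:
            html = fetch(url)
            if html and html.strip():
                return name, html
            errors.append(f"{name}: empty response")
        except Exception as exc:  # broad on purpose: any failure moves to the next engine
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all engines failed: " + "; ".join(errors))
```

A caller would pass something like `[("playwright", fetch_with_playwright), ("scrapling", some_scrapling_fetch)]`, where `some_scrapling_fetch` is a placeholder for whatever wrapper the real script uses.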
Instruction Scope
SKILL.md instructs only how to run the provided scripts, how to install Playwright/scrapling, and where data is stored. The scripts access only the target site (developer.work.weixin.qq.com), read/write files under the user's home workspace, and do not read unrelated system files or environment variables.
Install Mechanism
No automated install spec is bundled (instruction-only), which lowers the risk of hidden installs. However, SKILL.md instructs the user to install Playwright and scrapling, which download large browser binaries and may require the pip flag --break-system-packages on an externally managed Python installation. This is expected but worth noting (disk space, network downloads).
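Because installation is manual, it can help to check what is already present before running anything. The following is a minimal sketch, not part of the skill: `available_engines` and `has_free_space` are hypothetical helpers, and the sketch assumes the import names match the package names `playwright` and `scrapling`. It uses only the standard library and does not install or download anything.

```python
import importlib.util
import shutil


def available_engines(candidates=("playwright", "scrapling")):
    """Return the candidate modules that are importable in this environment."""
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


def has_free_space(path, min_bytes):
    """Report whether the filesystem holding `path` has at least `min_bytes` free."""
    return shutil.disk_usage(path).free >= min_bytes


if __name__ == "__main__":
    print("installed engines:", available_engines())
    # The 1 GiB threshold is an arbitrary illustrative value, not a documented requirement.
    print("at least 1 GiB free:", has_free_space(".", 1024 ** 3))
```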
Credentials
The skill declares no required environment variables or credentials and the code does not attempt to read secrets; requested access (network to the documented site and write access to ~/workspace/knowledge-base) is proportional to its purpose.
Persistence & Privilege
The skill is not always-enabled and does not modify other skills or global agent configuration. It writes data only to a subdirectory under the user's home directory (~/workspace/knowledge-base/企业微信文档), which is expected for a local knowledge-base builder.
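When reviewing the scripts, one thing worth confirming is that page titles cannot steer writes outside that directory. The sketch below shows one way such a guard could look; it is an assumption about good practice, not a description of what wecom_docs_scraper.py does. `safe_output_path` and `KB_ROOT` are hypothetical names.

```python
import re
from pathlib import Path

# Directory named in the review above; the constant name itself is hypothetical.
KB_ROOT = Path.home() / "workspace" / "knowledge-base" / "企业微信文档"


def safe_output_path(title: str, root: Path = KB_ROOT) -> Path:
    """Map a page title to a Markdown path that is guaranteed to sit under `root`."""
    # Collapse path separators, reserved characters, and whitespace into "_",
    # then trim leading/trailing dots and underscores.
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("._")
    if not cleaned:
        cleaned = "untitled"
    resolved_root = root.resolve()
    target = (resolved_root / f"{cleaned}.md").resolve()
    # Defensive check: refuse any path that escapes the knowledge-base root.
    if resolved_root not in target.parents:
        raise ValueError(f"refusing to write outside {resolved_root}: {target}")
    return target
```

With this guard, a title such as `../../etc/passwd` is flattened to `etc_passwd.md` inside the root rather than being treated as a relative path.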
Assessment
This skill appears to do what it says: render JS pages with Playwright or scrapling and save them as Markdown in ~/workspace/knowledge-base/企业微信文档. Before installing or running:
(1) Be aware that Playwright/scrapling will download large browser binaries and may require pip flags; ensure you have disk space and consent to those downloads.
(2) Running the --all mode will perform many requests and create many files; run it only if you want a full scrape, and respect the target site's terms, robots rules, and rate limits.
(3) The package source/homepage is unknown; treat it as unvetted code and consider running it in a sandbox or isolated environment if you lack trust.
(4) Review the scripts yourself (they are included) and avoid running with elevated privileges.
(5) If you need enterprise data protection, confirm that writing to your home directory is acceptable.
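For the advice on robots rules and rate limits, a bulk run could be wrapped in a small "polite crawl" loop. This is a sketch under stated assumptions, not the behavior of the --all mode: `build_robots` and `polite_crawl` are hypothetical helpers, the one-second default delay is arbitrary, and `fetch` stands in for whichever engine is used. Robots rules are parsed with the standard library's `urllib.robotparser`.

```python
import time
from urllib.robotparser import RobotFileParser


def build_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_crawl(urls, fetch, robots, user_agent="*", delay_seconds=1.0, sleep=time.sleep):
    """Fetch allowed URLs one by one, pausing between consecutive requests."""
    results = {}
    for url in urls:
        if not robots.can_fetch(user_agent, url):
            continue  # skip anything robots.txt disallows for this user agent
        if results:
            sleep(delay_seconds)  # pause before every request after the first
        results[url] = fetch(url)
    return results
```

In a live run, the parser could instead be pointed at the site with `set_url(...)` followed by `read()`, which downloads the robots file before any page is requested.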

Like a lobster shell, security has layers — review code before you run it.

Tags: js-scraper, latest, playwright, web-scraping

