Deep Web Fetcher
PassAudited by ClawScan on May 1, 2026.
Overview
The skill matches its stated purpose as a local Playwright-based web content extractor, with no evident hidden exfiltration or destructive behavior, but fetched pages and manual dependency installs should be treated as untrusted.
This skill appears safe for its stated scraping and extraction purpose. Before installing, use a virtual environment, review or pin the Python dependencies, remember that target websites will receive browser requests from your machine, and treat all fetched webpage text or HTML as untrusted content rather than agent instructions.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A malicious or compromised webpage could include text that tries to influence the agent if the agent treats fetched content as instructions instead of data.
The tool loads caller-supplied webpages and returns page-derived HTML/text to the agent. That content is untrusted and may contain prompt-like instructions.
page.goto(url, wait_until="networkidle", timeout=timeout*1000) ... result["content_html"] = doc.summary()
Fetch only intended URLs, treat all returned webpage content as quoted untrusted data, and avoid following instructions embedded in fetched pages.
Installing dependencies from external package sources can introduce risk if the environment or package source is compromised.
The setup instructions rely on manually installing unpinned packages and downloading a browser runtime. This is expected for a Playwright scraper, but it is still external supply-chain exposure.
pip install playwright readability-lxml lxml beautifulsoup4 # 安装浏览器驱动(首次运行需下载~100MB) playwright install chromium
Install in a virtual environment, use trusted package indexes, consider pinning versions, and review dependency provenance before use.
Users may overread the privacy claim and forget that the target site can see the request, including the requested URL and network metadata.
The skill clearly visits target URLs, while the closing privacy wording says data does not leave the machine. In context this appears to mean no paid extraction API is used, but users should understand that target websites still receive requests.
2. 访问目标URL,等待JS渲染完成 ... *完全免费,本地运行,数据不出机器*
Do not assume the skill is offline or anonymous; only fetch pages you are comfortable contacting from your environment.
