Scrapling Fetch Basic

v1.0.0

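A basic web-scraping tool that supports bypassing anti-scraping systems, automatically locating the main content area, and converting HTML to Markdown. Suited to static pages such as blogs, news articles, and announcements.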

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description (web scraping, Cloudflare/stealth, HTML→Markdown) align with the provided script and declared deps (scrapling, html2text, playwright). No unrelated env vars, binaries, or config paths are required.
Instruction Scope
SKILL.md describes running the included Python script; the script only fetches the target URL, extracts content with a set of selectors, converts to Markdown, and prints output or JSON. It does not attempt to read local files, other env vars, or exfiltrate results to external endpoints.
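The flow described above (pick the first content region that matches, then emit plain text or JSON) can be sketched with the standard library alone. This is not the skill's script: the tag priority list, the `extract_main_text` and `render` helpers, and the JSON field names are all hypothetical, and the real script relies on scrapling selectors and html2text rather than `html.parser`.

```python
import json
from html.parser import HTMLParser

# Hypothetical priority order; the real script's selector list is not shown here.
CANDIDATE_TAGS = ("article", "main", "body")


class _TagTextCollector(HTMLParser):
    """Collects the text found inside the first occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # nesting depth inside the target tag
        self.done = False   # True once the first occurrence has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html):
    """Return (tag, text) for the first candidate tag that yields text."""
    for tag in CANDIDATE_TAGS:
        collector = _TagTextCollector(tag)
        collector.feed(html)
        collector.close()
        if collector.chunks:
            return tag, "\n".join(collector.chunks)
    return None, ""


def render(url, html, as_json=False):
    """Format the extracted text as plain text or as a JSON document."""
    tag, text = extract_main_text(html)
    if as_json:
        return json.dumps({"url": url, "matched": tag, "content": text}, ensure_ascii=False)
    return text
```

Trying candidates in priority order means a page without an `article` element still produces output from `main` or `body`, at the cost of including navigation text in the fallback case.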
Install Mechanism
There is no install spec (the skill is instruction-only) and a single Python script is included. Dependencies are listed but not installed automatically; the user must install scrapling, html2text, and playwright in their environment. Playwright typically requires downloading browser binaries, which users should be prepared for.
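Because nothing is installed automatically, a small preflight check can report which of the listed packages are missing before the script is run. The sketch below is an illustration, not part of the skill; the `missing_modules` helper is hypothetical, and it assumes each package is imported under the same name as listed.

```python
import importlib.util

# Import names taken from the dependency list above.
REQUIRED_MODULES = ("scrapling", "html2text", "playwright")


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

This only checks that the Python packages can be located. It does not verify that Playwright's browser binaries have been downloaded.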
Credentials
No credentials, secret environment variables, or config paths are requested. Required libraries are proportional to the functionality (HTML parsing and optional browser automation).
Persistence & Privilege
Skill does not request persistent always:true, does not modify other skills or system configs, and is user-invocable only. The script is executed on demand and does not persist credentials or install itself.
Assessment
This skill appears internally consistent, but take these precautions before using it:
- Source verification: the package has no homepage and an unknown owner. Inspect the scrapling dependency source (PyPI/GitHub) before installing, and prefer installing in an isolated environment (virtualenv/container).
- Install notes: playwright will usually download browser binaries when first used (run 'playwright install' or follow its docs). That will add sizeable executables to the host; be prepared for that.
- SSRF / network risk: the script fetches arbitrary URLs. If you run it on a server that can access internal resources, an attacker-supplied URL could cause server-side requests to internal endpoints. Only run with trusted URLs or in a network-isolated environment.
- Legal/ethical: stealth mode and Cloudflare bypass are intended to evade anti-bot protections. Ensure you have the right to scrape targets and comply with terms of service and laws.
- Dependency hygiene: install dependencies from official registries or pinned releases, and review the 'scrapling' package code, because the skill relies on it for network access and stealth behavior.
- Runtime safety: run first with --debug and limited targets; consider timeouts/rate limits to avoid unintended heavy load.
If you want higher assurance, ask the author for a homepage or source repository and a release provenance (e.g., GitHub repo and PyPI package/version).
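One way to act on the SSRF precaution above is to screen URLs before handing them to the script. The following is a minimal sketch using only the Python standard library; the `is_public_http_url` helper is hypothetical and is not part of the skill or of scrapling.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_http_url(url):
    """Return True only for http(s) URLs whose host resolves solely to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        port = parts.port or 80  # .port raises ValueError on a malformed port
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except (ValueError, socket.gaierror, UnicodeError):
        return False
    for info in infos:
        # Strip any IPv6 zone id ("fe80::1%eth0") before parsing.
        address = ipaddress.ip_address(info[4][0].split("%")[0])
        if not address.is_global:
            return False
    return bool(infos)
```

A check like this is only a first filter. It does not cover redirects to internal hosts or a hostname that resolves differently between the check and the actual fetch, so network isolation remains the stronger control.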

Like a lobster shell, security has layers — review code before you run it.

latest: vk972s5jgcq5j942ewzwc5c042183r7ck

