Scrapling Fetch Basic

v1.0.0

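A basic web scraping tool that supports bypassing anti-bot systems, automatically locating the main content area, and converting HTML to Markdown. Suited to scraping static pages such as blogs, news articles, and announcements.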

by @shuxiangfanclaw

MIT-0

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.

Security Scan

VirusTotal

Benign

OpenClaw

Benign

medium confidence

✓

Purpose & Capability

Name/description (web scraping, Cloudflare/stealth, HTML→Markdown) align with the provided script and declared deps (scrapling, html2text, playwright). No unrelated env vars, binaries, or config paths are required.

✓

Instruction Scope

SKILL.md describes running the included Python script; the script only fetches the target URL, extracts content with a set of selectors, converts to Markdown, and prints output or JSON. It does not attempt to read local files, other env vars, or exfiltrate results to external endpoints.
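The extract-then-print flow described above can be illustrated with a minimal, standard-library-only sketch. It is not the skill's script: the candidate tag list, the function names, and the JSON field names are all hypothetical, and it emits plain text rather than Markdown (the real script is said to use scrapling selectors and html2text, which are not reproduced here).

```python
import json
from html.parser import HTMLParser

# Hypothetical priority list; the listing does not show the script's real selectors.
CANDIDATE_TAGS = ("article", "main", "body")


class TagTextExtractor(HTMLParser):
    """Collect the text inside the first occurrence of one target tag."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.depth = 0      # nesting depth inside the target element
        self.done = False   # set once the first target element has closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if not self.done and tag == self.target:
            self.depth += 1

    def handle_endtag(self, tag):
        if not self.done and tag == self.target and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth > 0 and not self.done and data.strip():
            self.chunks.append(data.strip())


def extract_main_text(html):
    """Try each candidate tag in order; return (matched_tag, text)."""
    for tag in CANDIDATE_TAGS:
        parser = TagTextExtractor(tag)
        parser.feed(html)
        parser.close()
        if parser.chunks:
            return tag, "\n".join(parser.chunks)
    return None, ""


def render(url, html, as_json=False):
    """Return the extracted text, or a JSON document when as_json is set."""
    tag, text = extract_main_text(html)
    if as_json:
        # Field names are illustrative, not the script's actual output schema.
        return json.dumps({"url": url, "matched": tag, "content": text},
                          ensure_ascii=False)
    return text
```

The point of the sketch is the fallback order: a specific container is tried first, and the whole body is used only when nothing narrower matches.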

ℹ

Install Mechanism

There is no install spec (the skill is instruction-only), and a single Python script is included. Dependencies are listed but not installed automatically; the user must install scrapling, html2text, and playwright in their environment. Playwright typically also requires downloading browser binaries, which the user should plan for.
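Because nothing is installed automatically, a quick pre-flight check can confirm the three listed dependencies are importable before the script is run. The sketch below is an illustration only; the function name is hypothetical, and it assumes each package's import name matches the name given in the listing.

```python
import importlib.util

# Import names assumed to match the dependency names in the listing.
REQUIRED = ("scrapling", "html2text", "playwright")


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

This only checks that the Python packages are present; it does not verify that Playwright's browser binaries have been downloaded.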

✓

Credentials

No credentials, secret environment variables, or config paths are requested. Required libraries are proportional to the functionality (HTML parsing and optional browser automation).

✓

Persistence & Privilege

Skill does not request persistent always:true, does not modify other skills or system configs, and is user-invocable only. The script is executed on demand and does not persist credentials or install itself.

Assessment

This skill appears internally consistent, but take these precautions before using it:

- Source verification: the package has no homepage and an unknown owner; inspect the scrapling dependency source (PyPI/GitHub) before installing and prefer installing in an isolated environment (virtualenv/container).
- Install notes: playwright will usually download browser binaries when first used (run 'playwright install' or follow its docs). That will add sizeable executables to the host; be prepared for that.
- SSRF / network risk: the script fetches arbitrary URLs. If you run it on a server that can access internal resources, an attacker-supplied URL could cause server-side requests to internal endpoints. Only run with trusted URLs or in a network-isolated environment.
- Legal/ethical: stealth mode and Cloudflare bypass are intended to evade anti-bot protections; ensure you have the right to scrape targets and comply with terms of service and laws.
- Dependency hygiene: install dependencies from official registries or pinned releases, and review the 'scrapling' package code because the skill relies on it for network access and stealth behavior.
- Runtime safety: run first with --debug and limited targets; consider timeouts/rate limits to avoid unintended heavy load.

If you want higher assurance, ask the author for a homepage or source repository and a release provenance (e.g., GitHub repo and PyPI package/version).
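The SSRF precaution can be made concrete with a small URL guard that runs before any fetch. This is a minimal sketch under stated assumptions, not part of the skill: the function name and the allowed-scheme set are hypothetical, and it rejects any URL whose host resolves to a non-public address.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: only web URLs are wanted


def is_public_url(url):
    """Return True only if every address the host resolves to is globally routable."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.hostname:
        return False
    try:
        port = parts.port or (443 if parts.scheme == "https" else 80)
        infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    except (ValueError, socket.gaierror):
        # Invalid port or unresolvable host: refuse rather than guess.
        return False
    for info in infos:
        # Strip any IPv6 zone suffix such as "%eth0" before parsing.
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if not addr.is_global:
            return False
    return bool(infos)
```

A check like this narrows the risk but does not remove it: the host could resolve differently between the check and the actual request, and redirects would need the same validation. Network isolation remains the stronger control.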

Like a lobster shell, security has layers — review code before you run it.

License terms: https://spdx.org/licenses/MIT-0.html
