Scrapling Safe

v1.0.0

Scrapling safe skill - a web scraping tool supporting HTTP requests, stealth fetching, and browser automation. Smart element locating with resistance to anti-bot detection. No API configuration needed; output paths are restricted.

by Linshui Technology (Feicheng, Shandong, China) @linshuikeji

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for linshuikeji/scrapling-safe.

Prompt preview: Install & Setup
Install the skill "Scrapling Safe" (linshuikeji/scrapling-safe) from ClawHub.
Skill page: https://clawhub.ai/linshuikeji/scrapling-safe
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install scrapling-safe

ClawHub CLI


npx clawhub@latest install scrapling-safe
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Benign
medium confidence
Purpose & Capability
Name/description, SKILL.md, requirements.txt, and scrapling.py all describe a web-scraping tool. Requested artifacts (a Python package 'scrapling' and browser dependencies) match the stated functionality (HTTP/stealthy/dynamic fetch modes). No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
SKILL.md and the code limit scope to public sites, enforce robots.txt and output-path restrictions (user home), and include rate/timeout controls. The runtime instructions and code do not read unrelated files or environment variables. Note: 'stealthy' and 'solve_cloudflare' modes intentionally attempt to evade anti-bot protections — this is coherent with scraping but increases potential for misuse.
Install Mechanism
There is no platform install spec (instruction-only), which is low platform risk. However, SKILL.md and requirements.txt instruct installing a third-party Python package ('scrapling') and browser dependencies via 'scrapling install'. Installing these external packages/drivers is a typical but non-trivial supply-chain risk; verify the package source and installer behavior before running.
Credentials
The skill requests no environment variables, secrets, or unrelated config paths. The code does not access credentials or global config; output path validation is limited to the user's home directory.
Persistence & Privilege
Skill is not always-enabled, is user-invocable, and does not modify other skills or system-wide agent settings. It does not request persistent privileges beyond normal execution.
Assessment
This skill appears to do what it says (scraping public sites) and doesn't ask for unrelated secrets. Before installing or using it you should: 1) verify the 'scrapling' package on PyPI or its source repository (ensure it's the legitimate project and inspect its install scripts); 2) be cautious when installing browser drivers or helper tools (they may download binaries); 3) run it initially in a sandbox or isolated environment; 4) ensure you have legal authorization to scrape a target site and respect robots.txt and terms of service; and 5) if you rely on the output-path restriction, test that the tool cannot be tricked into writing outside your home directory (e.g., via symlinks or path traversal). If you need higher assurance, request the upstream source repository and provenance for the 'scrapling' package and any installer scripts.
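The output-path test recommended in the assessment can be exercised with a short script. A minimal sketch using only the Python standard library; `is_within_home` is an illustrative helper, not part of Scrapling:

```python
from pathlib import Path

def is_within_home(candidate: str) -> bool:
    """Return True only if the fully resolved path stays inside the
    user's home directory. resolve() follows symlinks and collapses
    '..' segments, so symlink and traversal tricks are caught."""
    home = Path.home().resolve()
    target = Path(candidate).expanduser().resolve()
    return target == home or home in target.parents

# A plain path under home is accepted:
print(is_within_home("~/result.json"))        # True
# A traversal attempt escaping home is rejected:
print(is_within_home("~/../../etc/passwd"))   # False
```

Because the check runs on the resolved path rather than the raw string, a symlink under home that points at `/etc` would also be rejected.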

Like a lobster shell, security has layers — review code before you run it.

Latest: vk973c7f0yj7edtze2wba8m9vxs836agq
235 downloads
0 stars
1 version
Updated 1 mo ago
v1.0.0
MIT-0

Scrapling Skill

Safe web scraping and content extraction using the Scrapling framework.

Safety Notes

For lawful use only. The following restrictions are strictly observed:

  • Public sites only: obey robots.txt and each site's terms of service
  • Restricted output paths: result files can only be saved under the user's home directory
  • No dangerous functions: no use of eval/exec or similar
  • Strict timeout control: prevents indefinite waits
  • Rate limiting: request delays are added automatically to avoid overloading targets
  • No scraping of private content: only publicly accessible pages are fetched
  • No large-scale crawling: concurrency is limited to 1 by default
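The robots.txt rule in the first bullet can be checked before any fetch. A minimal sketch using Python's standard `urllib.robotparser`, parsing an example robots.txt inline so no network access is needed; a real client would first download `robots.txt` from the target site:

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt body (illustrative, not from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 1
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Public paths are allowed; the disallowed prefix is not.
print(rp.can_fetch("*", "https://example.com/news"))       # True
print(rp.can_fetch("*", "https://example.com/private/x"))  # False

# The site's requested delay can feed the rate limiter.
print(rp.crawl_delay("*"))                                 # 1
```

Calling `can_fetch` before every request, and honoring `crawl_delay` when present, keeps a scraper within the site's published crawling policy.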

Core Features

  • 🕷️ Multiple fetch modes: HTTP / stealth / browser automation
  • 🎯 Smart element locating: adaptive selectors that keep working after site redesigns
  • 📄 Data extraction: CSS / XPath / text / regex search
  • 💾 Result saving: JSON/TXT/MD output files (paths restricted)
  • 🔍 Content parsing: a Scrapy-like API that is easy to use

Use Cases

  • Scraping public news and articles
  • Extracting product information (e-commerce sites)
  • Collecting public data (weather, stocks, etc.)
  • Monitoring website content

Trigger Phrases

  • "scrapling scrape..."
  • "scrapling extract..."
  • "scrapling crawl..."
  • "scrapling fetch..."

Quick Start

Basic Fetching

# Fetch via plain HTTP request
scrapling get 'https://example.com' --output ~/result.json

# Stealth-mode fetch
scrapling stealthy 'https://example.com' --output ~/result.json

# Browser automation (dynamic content)
scrapling dynamic 'https://example.com' --output ~/result.json

Specifying Selectors

# Use a CSS selector
scrapling get 'https://quotes.toscrape.com' --css-selector '.quote' --output ~/quotes.json

# Extract a specific field
scrapling get 'https://quotes.toscrape.com' --css-selector '.quote .text' --output ~/text.txt

Advanced Usage

# Stealth mode + Cloudflare solving
scrapling stealthy 'https://nopecha.com/demo/cloudflare' --solve-cloudflare --output ~/result.json

# Concurrent crawl (limited to 1)
scrapling spider 'https://example.com' --concurrent 1 --output ~/crawl.json

Installation Requirements

  • Python 3.10+
  • Scrapling must be installed: pip install scrapling[fetchers]
  • Browser dependencies required: scrapling install

Notes

  • Scrapling and its dependencies must be installed locally
  • Default request delay is 1 second to avoid overloading targets
  • Only publicly accessible pages are fetched
  • Obey robots.txt and each site's terms of service
  • Output file paths must be under the user's home directory
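The 1-second default delay in the second note can be reproduced in caller code when driving many requests in a loop. A minimal sketch using only the standard library; `RateLimiter` is an illustrative helper, not a Scrapling API:

```python
import time

class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._last = None  # monotonic timestamp of the previous call

    def wait(self) -> float:
        """Sleep just long enough to honor the interval.
        Returns the number of seconds actually slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept

limiter = RateLimiter(min_interval=0.2)
limiter.wait()  # the first call never sleeps
limiter.wait()  # subsequent calls are throttled to one per 0.2 s
```

Calling `limiter.wait()` before each fetch guarantees the configured pacing even when individual requests return quickly.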

Version History

  • 1.0.0 - Initial release
