Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Panscrapling Web Scraper

v1.0.0

A powerful web-scraping skill. Built on Scrapling; automatically bypasses Cloudflare and other anti-bot systems. Trigger words: 抓取网页, 爬取, scrape, fetch, 抓取内容, 提取网页, 获取页面 (Chinese phrases for scraping/fetching pages). Use cases: (1) scrape pages protected by Cloudflare (2) extract page content (3) collect web data (4) scrape dynamically rendered pages. Auto-install: ...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for dashiming/panscrapling-web-scraper.

Prompt preview: Install & Setup
Install the skill "Panscrapling Web Scraper" (dashiming/panscrapling-web-scraper) from ClawHub.
Skill page: https://clawhub.ai/dashiming/panscrapling-web-scraper
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install panscrapling-web-scraper

ClawHub CLI


npx clawhub@latest install panscrapling-web-scraper
Security Scan
VirusTotal
Suspicious
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The code and instructions align with a browser-driven web scraper that uses Scrapling/Playwright/Patchright and installs Python and browsers. However, the SKILL.md claims fully-embedded/offline operation while the setup.py contains explicit online fallback behavior (install Homebrew via curl, pip from an online index) and Homebrew-only install paths — yet the skill metadata has no OS restriction. The macOS/Homebrew-centric install logic and online fallbacks are inconsistent with the 'offline embedded' claim and the lack of OS limitation.
Instruction Scope
Runtime instructions and scripts will automatically run system-level installation steps: installing Homebrew, installing Python, pip installing packages, and downloading browser binaries. SKILL.md asserts 'no third-party API calls' and 'all requests through local browser', but the installer performs network operations (curl Homebrew install, pip fallback to remote indices, Playwright/patchright downloads). The installer also attempts to modify system state without an explicit interactive consent step in the skill instructions.
Install Mechanism
There is no registry install spec; instead the included setup.py runs subprocesses that execute network installs (curl | bash for Homebrew, pip installs, playwright/patchright browser downloads). Using GitHub raw for Homebrew installation and a Tsinghua PyPI mirror are common but are online network actions executed without integrity checks. The script first prefers local wheels (good) but explicitly falls back to online installation; downloads are executed without signature or checksum verification.
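The wheel-first-then-online pattern the scan describes becomes much safer when the online fallback is opt-in rather than automatic. A minimal sketch of that safer shape (hypothetical; the actual setup.py is structured differently):

```python
def pip_install_command(package, wheels_present, allow_network=False):
    """Build a pip command that prefers local wheels and only touches
    the network when the caller explicitly opts in."""
    base = ["pip", "install", package]
    if wheels_present:
        # --no-index guarantees pip never contacts a remote package index
        return base + ["--no-index", "--find-links", "wheels/"]
    if allow_network:
        return base  # the silent online fallback the scan flags
    raise RuntimeError(f"no local wheel for {package} and network fallback disabled")
```

With `allow_network=False` as the default, a missing wheel fails loudly instead of quietly reaching out to a remote index.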
Credentials
The skill requests no environment variables or credentials, which is appropriate for a scraper. However, it does require filesystem and network access to install software and to download browser binaries; that is expected for this functionality but is impactful and should be considered by the user.
Persistence & Privilege
The skill is not force-enabled (always:false) and does not request credentials, but its installer will change the host system by installing Homebrew/Python and downloading browsers—persistent system modifications beyond the skill's own files. This is functionally expected for a scraper needing a browser runtime but increases the blast radius and should be noted.
What to consider before installing
Before installing or running this skill:

  • Expect it to modify your system: setup.py will try to install Homebrew (via curl|bash), install Python 3.11 via Homebrew, pip-install packages (with an online fallback), and download Playwright/Patchright browser binaries. Run it only on machines where you accept these changes.
  • The SKILL.md claims "embedded wheels" and "offline install", but the installer falls back to online network installs. If you need offline usage, verify that the packaged files are actually present in the provided wheels/ directory.
  • There are no integrity checks for downloaded scripts or packages. If you want to proceed, review the included wheels and the full setup.py and fetch.py contents locally, or run the installer in a sandboxed/VM environment.
  • The installer is macOS/Homebrew-centric; on other OSes it may fail or behave unexpectedly, and no OS restriction is declared.
  • The skill claims automatic bypass of Cloudflare/Turnstile. That capability can serve legitimate scraping but can also circumvent protections; make sure your intended scraping is legal and compliant with target site policies.
  • If you cannot review the code and wheels yourself, avoid installing. Prefer running inside an isolated environment (container/VM) and inspect network activity during the first run. For offline assurance, confirm the wheels/ directory contains the expected .whl files and do not let the script reach its online fallback.
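To check the offline claim before the first run, you can compare the wheels/ directory against the packages the listing says are embedded. A small sketch (the expected package names come from the file layout shown on this page and may not be exhaustive):

```python
from pathlib import Path

# Package names the listing says ship as embedded wheels (may be incomplete).
EXPECTED = ["scrapling", "playwright", "patchright"]

def missing_wheels(wheel_dir, expected=EXPECTED):
    """Return the expected packages that have no matching .whl in wheel_dir."""
    wheel_dir = Path(wheel_dir)
    if not wheel_dir.is_dir():
        return list(expected)
    # Wheel filenames start with the distribution name, e.g. scrapling-0.2-...
    present = {p.name.split("-")[0].lower() for p in wheel_dir.glob("*.whl")}
    return [name for name in expected if name.lower() not in present]
```

If nothing is missing, installing with `pip install --no-index --find-links wheels/ scrapling` keeps pip from ever contacting a remote index.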

Like a lobster shell, security has layers — review code before you run it.

latest: vk971wej5yr0v82nyx8mkszdb0h84z4pn
63 downloads
0 stars
1 version
Updated 1w ago
v1.0.0
MIT-0

Panscrapling Web Scraper

A web-scraping skill built on Scrapling. Automatically bypasses Cloudflare Turnstile; no manual CAPTCHA handling required.

Features

✅ Fully embedded distribution - all Python dependencies included, offline install supported
✅ Automatic Python install - detects and installs Python 3.10+ automatically
✅ Automatic Cloudflare bypass - no manual CAPTCHA handling
✅ Multiple scraping modes - Fast / Stealthy / Dynamic

Usage

Just say:

Scraping Modes

Mode      Purpose              Notes
auto      automatic selection  default mode; fits most cases
stealthy  stealth scraping     bypasses Cloudflare, solves CAPTCHAs
dynamic   dynamic rendering    full browser with JS execution
fast      plain HTTP           lightweight; for simple pages
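One way an auto mode could pick among the others is a simple heuristic over what the page needs; this is an illustrative guess, not the actual logic in fetch.py:

```python
def choose_mode(needs_js=False, behind_cloudflare=False):
    """Map page requirements onto one of the modes in the table above."""
    if behind_cloudflare:
        return "stealthy"   # full anti-bot evasion
    if needs_js:
        return "dynamic"    # full browser with JS execution
    return "fast"           # plain HTTP is enough for simple pages
```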

CLI Usage

# Basic scrape
python3 scripts/fetch.py "https://example.com"

# Bypass Cloudflare
python3 scripts/fetch.py "https://protected-site.com" --mode stealthy

# Extract specific elements
python3 scripts/fetch.py "https://shop.com" --selector ".product-title"

# Output Markdown
python3 scripts/fetch.py "https://blog.com/article" --markdown

# Extract links, images, metadata
python3 scripts/fetch.py "https://example.com" --links --images --meta

# Run setup only
python3 scripts/fetch.py --setup

Automatic Install Flow

Runs automatically on first use:

  1. Detect Python 3.10+ - look for a Python already installed on the system
  2. Install Python - if none is found, install it automatically via Homebrew
  3. Install dependencies - install Scrapling from the embedded wheel packages
  4. Install browser - download Playwright/Patchright Chromium
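The four steps above can be sketched as a decision sequence; this is a simplified illustration of the flow as described, not the real setup.py:

```python
def plan_install(python_ok, homebrew_present, wheels_present):
    """Return the install steps the flow above would run, in order."""
    steps = []
    if not python_ok:                          # step 1 found no Python 3.10+
        if not homebrew_present:
            steps.append("install Homebrew")   # curl|bash, network access
        steps.append("install Python via Homebrew")
    if wheels_present:
        steps.append("pip install Scrapling from wheels/")
    else:
        steps.append("pip install Scrapling online (fallback)")
    steps.append("download Playwright/Patchright Chromium")
    return steps
```

Note that even the best case (Python present, wheels present) still ends with a network download for the browser binaries.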

Working with pansxng-websearch

  • pansxng-websearch: search-engine queries → produces a list of links
  • panscrapling-web-scraper: scrapes pages in depth → extracts the actual content

Typical workflow:

  1. Search for keywords with pansxng-websearch
  2. Scrape the target pages with panscrapling-web-scraper
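The two-step workflow can be glued together with a few lines of Python once the search step has produced URLs; `build_fetch_commands` here is a hypothetical helper, using only the fetch.py flags shown in the CLI section:

```python
def build_fetch_commands(urls, mode="auto"):
    """One fetch.py invocation per URL returned by the search step."""
    return [["python3", "scripts/fetch.py", url, "--mode", mode] for url in urls]
```

Each returned list can be passed to `subprocess.run` once you have reviewed the skill and decided to run it.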

File Layout

panscrapling-web-scraper/
├── SKILL.md              # this document
├── scripts/
│   ├── setup.py          # automatic install script
│   └── fetch.py          # main scraping script
├── wheels/               # embedded Python dependency packages
│   ├── scrapling-*.whl
│   ├── playwright-*.whl
│   ├── patchright-*.whl
│   └── ... (other dependencies)
└── browsers/             # browsers (downloaded on first run)

Privacy Notes

  • All requests go out through the local browser
  • No third-party API calls
  • Search/scrape history is not leaked

Dependencies

  • Python 3.10+ (installed automatically)
  • Homebrew (macOS, installed automatically)
  • Playwright Chromium (downloaded automatically)
