Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

web-data-extractor (web data extractor: CSS/XPath selectors, batch scraping, automatic pagination, data export (CSV/JSON). Suited to market research, competitor analysis, and content aggregation.)

v1.0.0

Web data extractor supporting CSS selector/XPath extraction, batch scraping, automatic pagination, and data export (CSV/JSON/Markdown).

0 stars · 171 downloads · 1 current · 1 all-time
by careytian@careytian-ai

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for careytian-ai/web-data-extractor.

Prompt Preview: Install & Setup
Install the skill "web-data-extractor (web data extractor: CSS/XPath selectors, batch scraping, automatic pagination, data export (CSV/JSON). Suited to market research, competitor analysis, and content aggregation.)" (careytian-ai/web-data-extractor) from ClawHub.
Skill page: https://clawhub.ai/careytian-ai/web-data-extractor
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: web_fetch, read, write
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install web-data-extractor

ClawHub CLI

Package manager switcher

npx clawhub@latest install web-data-extractor
Security Scan
VirusTotal
Suspicious
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name and description (web scraping and export) align with requiring a fetch tool and write capability to save CSV/JSON. However, config.json lists an additional 'exec' capability that is neither declared in SKILL.md's required bins nor explained; running arbitrary commands via exec is more powerful than the described feature set and creates a mismatch.
Instruction Scope
SKILL.md uses high-level helper calls (web_fetch, extractData, exportToCSV) and expects 'read'/'write' tooling. The doc is vague about which files the 'read' tool can access and whether the skill will read arbitrary user files or agent state. The examples do not define these helpers (this is an instruction-only skill), giving the agent broad discretion to call local read/write/exec primitives in ways the instructions do not explicitly limit.
Install Mechanism
No install spec and no code files — instruction-only — so nothing is being downloaded or written to disk by the skill itself. This is the lowest install risk.
Credentials
The skill requests no environment variables or external credentials (good). But it does require 'read' and 'write' binaries, which, depending on the platform, can grant access to any file the agent can read or write. Exporting data justifies write access, but read access is not clearly justified beyond possibly reading input URL lists; that could be overbroad.
Persistence & Privilege
'always' is false, and there is no indication the skill requests persistent global privileges or modifies other skills. Autonomous invocation is allowed (the platform default) but is not combined with other strong red flags here.
What to consider before installing
This skill appears to do what it says (web scraping and export) but has a few small mismatches you should verify before installing. Ask the author for source code or a homepage so you can confirm what the helper tools (web_fetch, read, write) actually do. Specifically:

  1. Confirm why config.json lists an 'exec' capability and whether the skill will run shell commands.
  2. Confirm the scope of the 'read' tool (which directories and files it can access).
  3. If you proceed, run it in a sandboxed environment and do not supply any credentials.
  4. Ensure its crawling respects robots.txt and the target sites' terms of service.

If the author cannot provide a clear explanation or source, treat the skill as higher risk and avoid granting broad read/exec access.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

Bins: web_fetch, read, write
latest: vk97bdkm6pew8ye7qvdps8k451583tqjp
171 downloads
0 stars
1 version
Updated 4w ago
v1.0.0
MIT-0

Web Data Extractor v1.0.0

Extract structured data from web pages in batches, with support for multiple selector types and export formats.

Features

1. CSS Selector Extraction

// Extract all headings
web_fetch({"url": "https://example.com"})
// Use a CSS selector to extract specific elements
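The skill never defines how selector matching is implemented. As an illustrative sketch only (the function name is hypothetical, and a real scraper should use a proper HTML parser such as cheerio rather than regular expressions), heading extraction could look like:

```javascript
// Hypothetical sketch: pull the text of every <h2> from an HTML string.
// Production code should use a real HTML parser, not regex.
function extractHeadings(html) {
  const matches = html.matchAll(/<h2[^>]*>(.*?)<\/h2>/gs);
  return Array.from(matches, (m) => m[1].trim());
}

const sample = '<h2>First</h2><p>body</p><h2> Second </h2>';
console.log(extractHeadings(sample)); // → [ 'First', 'Second' ]
```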

2. XPath Extraction

// Supports extracting complex structures via XPath expressions

3. Batch Scraping

  • Automatic pagination handling
  • Batch processing of URL lists
  • Concurrency control
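Neither the pagination scheme nor the concurrency limit is specified anywhere in the skill. A minimal sketch, assuming pages are addressed by a numeric path suffix (both function names are hypothetical):

```javascript
// Build the URL list for pages 1..n of a paginated site.
function pageUrls(baseUrl, pages) {
  return Array.from({ length: pages }, (_, i) => `${baseUrl}${i + 1}`);
}

// Run an async worker over the URLs with at most `limit` in flight at once.
async function mapWithLimit(urls, limit, worker) {
  const results = new Array(urls.length);
  let next = 0;
  async function run() {
    while (next < urls.length) {
      const i = next++; // safe: no await between read and increment
      results[i] = await worker(urls[i]);
    }
  }
  await Promise.all(Array.from({ length: Math.min(limit, urls.length) }, run));
  return results;
}

const urls = pageUrls('https://example.com/page/', 3);
// urls[2] === 'https://example.com/page/3'
```

The worker pool above keeps ordering stable by writing each result back to its original index, which matters when exported rows must match the input URL list.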

4. Data Export

  • CSV format
  • JSON format
  • Markdown tables
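`exportToCSV` is not defined by the skill either. A self-contained sketch of what the CSV step could look like, with RFC 4180-style quoting (the function name is an assumption):

```javascript
// Serialize an array of flat objects to CSV, quoting fields that
// contain commas, quotes, or newlines (RFC 4180 style).
function toCSV(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const escape = (v) => {
    const s = String(v ?? '');
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const lines = [headers.join(',')];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(','));
  }
  return lines.join('\n');
}

const csv = toCSV([{ title: 'Hello, world', link: '/a' }]);
// csv === 'title,link\n"Hello, world",/a'
```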

Quick Usage Examples

// Extract an article list
const articles = extractData({
  url: "https://blog.example.com",
  selector: ".article-card",
  fields: {
    title: "h2.title",
    link: "a[href]",
    date: ".publish-date"
  }
})

// Export to CSV
exportToCSV(articles, "output.csv")

// Export to JSON
exportToJSON(articles, "output.json")

// Scrape multiple pages in a batch
const allData = scrapeMultiple({
  baseUrl: "https://example.com/page/",
  pages: 10,
  selector: ".item"
})

Use Cases

  1. Market research - scrape competitor prices and product information
  2. Content aggregation - collect content from multiple sources
  3. Data analysis - extract public datasets
  4. Sentiment monitoring - track mentions and comments
  5. SEO analysis - scrape keyword rankings

Notes

  • Respect the target site's robots.txt
  • Throttle the scraping rate to avoid being blocked
  • Scrape only publicly available data

Custom Development

Need custom data collection, cleaning, or automated workflows?

📧 Contact: careytian-ai@github


License

MIT-0
