Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

web-data-extractor (Web data extractor with CSS/XPath selector support, batch scraping, automatic pagination, and data export (CSV/JSON). Suited to market research, competitor analysis, and content aggregation.)

Web data extractor with CSS selector/XPath extraction, batch scraping, automatic pagination, and data export (CSV/JSON/Markdown).

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 27 · 0 current installs · 0 all-time installs
by careytian@careytian-ai
Security Scan
VirusTotal
Suspicious
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name and description (web scraping and export) are consistent with requiring a fetch tool and write capability to save CSV/JSON. However, config.json lists an additional 'exec' capability that is not declared in SKILL.md's required bins and is not explained; running arbitrary commands via exec is more powerful than the described feature set and creates a mismatch.
Instruction Scope
SKILL.md uses high-level helper calls (web_fetch, extractData, exportToCSV) and expects 'read'/'write' tooling. The doc is vague about what files the 'read' tool can access or whether the skill will read arbitrary user files or agent state. The examples do not define these helpers (this is an instruction-only skill), giving the agent broad discretion to call local read/write/exec primitives in ways that aren't explicitly limited by the instructions.
Install Mechanism
No install spec and no code files — instruction-only — so nothing is being downloaded or written to disk by the skill itself. This is the lowest install risk.
Credentials
The skill requests no environment variables or external credentials (good). But it does require 'read' and 'write' binaries which, depending on the platform, can allow access to any file the agent can read/write. Exporting data justifies write access, but read access is not clearly justified beyond potentially reading input URL lists; that could be overbroad.
Persistence & Privilege
always is false and there's no indication the skill requests persistent global privileges or modifies other skills. Autonomous invocation is allowed (platform default) but not combined with other strong red flags here.
What to consider before installing
This skill appears to do what it says (web scraping and export) but has a few mismatches you should verify before installing. Ask the author for source code or a homepage so you can confirm what the helper tools (web_fetch, read, write) actually do. Specifically:

  1. Confirm why config.json lists an 'exec' capability and whether the skill will run shell commands.
  2. Confirm the scope of the 'read' tool (which directories/files it can access).
  3. If you proceed, run it in a sandboxed environment and do not supply any credentials.
  4. Ensure its crawling respects robots.txt and target terms of service.

If the author cannot provide a clear explanation or source, treat the skill as higher risk and avoid granting broad read/exec access.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
Download zip
latest · vk97bdkm6pew8ye7qvdps8k451583tqjp

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

Bins: web_fetch, read, write

SKILL.md

Web Data Extractor v1.0.0

Extract structured data from web pages in bulk, with support for multiple selectors and export formats.

Features

1. CSS Selector Extraction

// Fetch the page, then extract all headings
web_fetch({"url": "https://example.com"})
// Use CSS selectors to extract specific elements

2. XPath Extraction

// Supports XPath expressions for extracting complex structures

3. Batch Scraping

  • Automatic pagination handling
  • Batch processing of URL lists
  • Concurrency control
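The concurrency-control bullet above can be sketched in plain JavaScript. Nothing below is part of the skill itself; `mapWithConcurrency` and its `worker` callback are hypothetical names for illustration.

```javascript
// Run an async worker over a list with at most `limit` tasks in flight.
// One common way to implement concurrency control: a fixed pool of runners
// that each pull the next item off a shared cursor until the list is drained.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // shared cursor into `items`
  async function run() {
    while (next < items.length) {
      const i = next++; // claim an index before awaiting
      results[i] = await worker(items[i], i);
    }
  }
  const poolSize = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: poolSize }, run));
  return results;
}
```

With `limit` set to, say, 3, no more than three pages are fetched at once, regardless of how long the URL list is.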

4. Data Export

  • CSV format
  • JSON format
  • Markdown tables
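As a sketch of what the CSV export step has to handle, here is a minimal RFC 4180-style serializer for flat row objects. The function name `toCSV` is illustrative, not the skill's actual helper.

```javascript
// Serialize an array of flat objects to CSV. Values containing a comma,
// double quote, or newline are wrapped in quotes, with inner quotes doubled.
function toCSV(rows) {
  if (rows.length === 0) return "";
  const headers = Object.keys(rows[0]);
  const escape = (value) => {
    const s = String(value ?? "");
    return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
  };
  const lines = [headers.join(",")];
  for (const row of rows) {
    lines.push(headers.map((h) => escape(row[h])).join(","));
  }
  return lines.join("\n");
}
```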

Quick Usage Examples

// Extract an article list
const articles = extractData({
  url: "https://blog.example.com",
  selector: ".article-card",
  fields: {
    title: "h2.title",
    link: "a[href]",
    date: ".publish-date"
  }
})

// Export to CSV
exportToCSV(articles, "output.csv")

// Export to JSON
exportToJSON(articles, "output.json")

// Batch-scrape multiple pages
const allData = scrapeMultiple({
  baseUrl: "https://example.com/page/",
  pages: 10,
  selector: ".item"
})
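`scrapeMultiple` is not defined anywhere in the skill; one plausible piece of it, expanding `baseUrl` and `pages` into the per-page URL list, could look like this (assuming the site numbers pages from 1 and appends the number directly to the base URL):

```javascript
// Expand a base URL plus a page count into the full list of page URLs.
// Hypothetical helper; real pagination schemes vary per site.
function pageUrls(baseUrl, pages) {
  return Array.from({ length: pages }, (_, i) => `${baseUrl}${i + 1}`);
}
```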

Use Cases

  1. Market research - scrape competitor prices and product information
  2. Content aggregation - collect content from multiple sources
  3. Data analysis - extract public datasets
  4. Sentiment monitoring - track mentions and comments
  5. SEO analysis - scrape keyword rankings

Notes

  • Respect the target site's robots.txt
  • Throttle your request rate to avoid getting blocked
  • Scrape publicly available data only
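The rate-control advice above can be made concrete with a small throttling helper. `fetchFn` is a placeholder for whichever fetch helper is actually available, and the delay value is an assumption to tune per site.

```javascript
// Fetch URLs strictly one at a time, pausing `delayMs` between requests
// so the target site sees a polite, predictable request rate.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function politeFetchAll(urls, delayMs, fetchFn) {
  const pages = [];
  for (const url of urls) {
    pages.push(await fetchFn(url));
    await sleep(delayMs); // throttle before the next request
  }
  return pages;
}
```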

Custom Development

Need custom data collection, cleaning, or automation workflows?

📧 Contact: careytian-ai@github


License

MIT-0

Files

4 total
