Skylv Web Scraper

v1.0.0

Extract and parse web page content including text, links, and images using CSS selectors and regex for flexible data scraping.

Security Scan
VirusTotal: Pending
OpenClaw: Benign (high confidence)
Purpose & Capability
Name and description match the runtime instructions. The only dependency referenced is an internal `web_fetch` tool, which is appropriate for fetching HTML for a scraper. No unrelated binaries, credentials, or config paths are requested.
Instruction Scope
SKILL.md stays within scraping behavior (fetch page, parse HTML, extract links/images/text, iterate URL lists). It explicitly notes obeying robots.txt and rate-limiting. It also mentions handling anti-scraping tactics (User-Agent, cookies, proxies), which is reasonable for robust scraping but could be used to evade protections; this is an operational/ethical consideration rather than an incoherence.
Install Mechanism
Instruction-only skill with no install spec and no code files; nothing is written to disk or downloaded during install, minimizing install-time risk.
Credentials
No environment variables, credentials, or config paths are requested. The skill does not ask for unrelated secrets or system access.
Persistence & Privilege
The `always` flag is false and the skill is user-invocable. It does not request persistent or elevated platform privileges.
Assessment
This skill appears to do what it says: fetch pages and extract content using an internal fetch tool. Before installing, consider: 1) the skill's source/homepage is unknown — verify you trust the publisher if provenance matters; 2) scraping can have legal and ethical limits (respect robots.txt, site terms, and copyright); 3) the instructions mention User-Agent, cookies, and proxies — avoid using the skill to bypass access controls or to scrape private/login-required data unless you have explicit permission; and 4) if you plan large-scale crawling, ensure you rate-limit and use appropriate infrastructure (and be mindful of potential IP blocking or service abuse). If you need stricter guarantees, request the skill author/publisher info or a published source repository before use.


License: MIT-0

Web Scraper: Web Page Content Scraping Tool

Description

Fetches and parses content from web pages, with support for several extraction methods.

Usage

1. Fetch the full text of a page

User: Fetch the content of https://example.com

Steps:

  1. Fetch the URL with the web_fetch tool
  2. Return the page's main content as Markdown
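For readers who want to reproduce this outside OpenClaw, the sketch below uses only the Python standard library as a stand-in for web_fetch. It is an assumption-laden illustration, not the skill's implementation: web_fetch returns Markdown, whereas this sketch returns plain visible text, one text node per line. `fetch_html`, `html_to_text`, and the `example-scraper/0.1` User-Agent string are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Hypothetical User-Agent string; identify your scraper honestly.
USER_AGENT = "example-scraper/0.1"


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with urllib (a stand-in for the web_fetch tool)."""
    req = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Collects visible text nodes, skipping <script>, <style> and <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []      # stripped, non-empty text nodes in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the page's visible text, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

Usage (requires network access): `print(html_to_text(fetch_html("https://example.com")))`.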

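2. Extract specific elements

User: Extract all news headlines from https://news.ycombinator.com

Steps:

  1. Fetch the page with web_fetch
  2. Analyze the HTML structure to identify the headline elements
  3. Extract the headlines and return them as a list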
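Steps 2 and 3 can be sketched with the standard library's `html.parser`. The sketch assumes Hacker News wraps each headline in `<span class="titleline">`, with the story link as the first `<a>` inside it; that class name is an assumption about the current markup and may change. The skill's description mentions CSS selectors, but the standard library has no selector engine, so this sketch matches the class attribute directly. `TitleExtractor` and `extract_titles` are hypothetical helpers, not part of the skill.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TitleExtractor(HTMLParser):
    """Collects (title, href) from the first <a> inside each container span.

    The default container class "titleline" is an assumption about
    Hacker News markup; pass a different class if the page differs.
    """

    def __init__(self, container_class: str = "titleline") -> None:
        super().__init__()
        self.container_class = container_class
        self.span_depth = 0        # nesting depth inside the container; 0 = outside
        self.got_link = False      # already captured this container's title link?
        self.in_title_link = False
        self.current_href = None
        self.current_text = []
        self.items = []            # list of (title, href) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            if self.span_depth > 0:
                self.span_depth += 1   # nested span inside the container
            elif self.container_class in (attrs.get("class") or "").split():
                self.span_depth = 1    # entering a new container
                self.got_link = False
        elif tag == "a" and self.span_depth > 0 and not self.got_link:
            self.in_title_link = True
            self.current_href = attrs.get("href")
            self.current_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_title_link:
            title = "".join(self.current_text).strip()
            self.items.append((title, self.current_href))
            self.in_title_link = False
            self.got_link = True       # ignore later links, e.g. the site domain
        elif tag == "span" and self.span_depth > 0:
            self.span_depth -= 1

    def handle_data(self, data):
        if self.in_title_link:
            self.current_text.append(data)


def extract_titles(html: str, base_url: str = "https://news.ycombinator.com/"):
    """Return (title, absolute_url) pairs; relative links are resolved against base_url."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return [(title, urljoin(base_url, href or "")) for title, href in parser.items]
```

Only the first link in each container is kept, because Hacker News also links the story's domain inside the same span.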

3. Batch fetching

User: Fetch the content of the following URLs:
https://url1.com
https://url2.com
https://url3.com

Steps:

  1. Iterate over the URL list
  2. Call web_fetch on each URL in turn
  3. Aggregate the results
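A sketch of the batch loop in Python, assuming a `fetch` callable that stands in for web_fetch (its signature here, URL in and page content out, raising on failure, is hypothetical). It fetches sequentially with a fixed pause between requests, in line with the skill's advice to add delays, and records failures rather than aborting the whole batch. `fetch_all` is a hypothetical name.

```python
import time
from typing import Callable, Iterable, List


def fetch_all(urls: Iterable[str], fetch: Callable[[str], str],
              delay: float = 1.0) -> List[dict]:
    """Fetch URLs one at a time, in order, pausing `delay` seconds between requests.

    `fetch` is a stand-in for web_fetch (hypothetical signature: takes a URL,
    returns the page content, raises an exception on failure).
    """
    results = []
    for i, url in enumerate(urls):
        if i > 0 and delay > 0:
            time.sleep(delay)  # space out consecutive requests
        try:
            results.append({"url": url, "ok": True, "content": fetch(url)})
        except Exception as exc:  # one bad URL should not abort the batch
            results.append({"url": url, "ok": False, "error": str(exc)})
    return results
```

Returning a list rather than a dict keyed by URL keeps the input order and preserves duplicate URLs if the user lists one twice.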

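4. Extract links

User: Extract all external links from the page at https://example.com

Steps:

  1. Fetch the page content
  2. Parse all <a href> tags
  3. Keep only external links (links whose domain differs from the page's)
  4. Return them as a list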
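A minimal sketch of steps 2 to 4 using only the Python standard library. It resolves relative links with `urljoin`, skips non-HTTP schemes such as `mailto:`, and treats a link as external when its hostname differs from the page's hostname. Under that rule `www.example.com` and `example.com` count as different, which may or may not match what you want. `LinkCollector` and `external_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def external_links(html: str, page_url: str) -> list:
    """Return unique absolute http(s) links whose hostname differs from page_url's."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    base_host = urlsplit(page_url).hostname or ""  # .hostname is already lowercase
    seen, result = set(), []
    for href in parser.hrefs:
        absolute = urljoin(page_url, href.strip())
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, etc.
        host = parts.hostname or ""
        if host and host != base_host and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```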

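Example conversation

User: Fetch today's trending projects from https://github.com/trending

Agent:

  1. Call web_fetch to fetch the GitHub Trending page
  2. Parse the project list (repository name, description, star count)
  3. Format the output:
Today's trending GitHub projects:

1. owner/repo-name - Project description
   ⭐ 1,234 stars today | 📝 JavaScript

2. ...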
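The formatting step (3) can be sketched as a small function that renders already-parsed records in the layout shown in the example. This is a sketch under assumptions: `TrendingRepo` and `format_trending` are hypothetical names, and parsing GitHub's HTML is not shown because the skill does not describe that page's markup.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrendingRepo:
    """One parsed entry from the trending page (hypothetical record type)."""
    name: str                       # "owner/repo"
    description: str
    stars_today: int
    language: Optional[str] = None  # not every repository reports a language


def format_trending(repos: List[TrendingRepo]) -> str:
    """Render repositories as a numbered list, one blank line between entries."""
    lines = ["Today's trending GitHub projects:", ""]
    for i, repo in enumerate(repos, start=1):
        lines.append(f"{i}. {repo.name} - {repo.description}")
        meta = f"   ⭐ {repo.stars_today:,} stars today"  # thousands separator
        if repo.language:
            meta += f" | 📝 {repo.language}"
        lines.append(meta)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```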

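Notes

  • Respect robots.txt
  • Add reasonable delays between requests to avoid being blocked
  • Handle anti-scraping measures (User-Agent, cookies, etc.)
  • For large-scale scraping, using proxies is recommended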
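The first two notes (robots.txt and delays) can be combined in a small gate object built on `urllib.robotparser.RobotFileParser` from the Python standard library. This is a sketch: `PoliteGate` and its one-second default delay are hypothetical choices. It takes the robots.txt lines directly so it can be tried offline; in real use you could load the file with `RobotFileParser.set_url()` and `read()` instead. A `Crawl-delay` in robots.txt, when present for the agent, overrides the default.

```python
import time
import urllib.robotparser


class PoliteGate:
    """Checks robots.txt rules and enforces a minimum gap between requests."""

    def __init__(self, robots_lines, user_agent: str, default_delay: float = 1.0):
        self.user_agent = user_agent
        self.rules = urllib.robotparser.RobotFileParser()
        self.rules.parse(robots_lines)  # also marks the rules as loaded
        crawl_delay = self.rules.crawl_delay(user_agent)  # None if not specified
        self.delay = float(crawl_delay) if crawl_delay is not None else default_delay
        self._last_request = None  # monotonic timestamp of the previous request

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.rules.can_fetch(self.user_agent, url)

    def wait(self) -> None:
        """Sleep just long enough to keep `delay` seconds between requests."""
        now = time.monotonic()
        if self._last_request is not None:
            remaining = self.delay - (now - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()
```

A caller would check `gate.allowed(url)`, then call `gate.wait()` immediately before each fetch to the same host.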

Dependencies

  • web_fetch tool (built into OpenClaw)
  • No additional installation required
