Skylv Web Scraper
v1.0.0
Extract and parse web page content including text, links, and images using CSS selectors and regex for flexible data scraping.
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
Name and description match the runtime instructions. The only dependency referenced is an internal `web_fetch` tool, which is appropriate for fetching HTML for a scraper. No unrelated binaries, credentials, or config paths are requested.
Instruction Scope
SKILL.md stays within scraping behavior (fetch page, parse HTML, extract links/images/text, iterate URL lists). It explicitly notes obeying robots.txt and rate-limiting. It also mentions handling anti-scraping tactics (User-Agent, cookies, proxies), which is reasonable for robust scraping but could be used to evade protections; this is an operational/ethical consideration rather than an incoherence.
Install Mechanism
Instruction-only skill with no install spec and no code files; nothing is written to disk or downloaded during install, minimizing install-time risk.
Credentials
No environment variables, credentials, or config paths are requested. The skill does not ask for unrelated secrets or system access.
Persistence & Privilege
`always` is false and the skill is user-invocable. It does not request persistent or elevated platform privileges.
Assessment
This skill appears to do what it says: fetch pages and extract content using an internal fetch tool. Before installing, consider: 1) the skill's source/homepage is unknown, so verify you trust the publisher if provenance matters; 2) scraping can have legal and ethical limits (respect robots.txt, site terms, and copyright); 3) the instructions mention User-Agent, cookies, and proxies, so avoid using the skill to bypass access controls or to scrape private/login-required data unless you have explicit permission; and 4) if you plan large-scale crawling, ensure you rate-limit and use appropriate infrastructure (and be mindful of potential IP blocking or service abuse). If you need stricter guarantees, request the skill author/publisher info or a published source repository before use.
Like a lobster shell, security has layers: review code before you run it.
latest
Web Scraper: Web Page Content Scraping Tool
Overview
Fetches and parses content from web pages, with support for several extraction methods.
Usage
1. Fetch the full text of a page
User: Fetch the content of https://example.com
Steps:
- Use the `web_fetch` tool to fetch the URL
- Return the main content in Markdown format
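2. Extract specific elements
User: Extract all news headlines from https://news.ycombinator.com
Steps:
- Fetch the page with `web_fetch`
- Analyze the HTML structure and identify the headline elements
- Extract the headlines and return them as a list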
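The element-extraction steps above can be sketched with Python's standard-library `html.parser`. This is only an illustration, not the skill's own implementation: the `TitleExtractor` class, the `extract_titles` helper, and the `titleline` class name are assumptions made for the example, and the real markup of a target page should be inspected before relying on any selector.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collect the text of the first <a> inside each element with a given class."""

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class  # hypothetical class name
        self._container_tag = None  # tag name of the container we are inside
        self._depth = 0             # nesting depth of that tag name
        self._taken = False         # first link of this container already read
        self._in_link = False
        self._buffer = []
        self.titles = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._container_tag is None:
            if self.container_class in classes:
                self._container_tag = tag
                self._depth = 1
                self._taken = False
            return
        if tag == self._container_tag:
            self._depth += 1
        if tag == "a" and not self._taken and not self._in_link:
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._container_tag is None:
            return
        if tag == "a" and self._in_link:
            self._in_link = False
            self._taken = True
            title = "".join(self._buffer).strip()
            if title:
                self.titles.append(title)
        if tag == self._container_tag:
            self._depth -= 1
            if self._depth == 0:
                self._container_tag = None


def extract_titles(html, container_class="titleline"):
    """Return headline strings found in the given HTML text."""
    parser = TitleExtractor(container_class)
    parser.feed(html)
    return parser.titles


SAMPLE_HTML = """
<table>
  <tr><td><span class="titleline"><a href="https://a.example/1">First story</a>
      <span class="sitebit"> (<a href="from?site=a.example">a.example</a>)</span></span></td></tr>
  <tr><td><span class="titleline"><a href="https://b.example/2">Second &amp; final story</a></span></td></tr>
  <tr><td><a href="/news">not a title</a></td></tr>
</table>
"""

print(extract_titles(SAMPLE_HTML))
```

Only the first link in each container is kept, so a secondary link such as a site label next to the headline is not mistaken for a title.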
3. Batch fetching
User: Fetch the content of the following URLs:
https://url1.com
https://url2.com
https://url3.com
Steps:
- Iterate over the URL list
- Call `web_fetch` on each URL in turn
- Aggregate the results
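A minimal sketch of the batch loop, assuming a caller-supplied `fetch` callable that stands in for the `web_fetch` tool (the tool itself is not a Python function, so the `fetch_all` helper and its parameters are hypothetical). It pauses between requests and records failures separately, so one bad URL does not abort the whole run.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Fetch each URL in order and return (results, errors) dictionaries.

    `fetch` is a placeholder for whatever retrieves a page; `sleep` is
    injectable so the delay can be replaced in tests.
    """
    results = {}
    errors = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause between consecutive requests
        try:
            results[url] = fetch(url)
        except Exception as exc:  # keep going; report the failure later
            errors[url] = str(exc)
    return results, errors
```

Returning errors alongside results lets the final summary list which URLs failed and why, instead of silently dropping them.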
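4. Extract links
User: Extract all external links from the page https://example.com
Steps:
- Fetch the page content
- Parse all `<a href>` tags
- Keep only external links (links whose domain differs from the page's domain)
- Return them as a list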
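The link-filtering steps can be sketched as follows, using only the Python standard library. The `LinkCollector` class and `external_links` function are illustrative names, and "external" is assumed here to mean a strict hostname mismatch, so a subdomain such as `www.example.com` would count as external to `example.com`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def external_links(html, base_url):
    """Return absolute http(s) links whose hostname differs from base_url's."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).hostname
    found = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parsed.hostname and parsed.hostname != base_host:
            if absolute not in found:  # de-duplicate, keep first-seen order
                found.append(absolute)
    return found
```

A looser definition (for example, comparing registered domains) would need a public-suffix list, which is outside the standard library.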
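Example conversation
User: Fetch today's trending projects from https://github.com/trending
Agent:
- Call `web_fetch` to fetch the GitHub Trending page
- Parse the project list (repository name, description, star count)
- Format the output:
Today's trending GitHub projects:
1. owner/repo-name - project description
⭐ 1,234 stars today | 📝 JavaScript
2. ...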
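Notes
- Respect robots.txt
- Add appropriate delays between requests to avoid being blocked
- Handle anti-scraping mechanisms (User-Agent, cookies, etc.)
- For large-scale scraping, using proxies is recommended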
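As a rough illustration of the first two notes, Python's standard `urllib.robotparser` can decide whether a URL may be fetched and what delay a site requests. The `check_url` helper, the `example-bot` user agent, and the sample rules below are assumptions for the sketch; in practice the rules would come from the site's own `/robots.txt`.

```python
from urllib.robotparser import RobotFileParser


def check_url(robots_txt, user_agent, url, default_delay=1.0):
    """Return (allowed, delay_seconds) for a URL given robots.txt text.

    Falls back to `default_delay` when the file sets no Crawl-delay.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)
    return allowed, float(delay) if delay is not None else default_delay


# Sample rules, invented for this example.
SAMPLE_ROBOTS = """
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

print(check_url(SAMPLE_ROBOTS, "example-bot", "https://example.com/private/data"))
print(check_url(SAMPLE_ROBOTS, "example-bot", "https://example.com/public"))
```

The returned delay can be passed to whatever loop drives the requests, so disallowed URLs are skipped and allowed ones are spaced out.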
Dependencies
- `web_fetch` tool (built into OpenClaw)
- No additional installation required
