XCrawl Scraper
v1.0.0
XCrawl - AI-Powered Web Scraping API with structured data extraction
Security Scan
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the actual behavior: the files implement a Python client wrapper that calls an external XCrawl API (scrape/search/map/crawl). Required binaries (python, pip) and the xcrawl SDK are appropriate for a Python scraper client.
Instruction Scope
SKILL.md instructs installing the xcrawl SDK, configuring an API key, and running the included script. The runtime instructions and the script only access the included config.json and the XCrawl network endpoint (apiUrl). The instructions do not attempt to read unrelated host files or environment variables.
Install Mechanism
There is no automated install spec; SKILL.md recommends pip install xcrawl. Using pip is typical for Python SDKs but carries the usual supply‑chain risk of third‑party packages—verify the 'xcrawl' package source (PyPI/GitHub) before installing.
Credentials
No environment variables or unrelated credentials are requested. The only secret is an API key that the user explicitly sets via the provided set-key command and which is stored in scripts/config.json — behaviour consistent with a cloud scraping API client.
Persistence & Privilege
The skill is not always-enabled and does not request elevated or cross-skill configuration. It persists only its own API key into scripts/config.json; autonomous invocation of the skill by the agent is allowed (platform default) but not excessive given the skill's function.
Assessment
This skill appears to be what it says: a client wrapper for the XCrawl scraping API. Before installing or running it:
1) Verify the xcrawl Python package (source repository, PyPI record) to ensure you trust the package and its author;
2) Be aware that the scraper sends page data to the XCrawl service (apiUrl run.xcrawl.com); avoid using it with credentials or pages containing sensitive personal or proprietary data unless you trust the provider and its privacy policy;
3) The API key you set is stored in scripts/config.json inside the skill directory; treat it like any secret and don't check it into public repositories;
4) Installing via pip pulls third-party code (supply-chain risk), so prefer inspecting the package source first;
5) Confirm that crawling targets comply with their terms of service and robots.txt.
If you want extra assurance, review the xcrawl SDK source code and the network endpoints it contacts before use.
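Point 5 can be checked locally with Python's standard-library `urllib.robotparser`. The sketch below is not part of the skill: `allowed_by_robots` is a hypothetical helper, and it takes the robots.txt text as a string so it runs offline (fetching `https://<host>/robots.txt` is left to you). It covers robots.txt rules only, not a site's terms of service.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


sample_robots = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(sample_robots, "https://example.com/docs"))       # True
print(allowed_by_robots(sample_robots, "https://example.com/private/a"))  # False
```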
Runtime requirements
🕷️ Clawdis
Bins: python, pip
ai, json, latest, markdown, scraper, web-scraping
<p align="center">
<img src="https://img.shields.io/badge/Python-3.8%2B-blue?style=for-the-badge&logo=python" alt="Python">
<img src="https://img.shields.io/badge/License-MIT-yellow?style=for-the-badge" alt="License">
</p>
<h1 align="center">🕷️ XCrawl Scraper</h1>
<p align="center">
<strong>AI-Powered Web Scraping API for structured data extraction</strong><br>
<em>Outputs Markdown, HTML, JSON, screenshots, and more</em>
</p>
<p align="center">🕷️ Powered by XCrawl</p>
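✨ Features
| Feature | Description |
|---|---|
| 🏷️ Page scraping | Markdown, HTML, JSON, and screenshot output |
| 🔍 Search | Scrape search engine results |
| 🗺️ Site map | Automatically discover all pages on a site |
| 🕷️ Site crawl | Crawl an entire site in bulk |
| 📊 Structured data | Automatic structured data extraction via JSON Schema |
| 🌐 Proxy support | Optional proxies worldwide |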
📦 Installation
Option 1: Run the install script
scripts\install.bat
Option 2: Install manually
# 1. Install dependencies
pip install xcrawl
# 2. Configure the API key
python scripts\xcrawl_scraper.py set-key YOUR_API_KEY
⚙️ Getting an API Key
- Sign up for an account at https://xcrawl.com
- Obtain your API key
- Configure it:
python scripts\xcrawl_scraper.py set-key YOUR_API_KEY
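The security notes state that `set-key` persists the key into `scripts/config.json`. A minimal sketch of what such a command might do, assuming it merges the key into any existing JSON and falls back to the defaults from the configuration example; `set_api_key` and `DEFAULT_CONFIG` are hypothetical names, not taken from `xcrawl_scraper.py`:

```python
import copy
import json
from pathlib import Path

# Hypothetical defaults, copied from the config.json example in this README.
DEFAULT_CONFIG = {
    "apiKey": "",
    "apiUrl": "https://run.xcrawl.com",
    "timeout": 60,
    "defaultFormats": ["markdown"],
    "defaultProxy": "",
}


def set_api_key(config_path: Path, api_key: str) -> dict:
    """Write api_key into config_path, keeping any other settings stored there."""
    config = copy.deepcopy(DEFAULT_CONFIG)
    if config_path.exists():
        # Values already in the file override the defaults.
        config.update(json.loads(config_path.read_text(encoding="utf-8")))
    config["apiKey"] = api_key
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

For example, `set_api_key(Path("scripts/config.json"), "YOUR_API_KEY")`. Merging rather than overwriting keeps a hand-edited `timeout` or `defaultProxy` intact.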
📖 Usage
1. Scrape a page (basic)
python scripts\xcrawl_scraper.py scrape https://example.com markdown
2. Scrape in multiple formats
python scripts\xcrawl_scraper.py scrape https://example.com markdown html links
3. Structured data extraction (JSON)
python scripts\xcrawl_scraper.py scrape https://example.com json "Extract product names and prices"
4. Search
python scripts\xcrawl_scraper.py search "web scraping"
5. Site map
python scripts\xcrawl_scraper.py map https://example.com
6. Site crawl
python scripts\xcrawl_scraper.py crawl https://example.com
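Example 3 passes a free-text prompt after the `json` format. One way a script could separate format names from the prompt is sketched below; this is an assumption about the parsing, not the actual logic of `xcrawl_scraper.py`. `KNOWN_FORMATS` lists only the formats named in this README, and the `["markdown"]` fallback mirrors `defaultFormats` in the configuration example.

```python
from typing import List, Optional, Tuple

# Formats named in this README; the real script may accept more (assumption).
KNOWN_FORMATS = {"markdown", "html", "links", "json", "screenshot"}


def split_scrape_args(tokens: List[str]) -> Tuple[List[str], Optional[str]]:
    """Split the tokens after <URL> into (formats, prompt).

    Hypothetical rule: leading tokens that name a known format are formats;
    everything from the first other token onward is the extraction prompt.
    """
    formats: List[str] = []
    for i, token in enumerate(tokens):
        if token.lower() in KNOWN_FORMATS:
            formats.append(token.lower())
        else:
            return formats or ["markdown"], " ".join(tokens[i:])
    # No prompt given; fall back to the README's defaultFormats value.
    return formats or ["markdown"], None


print(split_scrape_args(["markdown", "html", "links"]))  # (['markdown', 'html', 'links'], None)
print(split_scrape_args(["json", "Extract product names and prices"]))  # (['json'], 'Extract product names and prices')
```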
📋 Commands
| Command | Description |
|---|---|
| `scrape <URL> [formats...]` | Scrape a page |
| `search <query>` | Search |
| `map <URL>` | Site map |
| `crawl <URL>` | Site crawl |
| `set-key <API_KEY>` | Set the API key |
| `config` | Show the configuration |
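The command table maps naturally onto `argparse` subcommands. The skeleton below is a hypothetical sketch of that mapping (parser only, no calls to the xcrawl SDK); the real script may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the command table."""
    parser = argparse.ArgumentParser(prog="xcrawl_scraper.py")
    commands = parser.add_subparsers(dest="command", required=True)

    scrape = commands.add_parser("scrape", help="Scrape a page")
    scrape.add_argument("url")
    # Zero or more formats, optionally followed by an extraction prompt.
    scrape.add_argument("formats", nargs="*")

    commands.add_parser("search", help="Search").add_argument("query")
    commands.add_parser("map", help="Site map").add_argument("url")
    commands.add_parser("crawl", help="Site crawl").add_argument("url")
    commands.add_parser("set-key", help="Set the API key").add_argument("api_key")
    commands.add_parser("config", help="Show the configuration")
    return parser


args = build_parser().parse_args(["scrape", "https://example.com", "markdown", "html"])
print(args.command, args.url, args.formats)  # scrape https://example.com ['markdown', 'html']
```

Each subcommand's `dest` names (`url`, `query`, `api_key`) are illustrative choices for this sketch.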
🔧 Configuration
Config file: scripts/config.json
{
"apiKey": "YOUR_API_KEY",
"apiUrl": "https://run.xcrawl.com",
"timeout": 60,
"defaultFormats": ["markdown"],
"defaultProxy": ""
}
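Before calling the API, it can help to fail early on an incomplete config file. This sketch checks that every key from the example above is present with a plausible type and that the placeholder key has been replaced; `load_config` and the type expectations in `EXPECTED_TYPES` are assumptions inferred from the example, not documented by XCrawl.

```python
import json
from pathlib import Path

# Expected keys and types, inferred from the config.json example (assumption).
EXPECTED_TYPES = {
    "apiKey": str,
    "apiUrl": str,
    "timeout": (int, float),
    "defaultFormats": list,
    "defaultProxy": str,
}


def load_config(config_path: Path) -> dict:
    """Load config.json and raise ValueError if it looks incomplete."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    for key, expected in EXPECTED_TYPES.items():
        if key not in config:
            raise ValueError(f"config.json is missing {key!r}")
        if not isinstance(config[key], expected):
            raise ValueError(f"{key!r} has unexpected type {type(config[key]).__name__}")
    # The README's example value is a placeholder, not a usable key.
    if config["apiKey"] in ("", "YOUR_API_KEY"):
        raise ValueError("API key not set; run the set-key command first")
    return config
```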
📝 Example Output
Markdown output
# Example Domain
This domain is for use in illustrative examples in documents.
JSON output
{
"product_name": "iPhone 15 Pro",
"price": 999,
"currency": "USD"
}
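Because the JSON output comes from an AI extraction step, it is worth checking its shape before using it. A stdlib-only sketch, assuming the field names from the example output above; `EXPECTED_FIELDS` and `check_extracted` are hypothetical, and this is a lightweight stand-in for full JSON Schema validation (the third-party `jsonschema` package provides that).

```python
import json
from typing import Optional

# Field names taken from the example output above; adjust to your own prompt.
EXPECTED_FIELDS = {"product_name": str, "price": (int, float), "currency": str}


def check_extracted(raw: str, expected: Optional[dict] = None) -> dict:
    """Parse extracted JSON; raise ValueError if fields are missing or mistyped."""
    expected = expected or EXPECTED_FIELDS
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in expected if name not in data]
    mistyped = [
        name for name, kind in expected.items()
        if name in data and not isinstance(data[name], kind)
    ]
    if missing or mistyped:
        raise ValueError(f"missing={missing}, mistyped={mistyped}")
    return data


sample = '{"product_name": "iPhone 15 Pro", "price": 999, "currency": "USD"}'
print(check_extracted(sample)["price"])  # 999
```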
📦 Dependencies
- Python >= 3.8
- xcrawl SDK
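To confirm both requirements in the current environment, a small check using `sys.version_info` and `importlib.metadata` (standard library from Python 3.8 onward) could look like this. It assumes the pip distribution is named `xcrawl`, as in the install command; `check_dependencies` is a hypothetical helper.

```python
import sys
from importlib import metadata


def check_dependencies(dist_name: str = "xcrawl") -> dict:
    """Report whether Python >= 3.8 is running and which xcrawl version is installed."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # `pip install xcrawl` has not been run in this environment
    return {
        "python_ok": sys.version_info >= (3, 8),
        "python_version": sys.version.split()[0],
        "xcrawl_version": version,
    }


print(check_dependencies())
```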
🤝 Contributing
Issues and pull requests are welcome!
📄 License
MIT License
