Install
```bash
openclaw skills install crawl4ai-docker-skill
```

Dockerized Web Crawling Service | REST API Web Crawling | LLM-Powered Extraction
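A web crawling and scraping service built on Crawl4AI and deployed with Docker. It exposes a complete REST API and supports intelligent content extraction with LLM-optimized output.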
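Make sure the Docker Compose service is running:

```bash
# Check the service status
docker compose ps
# Health check
curl http://localhost:11235/health
# Open the monitoring dashboard (`open` is macOS; use xdg-open on Linux)
open http://localhost:11235/dashboard
```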
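When you script against the service (for example, right after `docker compose up -d`), it helps to wait until `/health` responds before you send crawl requests. The following is a minimal POSIX shell sketch. It assumes that a healthy service answers `/health` with an HTTP 2xx status. `wait_for_crawl4ai` is a hypothetical helper name and is not part of Crawl4AI:

```bash
#!/bin/sh
# Hypothetical helper: poll the Crawl4AI /health endpoint until it answers.
# Assumption: a healthy service returns an HTTP 2xx status on /health.
wait_for_crawl4ai() {
  base_url=${1:-http://localhost:11235}  # service root used throughout this guide
  attempts=${2:-30}                      # maximum number of checks
  delay=${3:-2}                          # seconds to wait between checks
  i=1
  while :; do
    # -f: treat HTTP 4xx/5xx as failure; -s: no progress output; the body is discarded
    if curl -fs -o /dev/null --max-time 5 "$base_url/health"; then
      echo "Crawl4AI is healthy at $base_url"
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      break
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Crawl4AI did not become healthy at $base_url after $attempts checks" >&2
  return 1
}

# Usage: wait_for_crawl4ai http://localhost:11235 30 2 && echo "ready to crawl"
```

The loop skips the pause after the final check, so a service that is down costs at most `attempts - 1` delays.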
Basic crawl with Markdown output:

```bash
curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "extraction_strategy": "markdown"
  }'
```
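Because `urls` is a JSON array, one request can cover several pages. The following shell sketch builds the payload shown above for any number of URLs and posts it with `curl`. The function names and the `CRAWL4AI_URL` variable are hypothetical. The escaping handles only single-line values:

```bash
#!/bin/sh
# Escape backslashes and double quotes so a single-line value fits in a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build a /crawl payload with the "markdown" strategy
# for every URL passed as an argument.
markdown_payload() {
  urls=""
  for u in "$@"; do
    if [ -n "$urls" ]; then
      urls="$urls, "
    fi
    urls="$urls\"$(json_escape "$u")\""
  done
  printf '{"urls": [%s], "extraction_strategy": "markdown"}\n' "$urls"
}

# Hypothetical helper: send the payload to /crawl (CRAWL4AI_URL overrides the default host).
crawl_markdown() {
  markdown_payload "$@" |
    curl -sS -X POST "${CRAWL4AI_URL:-http://localhost:11235}/crawl" \
      -H "Content-Type: application/json" --data-binary @-
}

# Usage: crawl_markdown https://example.com https://example.org
```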
Crawl with a custom browser configuration (headless, 1280x720 viewport):

```bash
curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "extraction_strategy": "markdown",
    "browser_config": {
      "headless": true,
      "viewport_width": 1280,
      "viewport_height": 720
    }
  }'
```
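You can vary the viewport values to see how a page renders at different sizes. The following sketch assumes the `browser_config` fields shown above. `viewport_payload` and the mobile size 390x844 are illustrative choices. The URL is inserted without escaping, so it must not contain double quotes:

```bash
#!/bin/sh
# Succeed only for a non-empty string of digits.
is_uint() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: /crawl payload with a custom headless viewport.
viewport_payload() {
  url=$1
  width=$2
  height=$3
  if ! is_uint "$width" || ! is_uint "$height"; then
    echo "viewport width and height must be whole numbers" >&2
    return 1
  fi
  printf '{"urls": ["%s"], "extraction_strategy": "markdown", "browser_config": {"headless": true, "viewport_width": %s, "viewport_height": %s}}\n' \
    "$url" "$width" "$height"
}

# Usage: crawl the same page at a desktop and a mobile-sized viewport.
# for size in "1280 720" "390 844"; do
#   viewport_payload https://example.com "${size% *}" "${size#* }" |
#     curl -sS -X POST http://localhost:11235/crawl -H "Content-Type: application/json" --data-binary @-
# done
```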
LLM extraction (page summary):

```bash
curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "Summarize the main content of the page",
      "max_tokens": 1000
    }
  }'
```
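Free-text instructions may contain quotes, which break hand-written JSON. The following sketch escapes the instruction before it builds the LLM extraction payload. `llm_payload` is a hypothetical helper. The field names come from the requests in this guide, and all values must be single-line:

```bash
#!/bin/sh
# Escape backslashes and double quotes for use inside a JSON string (single-line values).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build an LLM-extraction payload for /crawl.
#   $1 URL, $2 instruction, $3 max_tokens (default 1000), $4 provider (default openrouter/free)
llm_payload() {
  url=$1
  instruction=$2
  max_tokens=${3:-1000}
  provider=${4:-openrouter/free}
  printf '{"urls": ["%s"], "extraction_strategy": {"type": "llm", "provider": "%s", "instruction": "%s", "max_tokens": %s}}\n' \
    "$(json_escape "$url")" "$(json_escape "$provider")" \
    "$(json_escape "$instruction")" "$max_tokens"
}

# Usage:
# llm_payload https://example.com 'Summarize the "main" points of the page' 1000 |
#   curl -sS -X POST http://localhost:11235/crawl -H "Content-Type: application/json" --data-binary @-
```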
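Structured data extraction with an LLM (a low `temperature` makes the output more deterministic):

```bash
curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/products"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "Extract all product names, prices, and descriptions, and return them as JSON",
      "max_tokens": 1500,
      "temperature": 0.1
    }
  }'
```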
Full-page screenshot:

```bash
curl -X POST "http://localhost:11235/screenshot" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "options": {
      "full_page": true,
      "quality": 80
    }
  }'
```
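This guide does not describe the response body of `/screenshot` or `/pdf`, so a safe scripting pattern is to save each raw response to a file and inspect it afterwards. In the following sketch, `crawl4ai_save` and `CRAWL4AI_URL` are hypothetical names:

```bash
#!/bin/sh
# Hypothetical helper: POST a JSON payload to a Crawl4AI endpoint and save the raw response.
#   $1 endpoint path (e.g. /screenshot), $2 JSON payload, $3 output file
crawl4ai_save() {
  endpoint=$1
  payload=$2
  out=$3
  base=${CRAWL4AI_URL:-http://localhost:11235}
  # -f makes curl fail on HTTP 4xx/5xx instead of saving an error page as a result.
  if curl -fsS -X POST "$base$endpoint" \
      -H "Content-Type: application/json" -d "$payload" -o "$out"; then
    echo "saved $endpoint response to $out"
  else
    rm -f "$out"  # do not leave a partial or empty file behind
    echo "request to $endpoint failed" >&2
    return 1
  fi
}

# Usage:
# crawl4ai_save /screenshot '{"url": "https://example.com", "options": {"full_page": true, "quality": 80}}' screenshot.json
# crawl4ai_save /pdf '{"url": "https://example.com"}' pdf.json
```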
PDF generation:

```bash
curl -X POST "http://localhost:11235/pdf" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com"
  }'
```
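Core endpoints:

| Endpoint | Method | Purpose |
|---|---|---|
| `/crawl` | POST | Crawl pages and extract content |
| `/health` | GET | Service health check |
| `/dashboard` | GET | Monitoring dashboard |

Monitoring endpoints:

| Endpoint | Method | Purpose |
|---|---|---|
| `/monitor/health` | GET | System health status |
| `/monitor/browsers` | GET | Browser pool status |
| `/monitor/requests` | GET | Request statistics |

Additional endpoints:

| Endpoint | Method | Purpose |
|---|---|---|
| `/screenshot` | POST | Web page screenshot |
| `/pdf` | POST | PDF generation |
| `/execute_js` | POST | JavaScript execution |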
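You can check all the GET endpoints listed in these tables in one pass, which is handy after a restart. The following sketch prints the HTTP status code for each GET endpoint (`000` means no response was received). `probe_endpoints` is a hypothetical name:

```bash
#!/bin/sh
# Hypothetical helper: print "<path> <HTTP status>" for each GET endpoint in the tables above.
probe_endpoints() {
  base=${1:-http://localhost:11235}
  for path in /health /dashboard /monitor/health /monitor/browsers /monitor/requests; do
    # -w '%{http_code}' prints only the status code; curl reports 000 when nothing answered.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$base$path" 2>/dev/null) || true
    printf '%s %s\n' "$path" "${code:-000}"
  done
}

# Usage: probe_endpoints http://localhost:11235
```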
Example: convert a documentation page to Markdown:

```bash
curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://docs.openclaw.ai/zh-CN"],
    "extraction_strategy": "markdown"
  }'
```
Example: extract a news article:

```bash
curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://news-site.com/article"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "Extract the article title, author, publication date, and main content",
      "max_tokens": 1500
    }
  }'
```
Example: extract e-commerce product data:

```bash
curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://ecommerce-site.com/products"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "Extract the name, price, description, and image link of every product",
      "max_tokens": 2000
    }
  }'
```
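For the use cases above, you can also process a list of pages one request at a time with a short pause between requests. This keeps the load on the LLM provider predictable, which matters most on free tiers because they are usually rate-limited. In the following sketch, `batch_llm_extract`, `CRAWL4AI_PAUSE`, and the dry-run switch `CRAWL4AI_DRY_RUN` are hypothetical and are defined only here:

```bash
#!/bin/sh
# Escape backslashes and double quotes for use inside a JSON string (single-line values).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: one LLM-extraction request per URL, with a pause between requests.
#   $1 instruction, remaining arguments: URLs
# CRAWL4AI_DRY_RUN=1 prints each payload instead of sending it.
batch_llm_extract() {
  instruction=$1
  shift
  base=${CRAWL4AI_URL:-http://localhost:11235}
  pause=${CRAWL4AI_PAUSE:-2}
  for url in "$@"; do
    payload=$(printf '{"urls": ["%s"], "extraction_strategy": {"type": "llm", "provider": "openrouter/free", "instruction": "%s", "max_tokens": 1500}}' \
      "$(json_escape "$url")" "$(json_escape "$instruction")")
    if [ "${CRAWL4AI_DRY_RUN:-0}" = 1 ]; then
      printf '%s\n' "$payload"
      continue
    fi
    curl -sS -X POST "$base/crawl" -H "Content-Type: application/json" -d "$payload"
    echo  # newline between responses
    sleep "$pause"
  done
}

# Usage:
# batch_llm_extract 'Extract the article title, author, publication date, and main content' \
#   https://news-site.com/article https://example.com
```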
Create a `.llm.env` file:

```bash
# OpenRouter configuration
OPENROUTER_API_KEY=your-api-key
LLM_PROVIDER=openrouter/free
LLM_MAX_TOKENS=2000
LLM_TEMPERATURE=0.7
# Or use another provider
# OPENAI_API_KEY=sk-your-key
# OPENAI_BASE_URL=https://your-custom-api.com/v1
# LLM_PROVIDER=openai/gpt-4o-mini
```
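Before you restart the service, a quick check can catch a missing `.llm.env` file or a placeholder key that was never replaced. The following sketch assumes the variable names in the template above. `check_llm_env` is a hypothetical helper, and it checks only the two API key variables in that template:

```bash
#!/bin/sh
# Hypothetical helper: sanity-check an .llm.env file before starting Crawl4AI.
check_llm_env() {
  env_file=${1:-.llm.env}
  if [ ! -f "$env_file" ]; then
    echo "missing $env_file" >&2
    return 1
  fi
  # At least one provider key must be set on an uncommented line.
  if ! grep -Eq '^(OPENROUTER_API_KEY|OPENAI_API_KEY)=.+' "$env_file"; then
    echo "no OPENROUTER_API_KEY or OPENAI_API_KEY set in $env_file" >&2
    return 1
  fi
  # Reject the template placeholders (commented-out lines are ignored).
  if grep -Eq '^[A-Z_]+_API_KEY=(your-api-key|sk-your-key)$' "$env_file"; then
    echo "$env_file still contains a placeholder API key" >&2
    return 1
  fi
  echo "$env_file looks OK"
}

# Usage: check_llm_env .llm.env && docker compose restart crawl4ai
```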
Browser configuration:

```json
{
  "browser_config": {
    "headless": true,
    "viewport_width": 1280,
    "viewport_height": 720,
    "user_agent": "Mozilla/5.0..."
  }
}
```
Successful response format:

```json
{
  "success": true,
  "results": [
    {
      "url": "https://example.com",
      "markdown": "# Extracted Markdown content...",
      "metadata": {
        "title": "Page title",
        "description": "Page description",
        "url": "https://example.com"
      },
      "extracted_content": {
        "summary": "Content extracted by the LLM..."
      }
    }
  ]
}
```
Error response format:

```json
{
  "success": false,
  "error": "Error message",
  "code": "ERROR_CODE"
}
```
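The `success` field distinguishes the two response shapes above. The following sketch prints the Markdown of each result on success and the error details otherwise. It assumes the JSON structure shown above and requires `jq`. `handle_crawl_response` is a hypothetical name:

```bash
#!/bin/sh
# Hypothetical helper: read a /crawl response on stdin (requires jq).
# On success, print each result's URL and Markdown; on failure, print the error and return 1.
handle_crawl_response() {
  resp=$(cat)
  if [ "$(printf '%s' "$resp" | jq -r '.success')" = "true" ]; then
    printf '%s' "$resp" | jq -r '.results[] | "## \(.url)\n\(.markdown)"'
  else
    # "//" supplies a fallback when a field is missing or null.
    printf '%s' "$resp" |
      jq -r '"error \(.code // "UNKNOWN"): \(.error // "no message")"' >&2
    return 1
  fi
}

# Usage:
# curl -sS -X POST http://localhost:11235/crawl -H "Content-Type: application/json" \
#   -d '{"urls": ["https://example.com"], "extraction_strategy": "markdown"}' | handle_crawl_response
```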
If the service is not responding:

```bash
# Check container status
docker compose ps
# View logs
docker compose logs crawl4ai
# Restart the service
docker compose restart crawl4ai
```
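Check the `.llm.env` configuration, then check network access and proxy settings:

```bash
# Test network connectivity
curl -I https://example.com
# Check proxy settings
env | grep -i proxy
```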
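Monitoring:

```bash
# Open the monitoring dashboard (`open` is macOS; use xdg-open on Linux)
open http://localhost:11235/dashboard
# View system health
curl http://localhost:11235/monitor/health
# View browser pool status
curl http://localhost:11235/monitor/browsers
```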
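✅ Containerized deployment - one-command startup, isolated environment
✅ REST API - standard HTTP interface, easy to integrate
✅ Real-time monitoring - built-in monitoring dashboard and API
✅ Resource management - automatic browser pool management
✅ Production-ready - enterprise-grade stability and performance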
Start using the Dockerized Crawl4AI service today! 🚀