Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Crawl4ai Docker Skill

Dockerized web crawling and scraping service with REST API. Web crawler, web scraper, REST API service. Intelligent content extraction with L...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 current installs · 0 all-time installs
by Orange@orange-afk

Security Scan

VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
Name/description match the files: this is a Dockerized Crawl4AI crawler that exposes a REST API and supports LLM-based extraction. That purpose justifies the examples, endpoints, and LLM provider configuration shown in SKILL.md and example-config.json.
Instruction Scope
SKILL.md and scripts instruct the agent/user to run curl requests against a local service, create a .llm.env with LLM API keys, and call endpoints that can execute JS (/execute_js). The runtime instructions will cause the service to fetch arbitrary URLs (user-supplied), which can reach internal network hosts if the service runs in a permissive network environment. The docs reference reading .llm.env values not declared in the registry metadata (see environment_proportionality).
Install Mechanism
There is no install spec (instruction-only plus utility scripts). Nothing is downloaded or extracted by the skill bundle itself, so no high-risk install actions are present in the package.
Credentials
SKILL.md documents LLM API keys (OPENROUTER_API_KEY, OPENAI_API_KEY, etc.) and instructs creating .llm.env, but the registry metadata lists no required environment variables or primary credential. The scripts also assume binaries (curl, jq, maybe open) but the skill metadata doesn't declare required binaries. This mismatch between documentation and declared requirements is a coherence issue and a potential operational risk.
Persistence & Privilege
always is false and the skill does not request elevated platform privileges. It does not modify other skills or global configuration in the bundle. Normal autonomous invocation is allowed (default) but not combined with other high-risk flags.
What to consider before installing
This package appears to be legitimate documentation and helper scripts for running a Dockerized Crawl4AI service, but there are a few things to check before installing:

  • Metadata mismatch: the skill's SKILL.md tells you to create a .llm.env containing LLM API keys (OPENROUTER_API_KEY, OPENAI_API_KEY, etc.), but the registry metadata does not declare any required environment variables. Treat LLM API keys as sensitive — only provide them if you trust the Crawl4AI image and its source.
  • Missing binary declarations: the scripts call curl and jq (and examples use open). Make sure those binaries are present and trustworthy in your environment.
  • Network exposure / SSRF risk: running a crawler that fetches arbitrary URLs can probe internal network services from the host/container. Run it in a network-restricted environment (isolated Docker network, no privileged host networking) if you don't want it to access internal resources.
  • Container image provenance: example-config.json references image unclecode/crawl4ai:latest. Verify the Docker image source (official repo, signed image, or audit the image) before pulling/running it, especially if providing API keys.
  • /execute_js and LLM providers: endpoints that execute JavaScript and submit scraped content to external LLM providers can leak sensitive data. Review what you send to external LLMs and ensure you redact secrets.

If you want to proceed, confirm the Docker image origin, run the service in an isolated environment, only supply LLM keys you control, and ensure curl/jq are installed from trusted sources. If needed, provide the docker-compose file or the actual image manifest for further inspection.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
latest: vk97571e6xj72pswssw7qvehnx183stce

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

Crawl4AI Docker Skill - Web Crawler & Scraper Service

Dockerized Web Crawling | REST API Scraping | LLM Smart Extraction

A Crawl4AI web crawler service deployed with Docker, exposing a complete REST API and supporting intelligent content extraction with LLM-optimized output.

🚀 Core Features

  • 🐳 Docker deployment - containerized service on port 11235
  • 🔌 REST API - complete HTTP interface
  • 🤖 LLM smart extraction - multiple LLM providers supported
  • 📊 Real-time monitoring - built-in dashboard and monitoring API
  • High performance - async processing with concurrent requests

📋 Quick Start

Prerequisites

Make sure the Docker Compose service is running:

# Check service status
docker compose ps

# Health check
curl http://localhost:11235/health

# Open the monitoring dashboard
open http://localhost:11235/dashboard

🔌 REST API Usage Guide

Basic Web Crawling

Simple Markdown extraction

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "extraction_strategy": "markdown"
  }'
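When the URL comes from user input, building the request body with jq avoids JSON-quoting mistakes; a minimal bash sketch (payload shape taken from the example above, the helper name is ours):

```shell
#!/usr/bin/env bash
# Build the /crawl request body with jq so URLs containing quotes or
# other special characters are escaped correctly.
build_crawl_payload() {
  jq -cn --arg url "$1" '{urls: [$url], extraction_strategy: "markdown"}'
}

# Usage:
#   curl -X POST "http://localhost:11235/crawl" \
#     -H "Content-Type: application/json" \
#     -d "$(build_crawl_payload 'https://example.com')"
```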

With browser configuration

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "extraction_strategy": "markdown",
    "browser_config": {
      "headless": true,
      "viewport_width": 1280,
      "viewport_height": 720
    }
  }'

LLM Smart Extraction

Content summarization

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "总结网页的主要内容",
      "max_tokens": 1000
    }
  }'
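The same jq approach works for LLM extraction payloads, where the instruction string is the part most likely to contain quotes; a bash sketch mirroring the example above (function name and defaults are ours):

```shell
#!/usr/bin/env bash
# Build an LLM extraction payload for /crawl; the instruction prompt is
# passed as a jq variable so embedded quotes are escaped safely.
build_llm_payload() {
  local url=$1 instruction=$2 provider=${3:-openrouter/free}
  jq -cn --arg url "$url" --arg inst "$instruction" --arg prov "$provider" '
    {urls: [$url],
     extraction_strategy: {type: "llm", provider: $prov,
                           instruction: $inst, max_tokens: 1000}}'
}

# Usage:
#   curl -X POST "http://localhost:11235/crawl" \
#     -H "Content-Type: application/json" \
#     -d "$(build_llm_payload 'https://example.com' 'Summarize the page')"
```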

Structured data extraction

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://example.com/products"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "提取所有产品名称、价格和描述,返回 JSON 格式",
      "max_tokens": 1500,
      "temperature": 0.1
    }
  }'

Advanced Features

Web page screenshot

curl -X POST "http://localhost:11235/screenshot" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "options": {
      "full_page": true,
      "quality": 80
    }
  }'

PDF generation

curl -X POST "http://localhost:11235/pdf" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com"
  }'
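Binary output from these endpoints typically arrives base64-encoded inside the JSON body. The field name used below (`screenshot`) is an assumption, not confirmed by this document — check the actual response shape of your Crawl4AI version. A bash sketch:

```shell
#!/usr/bin/env bash
# Extract a base64-encoded field from a JSON response on stdin and
# write the decoded bytes to a file.
# NOTE: the field name (e.g. "screenshot") is assumed, not documented here.
save_base64_field() {
  local field=$1 outfile=$2
  jq -r --arg f "$field" '.[$f]' | base64 -d > "$outfile"   # use `base64 -D` on macOS
}

# Usage (hypothetical field name):
#   curl -s -X POST "http://localhost:11235/screenshot" ... \
#     | save_base64_field screenshot page.png
```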

📊 API Endpoints Reference

Core Endpoints

Endpoint      Method  Purpose
/crawl        POST    Web crawling and content extraction
/health       GET     Service health check
/dashboard    GET     Monitoring dashboard

Monitoring Endpoints

Endpoint            Method  Purpose
/monitor/health     GET     System health status
/monitor/browsers   GET     Browser pool status
/monitor/requests   GET     Request statistics

Utility Endpoints

Endpoint      Method  Purpose
/screenshot   POST    Web page screenshot
/pdf          POST    PDF generation
/execute_js   POST    JavaScript execution

🎯 Use Cases

Use case 1: Documentation Site Crawling

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://docs.openclaw.ai/zh-CN"],
    "extraction_strategy": "markdown"
  }'

Use case 2: News Article Extraction

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://news-site.com/article"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "提取文章标题、作者、发布时间和主要内容",
      "max_tokens": 1500
    }
  }'

Use case 3: Product Information Scraping

curl -X POST "http://localhost:11235/crawl" \
  -H "Content-Type: application/json" \
  -d '{
    "urls": ["https://ecommerce-site.com/products"],
    "extraction_strategy": {
      "type": "llm",
      "provider": "openrouter/free",
      "instruction": "提取所有产品的名称、价格、描述和图片链接",
      "max_tokens": 2000
    }
  }'

⚙️ Configuration

LLM Provider Configuration

Create a .llm.env file:

# OpenRouter configuration
OPENROUTER_API_KEY=your-api-key
LLM_PROVIDER=openrouter/free
LLM_MAX_TOKENS=2000
LLM_TEMPERATURE=0.7

# Or use another provider
# OPENAI_API_KEY=sk-your-key
# OPENAI_BASE_URL=https://your-custom-api.com/v1
# LLM_PROVIDER=openai/gpt-4o-mini
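To get these variables into the container, one option is Compose's env_file; a sketch assuming the service name crawl4ai used in the troubleshooting commands and the unclecode/crawl4ai:latest image mentioned in the scan notes (verify the image source before pulling):

```yaml
# docker-compose.yml sketch -- service name and image as referenced
# elsewhere in this document; adjust to your setup.
services:
  crawl4ai:
    image: unclecode/crawl4ai:latest
    ports:
      - "11235:11235"
    env_file:
      - .llm.env   # loads the keys defined above into the container
```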

Browser Configuration

{
  "browser_config": {
    "headless": true,
    "viewport_width": 1280,
    "viewport_height": 720,
    "user_agent": "Mozilla/5.0..."
  }
}

📈 Response Format

Success Response

{
  "success": true,
  "results": [
    {
      "url": "https://example.com",
      "markdown": "# 提取的 Markdown 内容...",
      "metadata": {
        "title": "网页标题",
        "description": "网页描述",
        "url": "https://example.com"
      },
      "extracted_content": {
        "summary": "LLM 提取的内容..."
      }
    }
  ]
}

Error Response

{
  "success": false,
  "error": "错误信息",
  "code": "ERROR_CODE"
}
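A caller can branch on the success flag shown above; a minimal bash sketch using jq (the helper name is ours):

```shell
#!/usr/bin/env bash
# Print the first result's markdown on success; surface the error message
# and code on failure, mirroring the response formats documented above.
handle_response() {
  local body=$1
  if jq -e '.success == true' <<<"$body" >/dev/null; then
    jq -r '.results[0].markdown' <<<"$body"
  else
    jq -r '"error: \(.error) (\(.code))"' <<<"$body" >&2
    return 1
  fi
}

# Usage:
#   handle_response "$(curl -s -X POST http://localhost:11235/crawl ...)"
```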

🔧 Troubleshooting

Common Issues

1. Service not running

# Check container status
docker compose ps

# View logs
docker compose logs crawl4ai

# Restart the service
docker compose restart crawl4ai
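After a restart the service may take a moment to come back; a small retry helper (bash sketch, helper name is ours) can gate follow-up requests on the /health endpoint:

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempt budget runs out.
wait_for() {
  local attempts=$1 delay=$2; shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    sleep "$delay"
  done
  return 1
}

# Usage:
#   docker compose restart crawl4ai
#   wait_for 30 2 curl -fsS http://localhost:11235/health
```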

2. LLM extraction fails

  • Check the .llm.env configuration
  • Verify the API key
  • Try a different LLM provider

3. Network connectivity issues

# Test network connectivity
curl -I https://example.com

# Check proxy configuration
env | grep -i proxy

Monitoring & Debugging

# Open the monitoring dashboard
open http://localhost:11235/dashboard

# Check system health
curl http://localhost:11235/monitor/health

# Check browser pool status
curl http://localhost:11235/monitor/browsers

🎉 Why the Docker Version?

Containerized deployment - one-command startup with environment isolation
REST API - standard HTTP interface, easy to integrate
Real-time monitoring - built-in dashboard and monitoring API
Resource management - automatic browser pool management
Production ready - enterprise-grade stability and performance

Start using the Dockerized Crawl4AI service now! 🚀

Files

5 total