Crawl4ai Skill
Web crawling and scraping tool with LLM-optimized output. Web crawler, web scraper, spider. DuckDuckGo search, site crawling, dynamic page scraping.
MIT-0 · Free to use, modify, and redistribute. No attribution required.
by lance (@lancelin111)
Security Scan
OpenClaw
Benign
Medium confidence

Purpose & Capability
The name and description describe a web crawler/scraper, and the SKILL.md only asks for a crawl4ai-skill binary / pip package and shows crawling/scraping commands; these are coherent with the stated purpose.
Instruction Scope
Runtime instructions and examples are limited to searching, crawling, and scraping behavior (including dynamic JS pages). They do not ask the agent to read unrelated files, environment variables, or to exfiltrate data to third-party endpoints.
Install Mechanism
This is an instruction-only skill; SKILL.md recommends 'pip install crawl4ai-skill' (PyPI). That is an expected install path for a Python CLI, but installing from PyPI executes third-party code — review the package source before installing or run in an isolated environment.
Credentials
No environment variables, credentials, or config paths are requested. The lack of extra secrets is proportionate to a web crawler skill.
Persistence & Privilege
Skill is not marked always:true and does not request any persistent system-wide privileges or modify other skills. Autonomous invocation is allowed (platform default) and not combined with other concerning requests.
Assessment
This skill appears internally consistent for web crawling/scraping. Before installing: (1) review the crawl4ai-skill PyPI/GitHub source or the package wheel to confirm what code will run, (2) prefer installing in a virtualenv or sandbox, (3) be mindful of legal/ethical constraints and robots.txt for target sites, and (4) limit use against any sensitive endpoints or credentials. If you need higher assurance, inspect the package's GitHub repo and check the code for unexpected network calls or telemetry. Like a lobster shell, security has layers: review code before you run it.
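The robots.txt advice above can be enforced programmatically before any crawl. A minimal sketch using Python's standard-library `urllib.robotparser`; the `allowed_by_robots` helper and the sample rules are illustrative, not part of crawl4ai-skill:

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules parsed from a string (no network)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt that blocks /private/ for all agents.
rules = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(rules, "my-crawler", "https://example.com/docs/page"))  # True
print(allowed_by_robots(rules, "my-crawler", "https://example.com/private/x"))  # False
```

In practice you would fetch `https://<site>/robots.txt` once per host and consult it before each request.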
Current version: v1.0.10
Runtime requirements
Bins: crawl4ai-skill
SKILL.md
Crawl4AI Skill - Web Crawler & Scraper
Web Crawling | Web Scraping | LLM-Optimized Output
An intelligent web crawler and scraper supporting search, full-site crawling, and dynamic page scraping. Free, with LLM-optimized Markdown output.
Core Features
- 🔍 Web Search - DuckDuckGo search, no API key required
- 🕷️ Web Crawling - Site crawler, spider, sitemap detection
- 📝 Web Scraping - Smart scraper, data extraction
- 📄 LLM-Optimized Output - Fit Markdown, saves roughly 80% of tokens
- ⚡ Dynamic Page Scraping - Scrapes JavaScript-rendered pages
Quick Start
Installation
pip install crawl4ai-skill
Web Search
# Search the web with DuckDuckGo
crawl4ai-skill search "python web scraping"
Web Scraping (single page)
# Scrape a single web page
crawl4ai-skill crawl https://example.com
Web Crawling (full site)
# Crawl entire website / spider
crawl4ai-skill crawl-site https://docs.python.org --max-pages 50
Use Cases
Use Case 1: Web Crawler for Documentation
# Crawl documentation site with spider
crawl4ai-skill crawl-site https://docs.fastapi.com --max-pages 100
Crawler Output:
- ❌ Removed: navigation bars, sidebars, ads
- ✅ Kept: headings, body text, code blocks
- 📊 Tokens: 50,000 → 10,000 (roughly -80%)
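The boilerplate-stripping described above can be sketched with the standard-library HTML parser. This is a toy stand-in for the skill's "fit markdown" filtering, not its actual implementation; the `NOISE_TAGS` set and `ContentExtractor` class are illustrative assumptions:

```python
from html.parser import HTMLParser

NOISE_TAGS = {"nav", "aside", "footer", "script", "style"}  # boilerplate to drop

class ContentExtractor(HTMLParser):
    """Keep text outside noise tags; a toy stand-in for 'fit markdown' filtering."""
    def __init__(self):
        super().__init__()
        self.depth = 0       # how many noise tags we are currently nested inside
        self.chunks = []     # retained text fragments

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

html = "<nav>Home | About</nav><h1>Title</h1><p>Body text.</p><aside>Ad</aside>"
extractor = ContentExtractor()
extractor.feed(html)
print(" ".join(extractor.chunks))  # → "Title Body text."
```

Dropping navigation, ads, and scripts before Markdown conversion is where most of the token savings come from.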
Use Case 2: Search + Scrape
# Search and scrape top results
crawl4ai-skill search-and-crawl "Vue 3 best practices" --crawl-top 3
Use Case 3: Dynamic Page Scraping
Scraping JavaScript-rendered pages (e.g., Xueqiu, Zhihu):
# Scrape JavaScript-heavy pages
crawl4ai-skill crawl https://xueqiu.com/S/BIDU --wait-until networkidle --delay 2
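A `--wait-until networkidle` strategy typically means the headless browser is considered done loading once no network requests have been in flight for a short quiet window. The simulation below illustrates that polling logic only; the function and the fake in-flight counter are illustrative, not crawl4ai-skill internals:

```python
import asyncio
import time

async def wait_until_network_idle(pending, idle_window=0.5, timeout=10.0):
    """Resolve True once pending() has reported 0 in-flight requests
    for idle_window seconds; False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    quiet_since = None
    while time.monotonic() < deadline:
        if pending() == 0:
            quiet_since = quiet_since or time.monotonic()
            if time.monotonic() - quiet_since >= idle_window:
                return True
        else:
            quiet_since = None  # activity resumed; restart the quiet window
        await asyncio.sleep(0.05)
    return False

# Simulated page: two background requests that finish after 0.2 s.
inflight = {"n": 2}

async def finish_requests():
    await asyncio.sleep(0.2)
    inflight["n"] = 0

async def main():
    task = asyncio.ensure_future(finish_requests())
    ok = await wait_until_network_idle(lambda: inflight["n"])
    await task
    print(ok)  # → True

asyncio.run(main())
```

The extra `--delay` option shown above adds a fixed sleep on top of this, which helps with pages that render after the network goes quiet.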
Commands
| Command | Description |
|---|---|
| `search <query>` | Web search |
| `crawl <url>` | Scrape a single page |
| `crawl-site <url>` | Crawl an entire site |
| `search-and-crawl <query>` | Search, then scrape top results |
Common Options
# Web search
--num-results 10         # Number of results
# Web scraping
--format fit_markdown    # Output format
--output result.md       # Output file
--wait-until networkidle # Wait strategy for dynamic pages
--delay 2                # Additional wait time (seconds)
--wait-for ".selector"   # Wait for a specific element
# Web crawling
--max-pages 100          # Max pages to crawl
--max-depth 3            # Max crawl depth
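The `--max-pages` and `--max-depth` limits correspond to the standard bounds on a breadth-first site crawl. A minimal sketch over an in-memory link graph (the `crawl_site` function and the `site` graph are illustrative; a real crawler would fetch each page and extract its links):

```python
from collections import deque

def crawl_site(start, links, max_pages=100, max_depth=3):
    """Breadth-first crawl of a link graph, honoring page and depth limits.
    `links` maps each URL to the URLs it links to (a stand-in for
    fetching a page and extracting its links)."""
    visited = []
    queue = deque([(start, 0)])
    seen = {start}
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)                  # "fetch" the page
        if depth < max_depth:                # only expand within the depth limit
            for link in links.get(url, []):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return visited

site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
}
print(crawl_site("/", site, max_pages=4, max_depth=2))  # → ['/', '/a', '/b', '/a/1']
```

Breadth-first order means `--max-pages` cuts off the crawl at the shallowest unvisited pages, which usually keeps the most central content.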
Output Formats
fit_markdown (recommended)
Smart extraction; saves roughly 80% of tokens.
crawl4ai-skill crawl https://example.com --format fit_markdown
raw_markdown
Preserves the full page structure.
crawl4ai-skill crawl https://example.com --format raw_markdown
Why This Crawler?
✅ Free Crawler - no API key, works out of the box
✅ Smart Scraper - automatically strips boilerplate and extracts core content
✅ Site Crawler - sitemap support, recursive crawling
✅ Dynamic Scraping - supports JavaScript-rendered pages
✅ Search Integration - built-in DuckDuckGo search