Crawl4ai Skill

Web crawling and scraping tool with LLM-optimized output. Web crawler, web scraper, spider. DuckDuckGo search, site crawling, dynamic page scraping.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
5 current installs · 5 all-time installs
by lance@lancelin111
Security Scan
VirusTotal
Benign
OpenClaw
Benign
medium confidence
Purpose & Capability
The name and description describe a web crawler/scraper, and SKILL.md only asks for a crawl4ai-skill binary / pip package and shows crawling/scraping commands, all of which are coherent with the stated purpose.
Instruction Scope
Runtime instructions and examples are limited to searching, crawling, and scraping behavior (including dynamic JS pages). They do not ask the agent to read unrelated files, environment variables, or to exfiltrate data to third-party endpoints.
Install Mechanism
This is an instruction-only skill; SKILL.md recommends 'pip install crawl4ai-skill' (PyPI). That is an expected install path for a Python CLI, but installing from PyPI executes third-party code, so review the package source before installing or run it in an isolated environment.
Credentials
No environment variables, credentials, or config paths are requested. The lack of extra secrets is proportionate to a web crawler skill.
Persistence & Privilege
Skill is not marked always:true and does not request any persistent system-wide privileges or modify other skills. Autonomous invocation is allowed (platform default) and not combined with other concerning requests.
Assessment
This skill appears internally consistent for web crawling/scraping. Before installing: (1) review the crawl4ai-skill PyPI/GitHub source or the package wheel to confirm what code will run, (2) prefer installing in a virtualenv or sandbox, (3) be mindful of legal/ethical constraints and robots.txt for target sites, and (4) limit use against any sensitive endpoints or credentials. If you need higher assurance, inspect the package's GitHub repo and the code for any unexpected network calls or telemetry.
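The sandboxing advice above can be sketched in a couple of commands, assuming a POSIX shell with `python3` on PATH (the environment name here is arbitrary):

```shell
# Create a throwaway virtual environment so the package cannot
# touch the system Python's site-packages.
python3 -m venv crawl4ai-env
. crawl4ai-env/bin/activate

# Inspect the package source first; only then install inside the venv:
#   pip install crawl4ai-skill
```

Deleting the `crawl4ai-env` directory afterwards removes everything the install put on disk.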


Current version: v1.0.10 (latest: vk978dt54dfst0w48mct4066eq982p8kt)

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

Bins: crawl4ai-skill

SKILL.md

Crawl4AI Skill - Web Crawler & Scraper

Web Crawling | Web Scraping | LLM-Optimized Output

A smart web crawling and scraping tool supporting search, full-site crawling, and dynamic page capture. Free web crawler and scraper with LLM-optimized Markdown output.

Core Features

  • 🔍 Web Search - DuckDuckGo search, no API key required
  • 🕷️ Web Crawling - Site crawler, spider, sitemap detection
  • 📝 Web Scraping - Smart scraper, data extraction
  • 📄 LLM-Optimized Output - Fit Markdown, ~80% token savings
  • Dynamic Page Scraping - scrapes JavaScript-rendered pages

Quick Start

Installation

pip install crawl4ai-skill

Web Search

# Search the web with DuckDuckGo
crawl4ai-skill search "python web scraping"

Web Scraping (single page)

# Scrape a single web page
crawl4ai-skill crawl https://example.com

Web Crawling (full site)

# Crawl entire website / spider
crawl4ai-skill crawl-site https://docs.python.org --max-pages 50
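`crawl-site` behaves like a breadth-first spider bounded by page count and link depth (see `--max-pages` and `--max-depth` below). A minimal sketch of that bounding logic, where `fetch_links` is a hypothetical stand-in for "download a page and extract its links", not part of the real CLI:

```python
from collections import deque

def crawl_site(start_url, fetch_links, max_pages=50, max_depth=3):
    """Breadth-first crawl bounded by total page count and link depth."""
    seen = {start_url}
    queue = deque([(start_url, 0)])  # (url, depth from the start page)
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links past the depth limit
        for link in fetch_links(url):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With a fake link graph such as `{"a": ["b", "c"], "b": ["d"]}`, `crawl_site("a", ..., max_pages=3)` visits exactly three pages in breadth-first order, which is the behavior the `--max-pages` flag advertises.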

Use Cases

Use Case 1: Web Crawler for Documentation

# Crawl documentation site with spider
crawl4ai-skill crawl-site https://docs.fastapi.com --max-pages 100

Crawler Output:

  • ❌ Removed: navigation bars, sidebars, ads
  • ✅ Kept: headings, body text, code blocks
  • 📊 Tokens: 50,000 → 10,000 (-80%)
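Whichever crawler you use for bulk crawls like this, it is courteous (and often legally prudent) to honor the site's robots.txt first. Python's standard library can evaluate the rules; the user agent name and URLs below are only examples:

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against already-fetched robots.txt rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

rules = "User-agent: *\nDisallow: /private/\n"
print(allowed(rules, "mybot", "https://example.com/docs/index.html"))
print(allowed(rules, "mybot", "https://example.com/private/data.html"))
```

A real crawler would fetch `https://<site>/robots.txt` once and apply this check to every URL before queueing it.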

Use Case 2: Search + Scrape

# Search and scrape top results
crawl4ai-skill search-and-crawl "Vue 3 best practices" --crawl-top 3

Use Case 3: Dynamic Page Scraping

Scraping JavaScript-rendered pages (e.g. Xueqiu, Zhihu):

# Scrape JavaScript-heavy pages
crawl4ai-skill crawl https://xueqiu.com/S/BIDU --wait-until networkidle --delay 2

Commands

Command                      Description
search <query>               Web search
crawl <url>                  Web scraping (single page)
crawl-site <url>             Web crawling (full site)
search-and-crawl <query>     Search + scrape

Common Options

# Web Search
--num-results 10          # Number of results

# Web Scraping
--format fit_markdown     # Output format
--output result.md        # Output file
--wait-until networkidle  # Wait strategy for dynamic pages
--delay 2                 # Additional wait time (seconds)
--wait-for ".selector"    # Wait for specific element

# Web Crawling
--max-pages 100          # Max pages to crawl
--max-depth 3            # Max crawl depth

Output Formats

fit_markdown (Recommended)

Smart extraction; saves roughly 80% of tokens.

crawl4ai-skill crawl https://example.com --format fit_markdown

raw_markdown

Preserves the full page structure.

crawl4ai-skill crawl https://example.com --format raw_markdown
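The ~80% saving claimed for fit_markdown can be sanity-checked with the common rough heuristic of about 4 characters per token for English text. The before/after strings below are made-up examples of a boilerplate-heavy raw page versus its extracted body, not real crawler output:

```python
def estimate_tokens(text: str) -> int:
    """Very rough heuristic: about 4 characters per token for English text."""
    return max(1, len(text) // 4)

# Hypothetical raw page (mostly repeated navigation) vs. extracted body.
raw_page = "<nav>Home | Docs | Blog</nav>" * 10 + "The actual article body."
fit_body = "The actual article body."

savings = 1 - estimate_tokens(fit_body) / estimate_tokens(raw_page)
print(f"estimated savings: {savings:.0%}")  # estimated savings: 92%
```

The exact ratio depends entirely on how much boilerplate the raw page carries, which is why the 80% figure should be read as typical for navigation-heavy sites, not guaranteed.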

Why This Crawler?

Free Crawler - no API key required, works out of the box
Smart Scraper - automatic noise removal, extracts the core content
Site Crawler - sitemap support, recursive crawling
Dynamic Scraping - handles JavaScript-rendered pages
Search Integration - built-in DuckDuckGo search
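The sitemap support mentioned above boils down to reading the `<loc>` entries from a site's sitemap.xml and seeding the crawl queue with them. A minimal stdlib sketch, parsing a literal XML string (a real crawler would fetch `/sitemap.xml` over HTTP):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list[str]:
    """Extract page URLs from a standard sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(f"{SITEMAP_NS}loc")]

sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>"""
print(sitemap_urls(sample))  # ['https://example.com/', 'https://example.com/docs']
```

Seeding from the sitemap lets a crawler reach pages that are never linked from the landing page, which plain recursive link-following would miss.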

