Crawl4ai Skill
Web crawling and scraping tool with LLM-optimized output. Web crawler, web scraper, spider. DuckDuckGo search, site crawling, dynamic page scraping.
MIT-0 · Free to use, modify, and redistribute. No attribution required.
by lance (@lancelin111)
Security Scan
OpenClaw
Benign
Medium confidence

Purpose & Capability
The name and description describe a web crawler/scraper, and the SKILL.md only asks for a crawl4ai-skill binary / pip package and shows crawling/scraping commands; these are coherent with the stated purpose.
Instruction Scope
Runtime instructions and examples are limited to searching, crawling, and scraping behavior (including dynamic JS pages). They do not ask the agent to read unrelated files, environment variables, or to exfiltrate data to third-party endpoints.
Install Mechanism
This is an instruction-only skill; SKILL.md recommends 'pip install crawl4ai-skill' (PyPI). That is an expected install path for a Python CLI, but installing from PyPI executes third-party code — review the package source before installing or run in an isolated environment.
Credentials
No environment variables, credentials, or config paths are requested. The lack of extra secrets is proportionate to a web crawler skill.
Persistence & Privilege
Skill is not marked always:true and does not request any persistent system-wide privileges or modify other skills. Autonomous invocation is allowed (platform default) and not combined with other concerning requests.
Assessment
This skill appears internally consistent for web crawling/scraping. Before installing: (1) review the crawl4ai-skill PyPI/GitHub source or the package wheel to confirm what code will run, (2) prefer installing in a virtualenv or sandbox, (3) be mindful of legal/ethical constraints and robots.txt for target sites, and (4) limit use against any sensitive endpoints or credentials. If you need higher assurance, inspect the package's GitHub repo and check the code for unexpected network calls or telemetry. Like a lobster shell, security has layers: review code before you run it.
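The robots.txt advice above can be enforced programmatically before any crawl. A minimal sketch using Python's standard-library `urllib.robotparser`; the `allowed_by_robots` helper and the sample rules are illustrative, not part of crawl4ai-skill:

```python
from urllib.robotparser import RobotFileParser

def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules parsed from a string (no network)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt that blocks /private/ for all agents.
rules = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(rules, "my-crawler", "https://example.com/docs/page"))  # True
print(allowed_by_robots(rules, "my-crawler", "https://example.com/private/x"))  # False
```

In practice you would fetch `https://<site>/robots.txt` once per host and consult it before each request.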
Current version: v1.0.10
Runtime requirements
Bins: crawl4ai-skill
SKILL.md
Crawl4AI Skill - Web Crawler & Scraper
Web Crawling | Web Scraping | LLM-Optimized Output
An intelligent web crawler and scraper supporting search, full-site crawling, and dynamic page scraping. Free, with LLM-optimized Markdown output.
Core Features
- 🔍 Web Search - DuckDuckGo search, no API key required
- 🕷️ Web Crawling - Site crawler, spider, sitemap detection
- 📝 Web Scraping - Smart scraper, data extraction
- 📄 LLM-Optimized Output - Fit Markdown, saves roughly 80% of tokens
- ⚡ Dynamic Page Scraping - Scrapes JavaScript-rendered pages
Quick Start
Installation
pip install crawl4ai-skill
Web Search
# Search the web with DuckDuckGo
crawl4ai-skill search "python web scraping"
Web Scraping (single page)
# Scrape a single web page
crawl4ai-skill crawl https://example.com
Web Crawling (full site)
# Crawl entire website / spider
crawl4ai-skill crawl-site https://docs.python.org --max-pages 50
Use Cases
Use Case 1: Web Crawler for Documentation
# Crawl documentation site with spider
crawl4ai-skill crawl-site https://docs.fastapi.com --max-pages 100
Crawler Output:
- ❌ Removed: navigation bars, sidebars, ads
- ✅ Kept: headings, body text, code blocks
- 📊 Tokens: 50,000 → 10,000 (roughly -80%)
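The boilerplate-stripping described above can be sketched with the standard-library HTML parser. This is a toy stand-in for the skill's "fit markdown" filtering, not its actual implementation; the `NOISE_TAGS` set and `ContentExtractor` class are illustrative assumptions:

```python
from html.parser import HTMLParser

NOISE_TAGS = {"nav", "aside", "footer", "script", "style"}  # boilerplate to drop

class ContentExtractor(HTMLParser):
    """Keep text outside noise tags; a toy stand-in for 'fit markdown' filtering."""
    def __init__(self):
        super().__init__()
        self.depth = 0       # how many noise tags we are currently nested inside
        self.chunks = []     # retained text fragments

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0 and data.strip():
            self.chunks.append(data.strip())

html = "<nav>Home | About</nav><h1>Title</h1><p>Body text.</p><aside>Ad</aside>"
extractor = ContentExtractor()
extractor.feed(html)
print(" ".join(extractor.chunks))  # → "Title Body text."
```

Dropping navigation, ads, and scripts before Markdown conversion is where most of the token savings come from.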
Use Case 2: Search + Scrape
# Search and scrape top results
crawl4ai-skill search-and-crawl "Vue 3 best practices" --crawl-top 3
Use Case 3: Dynamic Page Scraping
Scraping JavaScript-rendered pages (e.g., Xueqiu, Zhihu):
# Scrape JavaScript-heavy pages
crawl4ai-skill crawl https://xueqiu.com/S/BIDU --wait-until networkidle --delay 2
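A `--wait-until networkidle` strategy typically means the headless browser is considered done loading once no network requests have been in flight for a short quiet window. The simulation below illustrates that polling logic only; the function and the fake in-flight counter are illustrative, not crawl4ai-skill internals:

```python
import asyncio
import time

async def wait_until_network_idle(pending, idle_window=0.5, timeout=10.0):
    """Resolve True once pending() has reported 0 in-flight requests
    for idle_window seconds; False if timeout elapses first."""
    deadline = time.monotonic() + timeout
    quiet_since = None
    while time.monotonic() < deadline:
        if pending() == 0:
            quiet_since = quiet_since or time.monotonic()
            if time.monotonic() - quiet_since >= idle_window:
                return True
        else:
            quiet_since = None  # activity resumed; restart the quiet window
        await asyncio.sleep(0.05)
    return False

# Simulated page: two background requests that finish after 0.2 s.
inflight = {"n": 2}

async def finish_requests():
    await asyncio.sleep(0.2)
    inflight["n"] = 0

async def main():
    task = asyncio.ensure_future(finish_requests())
    ok = await wait_until_network_idle(lambda: inflight["n"])
    await task
    print(ok)  # → True

asyncio.run(main())
```

The extra `--delay` option shown above adds a fixed sleep on top of this, which helps with pages that render after the network goes quiet.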
Commands
| Command | Description |
|---|---|
| `search <query>` | Web search |
| `crawl <url>` | Scrape a single page |
| `crawl-site <url>` | Crawl an entire site |
| `search-and-crawl <query>` | Search, then scrape top results |
Common Options
# Web search
--num-results 10         # Number of results
# Web scraping
--format fit_markdown    # Output format
--output result.md       # Output file
--wait-until networkidle # Wait strategy for dynamic pages
--delay 2                # Additional wait time (seconds)
--wait-for ".selector"   # Wait for a specific element
# Web crawling
--max-pages 100          # Max pages to crawl
--max-depth 3            # Max crawl depth
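The `--max-pages` and `--max-depth` limits correspond to the standard bounds on a breadth-first site crawl. A minimal sketch over an in-memory link graph (the `crawl_site` function and the `site` graph are illustrative; a real crawler would fetch each page and extract its links):

```python
from collections import deque

def crawl_site(start, links, max_pages=100, max_depth=3):
    """Breadth-first crawl of a link graph, honoring page and depth limits.
    `links` maps each URL to the URLs it links to (a stand-in for
    fetching a page and extracting its links)."""
    visited = []
    queue = deque([(start, 0)])
    seen = {start}
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)                  # "fetch" the page
        if depth < max_depth:                # only expand within the depth limit
            for link in links.get(url, []):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return visited

site = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
}
print(crawl_site("/", site, max_pages=4, max_depth=2))  # → ['/', '/a', '/b', '/a/1']
```

Breadth-first order means `--max-pages` cuts off the crawl at the shallowest unvisited pages, which usually keeps the most central content.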
Output Formats
fit_markdown (recommended)
Smart extraction; saves roughly 80% of tokens.
crawl4ai-skill crawl https://example.com --format fit_markdown
raw_markdown
Preserves the full page structure.
crawl4ai-skill crawl https://example.com --format raw_markdown
Why This Crawler?
✅ Free Crawler - no API key, works out of the box
✅ Smart Scraper - automatically strips boilerplate and extracts core content
✅ Site Crawler - sitemap support, recursive crawling
✅ Dynamic Scraping - supports JavaScript-rendered pages
✅ Search Integration - built-in DuckDuckGo search