Install
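openclaw skills install web-to-markdown
A skill for fetching web page content and extracting images. It supports: (1) converting web pages to Markdown for reading, (2) extracting image URLs from any website, and (3) batch-downloading web page images. Use this skill when you need to fetch web content, read an article, or extract or download images from a site. It supports conversion services such as markdown.new, defuddle.md, and r.jina.ai, with automatic fallback to ensure success....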
A general-purpose web scraping tool that supports converting web pages to Markdown, extracting image URLs, and batch-downloading images.
Suitable for content reading, image collection, and organizing reference material.
Converts a web page URL into clean Markdown text, removing ads, navigation bars, and other irrelevant content.
URL Prefix Services:
| Service | Prefix | Notes |
|---|---|---|
| markdown.new | https://markdown.new/ | Preferred, fast |
| defuddle | https://defuddle.md/ | Fallback |
| r.jina.ai | https://r.jina.ai/ | Good for dynamic content |
Usage:
curl -s "https://markdown.new/https://example.com/article"
curl -s "https://r.jina.ai/https://example.com/article"
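The same prefix approach can be scripted with a fallback across the three services. The following is a minimal sketch, not the skill's actual implementation: the names `SERVICE_PREFIXES`, `http_get`, and `fetch_markdown` are hypothetical, and the order simply follows the "Preferred" and "Fallback" notes in the table above. It assumes an empty response counts as a failure.

```python
from typing import Callable, Optional
from urllib.request import Request, urlopen

# Prefixes from the table above, in the stated order of preference.
SERVICE_PREFIXES = [
    "https://markdown.new/",
    "https://defuddle.md/",
    "https://r.jina.ai/",
]


def http_get(url: str, timeout: float = 15.0) -> str:
    """Fetch a URL and return the body decoded as UTF-8 text."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_markdown(
    page_url: str, fetch: Callable[[str], str] = http_get
) -> Optional[str]:
    """Try each prefix service in order; return the first non-empty result."""
    for prefix in SERVICE_PREFIXES:
        try:
            text = fetch(prefix + page_url)
        except Exception:
            # Assumption: any failure (network error, HTTP error, timeout)
            # simply moves on to the next service.
            continue
        if text.strip():
            return text
    return None
```

Calling `fetch_markdown("https://example.com/article")` returns the Markdown text from the first service that responds, or `None` if all three fail. The `fetch` parameter exists only so the fallback logic can be exercised without network access.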
Extracts all image URLs from any web page.
General Extraction:
# Extract all image URLs
curl -s "https://r.jina.ai/<url>" | grep -oE 'https://[^)[:space:]"]+\.(jpg|jpeg|png|gif|webp|avif)'
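For use inside a script, the same pattern can be applied with Python's `re` module, which accepts `\s` inside a character class. This is a sketch only: `extract_image_urls` is a hypothetical helper, not the contents of `scripts/extract_images.py`, and the de-duplication step is an addition that the shell pipeline above does not perform.

```python
import re

# Same idea as the grep pattern: an https URL that ends in a known
# image extension and contains no ')', whitespace, or double quote.
IMAGE_URL_RE = re.compile(r'https://[^)\s"]+\.(?:jpg|jpeg|png|gif|webp|avif)')


def extract_image_urls(text: str) -> list[str]:
    """Return image URLs found in text, de-duplicated, in first-seen order."""
    seen = set()
    urls = []
    for match in IMAGE_URL_RE.finditer(text):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

The input would typically be the Markdown returned by one of the conversion services. As with the grep version, only `https://` URLs are matched and any query string after the extension is dropped.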
Using the Script:
python scripts/extract_images.py <url> [--output urls.txt]
Extracts images from a web page and batch-downloads them to local storage.
Using the Script:
python scripts/download_images.py <url> [--output <dir>] [--limit <n>] [--min-size <bytes>]
Parameters:
url: Web page URL
--output: Output directory (default: ~/.openclaw/images)
--limit: Maximum number of downloads (default: 50)
--min-size: Minimum file size, used to filter out small icons (default: 10KB)
--ext: Only download the specified format (jpg/png/gif/webp)
Examples:
# Download all large images from a page
python scripts/download_images.py "https://example.com/gallery" --output ~/Downloads/images
# Download only PNGs, at most 20
python scripts/download_images.py "https://example.com" --ext png --limit 20
# Pinterest (auto-converts to original size)
python scripts/download_images.py "https://www.pinterest.com/search/pins/?q=architecture"
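To make the interaction of `--ext`, `--min-size`, and `--limit` concrete, here is one way such filtering could work. It is a sketch under assumptions, not the logic of `download_images.py`: `Candidate` and `select_images` are hypothetical names, 10KB is taken to mean 10 × 1024 bytes, and the extension is compared exactly (so `jpg` would not match `.jpeg`).

```python
import os
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Candidate:
    """An image URL together with its size in bytes (hypothetical type)."""
    url: str
    size_bytes: int


def select_images(
    candidates: list[Candidate],
    ext: Optional[str] = None,
    min_size: int = 10 * 1024,  # assumption: "10KB" means 10 * 1024 bytes
    limit: int = 50,
) -> list[Candidate]:
    """Keep candidates that pass the format and size filters, up to limit."""
    selected = []
    for candidate in candidates:
        if len(selected) >= limit:
            break
        # Take the extension from the URL path so query strings are ignored.
        path = urlparse(candidate.url).path
        suffix = os.path.splitext(path)[1].lower().lstrip(".")
        if ext is not None and suffix != ext.lower():
            continue
        if candidate.size_bytes < min_size:
            continue
        selected.append(candidate)
    return selected
```

In this sketch the limit counts images that pass the filters, not images examined, so `--ext png --limit 20` would yield up to 20 PNGs even when other formats appear first on the page.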
1. Prefer markdown.new/
2. Fall back to defuddle.md/
3. Then try r.jina.ai/
Use r.jina.ai to fetch page content.
Automatically detects Pinterest URLs and converts thumbnails to original size:
236x → originals
564x → originals
The scripts automatically handle the image URL formats used by various websites.
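The Pinterest thumbnail conversion (236x → originals, 564x → originals) can be illustrated as a rewrite of the size segment in the image URL. This is a sketch only: `to_original` is a hypothetical helper, and the assumption that Pinterest images are served from a `pinimg.com` host with the size as a path segment is not stated in this document.

```python
import re
from urllib.parse import urlparse

# The two thumbnail sizes named above, matched as a full path segment.
_SIZE_SEGMENT = re.compile(r"/(?:236x|564x)/")


def to_original(url: str) -> str:
    """Rewrite a Pinterest thumbnail URL to its original-size variant."""
    host = urlparse(url).hostname or ""
    # Assumption: Pinterest image URLs live on a pinimg.com host.
    if not host.endswith("pinimg.com"):
        return url
    # Replace only the first size segment; other URLs pass through unchanged.
    return _SIZE_SEGMENT.sub("/originals/", url, count=1)
```

URLs from other hosts, and Pinterest URLs that already point at the original size, are returned unchanged.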
Local web scraping script, used as a fallback when the online services fail.
python scripts/scrape.py <url>
Extracts image URLs from a web page and outputs them as a list.
python scripts/extract_images.py <url> [--output urls.txt]
Batch-downloads images from a web page.
python scripts/download_images.py <url> [options]
extract_images.py and download_images.py use only the Python standard library, so no extra installation is needed.
scrape.py requires scrapling (local scraping fallback):
pip install scrapling
Dynamically loaded images may require r.jina.ai.