Install
openclaw skills install web-to-markdown
Web scraping and image extraction skill. Supports: (1) converting web pages to Markdown for reading, (2) extracting image URLs from any website, (3) batch downloading images. Use this skill when you need to scrape web content, read articles, or extract or download images from websites. It works with the markdown.new, defuddle.md, and r.jina.ai conversion services, with automatic fallback between them.
A general-purpose web scraping tool that supports converting web pages to Markdown, extracting image URLs, and batch downloading images.
Suitable for content reading, image collection, and data organization.
Converts a web page URL into clean Markdown text, removing ads, navigation bars, and other irrelevant content.
URL Prefix Services:
| Service | Prefix | Notes |
|---|---|---|
| markdown.new | https://markdown.new/ | Preferred, fast |
| defuddle | https://defuddle.md/ | Fallback |
| r.jina.ai | https://r.jina.ai/ | Good for dynamic content |
Usage:
curl -s "https://markdown.new/https://example.com/article"
curl -s "https://r.jina.ai/https://example.com/article"
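The same prefix-and-fallback idea can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the function names `http_get` and `fetch_markdown`, the User-Agent header, and the "empty response counts as failure" rule are all assumptions made for the example. Only the three prefixes and their order come from the table above.

```python
from typing import Callable, Iterable
from urllib.request import Request, urlopen

# Prefixes from the table above, in preferred order.
SERVICE_PREFIXES = (
    "https://markdown.new/",
    "https://defuddle.md/",
    "https://r.jina.ai/",
)


def http_get(url: str, timeout: float = 30.0) -> str:
    """Fetch a URL and return its body as text (standard library only)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_markdown(
    page_url: str,
    fetch: Callable[[str], str] = http_get,
    prefixes: Iterable[str] = SERVICE_PREFIXES,
) -> str:
    """Try each conversion service in turn; return the first non-empty result."""
    errors = []
    for prefix in prefixes:
        try:
            text = fetch(prefix + page_url)
        except OSError as exc:  # URLError, HTTPError and timeouts are OSError subclasses
            errors.append(f"{prefix}: {exc}")
            continue
        if text.strip():
            return text
        # Assumption: a blank body is treated as a failed conversion.
        errors.append(f"{prefix}: empty response")
    raise RuntimeError("all services failed: " + "; ".join(errors))
```

Passing `fetch` as a parameter keeps the fallback logic separate from the network call, so the ordering can be checked without contacting any service.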
Extracts all image URLs from any web page.
General Extraction:
# Extract all image URLs
curl -s "https://r.jina.ai/<url>" | grep -oE 'https://[^)[:space:]"]+\.(jpg|jpeg|png|gif|webp|avif)'
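For use outside the shell, the same pattern can be applied in Python. This is a sketch under stated assumptions: `extract_image_urls` is a hypothetical helper, not the contents of `scripts/extract_images.py`. Matching `http` as well as `https`, ignoring case, and removing duplicates are additions for the example.

```python
import re

# Same idea as the grep pattern above: a URL that ends in a known image extension.
# Assumptions: http is accepted too, and extension matching is case-insensitive.
IMAGE_URL_RE = re.compile(
    r'https?://[^)\s"<>]+\.(?:jpg|jpeg|png|gif|webp|avif)',
    re.IGNORECASE,
)


def extract_image_urls(text: str) -> list[str]:
    """Return image URLs in order of first appearance, without duplicates."""
    seen = set()
    urls = []
    for match in IMAGE_URL_RE.finditer(text):
        url = match.group(0)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```

Like the grep command, the match ends at the file extension, so query strings such as `?w=200` are dropped from the result.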
Using the Script:
python scripts/extract_images.py <url> [--output urls.txt]
Extracts images from a web page and downloads them in batch to local storage.
Using the Script:
python scripts/download_images.py <url> [--output <dir>] [--limit <n>] [--min-size <bytes>]
Parameters:
- url: Web page URL
- --output: Output directory (default: ~/.openclaw/images)
- --limit: Maximum number of downloads (default: 50)
- --min-size: Minimum file size, used to filter out small icons (default: 10KB)
- --ext: Only download the specified format (jpg/png/gif/webp)
Examples:
# Download all large images from a page
python scripts/download_images.py "https://example.com/gallery" --output ~/Downloads/images
# Download only PNGs, at most 20
python scripts/download_images.py "https://example.com" --ext png --limit 20
# Pinterest (thumbnails are converted to original size automatically)
python scripts/download_images.py "https://www.pinterest.com/search/pins/?q=architecture"
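To make the options above concrete, here is one way such a command line could be declared with `argparse`. It is an illustrative sketch, not the source of `scripts/download_images.py`. The helper names `build_parser` and `passes_filters` are hypothetical, and reading "10KB" as 10 * 1024 bytes is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options with their documented defaults."""
    parser = argparse.ArgumentParser(prog="download_images.py")
    parser.add_argument("url", help="Web page URL")
    parser.add_argument("--output", default="~/.openclaw/images",
                        help="Output directory")
    parser.add_argument("--limit", type=int, default=50,
                        help="Maximum number of downloads")
    # Assumption: the documented 10KB default means 10 * 1024 bytes.
    parser.add_argument("--min-size", type=int, default=10 * 1024,
                        help="Minimum file size in bytes")
    parser.add_argument("--ext", choices=("jpg", "png", "gif", "webp"),
                        default=None, help="Only download this format")
    return parser


def passes_filters(filename: str, size_bytes: int,
                   args: argparse.Namespace) -> bool:
    """Apply the --min-size and --ext filters to one candidate image."""
    if size_bytes < args.min_size:
        return False
    if args.ext is not None and not filename.lower().endswith("." + args.ext):
        return False
    return True
```

The `--output` default is kept as a string here. A caller would typically expand it with `pathlib.Path(args.output).expanduser()` before writing files.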
1. Prefer markdown.new/
2. Fall back to defuddle.md/
3. Then try r.jina.ai/
Use r.jina.ai to fetch page content.
Automatically detects Pinterest URLs and converts thumbnails to original size:
- 236x → originals
- 564x → originals
The scripts also automatically handle the image URL formats used by various other websites.
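The 236x → originals and 564x → originals mapping amounts to replacing the size segment of a Pinterest image URL. The sketch below illustrates that rewrite and is not taken from the skill's scripts. The `i.pinimg.com` host, the `to_original` name, and accepting any `<digits>x` segment rather than only 236x and 564x are assumptions.

```python
import re

# Assumption: Pinterest image URLs look like
#   https://i.pinimg.com/236x/ab/cd/ef/name.jpg
# with the size segment directly after the host.
PINIMG_SIZE_RE = re.compile(r"^(https?://i\.pinimg\.com/)\d+x(/.+)$")


def to_original(url: str) -> str:
    """Rewrite a Pinterest thumbnail URL to its 'originals' variant.

    URLs that do not match the thumbnail pattern are returned unchanged.
    """
    return PINIMG_SIZE_RE.sub(r"\1originals\2", url)
```

Because non-matching URLs pass through untouched, the function can be applied to every extracted URL without first checking which site it came from.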
Local web scraping script, used as a fallback when the online services fail.
python scripts/scrape.py <url>
Extracts image URLs from a web page and outputs them as a list.
python scripts/extract_images.py <url> [--output urls.txt]
Batch downloads images from a web page.
python scripts/download_images.py <url> [options]
extract_images.py and download_images.py use only the Python standard library, so no extra installation is needed.
scrape.py requires scrapling (the local scraping fallback):
pip install scrapling
Dynamically loaded images may require r.jina.ai.