Web Content Fetcher

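Web content fetcher | When regular crawlers are filtered, use alternative services to fetch web page content. Supports: 1) r.jina.ai - most stable 2) markdown.new - for Cloudflare-protected sites 3) defuddle.md - fallback. Trigger phrases: 获取网页内容 (fetch web page content), 网页转markdown (web page to markdown), 内容抓取 (content scraping), fetch webpage, bypas...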

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 3 · 3k · 38 current installs · 40 all-time installs

by @MRTommyWU

Security Scan

VirusTotal

Suspicious

OpenClaw

Benign

medium confidence

ℹ

Purpose & Capability

The skill's name, description, SKILL.md, and included scripts consistently implement fetching webpages through r.jina.ai, markdown.new, and defuddle.md. One minor mismatch: the repository metadata declares no required binaries, but both included scripts invoke curl; the skill therefore implicitly requires curl to be present.
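A minimal sketch of how a script could make that implicit dependency explicit. The `require_bin` helper is hypothetical and is not part of the skill's scripts; it relies only on the standard `command -v` builtin.

```shell
# Hypothetical helper (not from the skill): succeed only if the named
# binary can be found on PATH.
require_bin() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing required binary: $1" >&2
  return 1
}
```

A script could then begin with `require_bin curl || exit 1`, so a missing curl is reported up front rather than partway through a fetch.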

ℹ

Instruction Scope

The SKILL.md and scripts stay within the stated scope (try web_fetch, then fall back to the listed services). However, the tool will transmit the target URL (and the fetched page content) to third‑party services, which can leak sensitive URLs or query parameters. The scripts do not URL‑encode or sanitize inputs and will forward whatever URL the user supplies. No instructions read local files or environment variables beyond typical execution.
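One way to reduce that exposure is a pre-flight check before any URL is forwarded. The sketch below is an assumption about what such a check might look like, not something the skill ships: `check_url` is a hypothetical name, and the list of private hosts is a rough, non-exhaustive heuristic.

```shell
# Hypothetical pre-flight check (not part of the skill's scripts).
# Returns 0 if the URL looks acceptable to forward, 1 otherwise.
check_url() {
  cu_url=$1

  # Only plain http(s) URLs are forwarded.
  case $cu_url in
    http://*|https://*) ;;
    *) echo "rejected: not an http(s) URL" >&2; return 1 ;;
  esac

  # An unencoded space would break the request line.
  case $cu_url in
    *" "*) echo "rejected: contains a space" >&2; return 1 ;;
  esac

  # Strip the scheme, then everything from the first '/', ':', '?' or '#'.
  cu_host=${cu_url#*://}
  cu_host=${cu_host%%[/:?#]*}

  # Rough, non-exhaustive list of hosts that should stay local.
  case $cu_host in
    localhost|127.*|10.*|192.168.*|*.local|*.internal)
      echo "rejected: private or internal host" >&2; return 1 ;;
  esac

  # Query strings often carry tokens; warn but do not block.
  case $cu_url in
    *\?*) echo "warning: query string will be sent to a third party" >&2 ;;
  esac
  return 0
}
```

For instance, `check_url "$url" && curl -s "https://r.jina.ai/$url"` would skip the fetch for anything the check rejects. The check does not percent-encode the URL; it only refuses or flags inputs.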

✓

Install Mechanism

No install spec or remote download is used; the skill is instruction+code only (two small scripts included). Nothing is fetched or extracted during install, so installation risk is low.

✓

Credentials

The skill requests no credentials or environment variables, which is proportionate for its purpose. It does, however, rely on network access to external services; no secrets are required or collected by the code.

✓

Persistence & Privilege

The skill does not request persistent/always presence and does not modify other skills or system configuration. It runs on demand and has no elevated privileges.

Assessment

This skill appears to do what it says, but take these precautions before installing or using it:
- Ensure curl is available on the agent environment (the scripts call curl but the metadata doesn't list it).
- Understand that using this tool sends the target URL (and therefore any sensitive query parameters) and the fetched content to third‑party services (r.jina.ai, markdown.new, defuddle.md). Do not use it with private/internal URLs, links containing secrets, or sensitive documents unless you trust those services' privacy policies.
- The scripts do not URL‑encode or sanitize input, so some URLs may fail or be forwarded in an unsafe form; consider validating/encoding URLs before use.
- If you need stronger privacy, run your own fetching proxy or use a local HTML→markdown converter instead of these external services.
- If you want the metadata to be accurate, update the declared required binaries to include curl.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.1

latest: vk97cwgrpsxbk2fej3mebev8ccn82dedn

License

MIT-0 · Terms: https://spdx.org/licenses/MIT-0.html

Runtime requirements

🌐 Clawdis

SKILL.md

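Web Content Fetcher

When the regular web_fetch/web_search tools cannot retrieve a page, use an alternative service to fetch its content as Markdown.

Supported services

Priority	Service	Usage	Best for
1	r.jina.ai	`https://r.jina.ai/{url}`	Most stable, general purpose
2	markdown.new	`https://markdown.new/{url}`	Sites protected by Cloudflare
3	defuddle.md	`https://defuddle.md/{url}`	Fallback

Usage

Direct invocation

When you need a page's content, try the following in order:

First, try web_fetch.
If that fails or the request is filtered, call this tool.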

# Use r.jina.ai (preferred)
curl -s "https://r.jina.ai/https://example.com"

# Use markdown.new (Cloudflare-protected sites)
curl -s "https://markdown.new/https://example.com"

# Use defuddle.md (fallback)
curl -s "https://defuddle.md/https://example.com"
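The three calls above can be chained so that each service is tried only when the previous one fails. This is a sketch under assumptions, not the contents of the skill's fetch.sh: `fetch_with_fallback` and the `FETCH` override are hypothetical names, and the 30-second timeout is an arbitrary choice.

```shell
# Hypothetical fallback chain (not taken from the skill's fetch.sh).
# Tries the services in the documented priority order and prints the
# first non-empty response. FETCH may be set to replace curl, for
# example with a stub during a dry run.
fetch_with_fallback() {
  for base in https://r.jina.ai https://markdown.new https://defuddle.md; do
    # -f turns HTTP errors into a non-zero exit so the loop moves on.
    body=$(${FETCH:-curl -sf --max-time 30} "$base/$1") || continue
    if [ -n "$body" ]; then
      printf '%s\n' "$body"
      return 0
    fi
  done
  echo "all services failed for: $1" >&2
  return 1
}
```

Calling `fetch_with_fallback https://example.com` returns the first successful result and exits non-zero only when every service has failed or returned nothing.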

API format

# Simple fetch
fetch_webpage <url>

# Specify a method
fetch_webpage <url> --method jina|markdown|defuddle
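The listing does not show how `fetch_webpage` is implemented, so the following is only a sketch of a shell function with the same call shape. It assumes that omitting `--method` means r.jina.ai (following the priority table), and the `FETCH` override is a hypothetical addition for dry runs.

```shell
# Hypothetical wrapper with the documented call shape:
#   fetch_webpage <url> [--method jina|markdown|defuddle]
# Assumption: the default method is jina. FETCH may replace curl.
fetch_webpage() {
  if [ $# -lt 1 ]; then
    echo "usage: fetch_webpage <url> [--method jina|markdown|defuddle]" >&2
    return 2
  fi
  fw_url=$1
  fw_method=jina
  shift

  while [ $# -gt 0 ]; do
    case $1 in
      --method)
        if [ $# -lt 2 ]; then
          echo "--method needs a value" >&2
          return 2
        fi
        fw_method=$2
        shift 2
        ;;
      *)
        echo "unknown option: $1" >&2
        return 2
        ;;
    esac
  done

  # Map the method name to the service prefix.
  case $fw_method in
    jina)     fw_base=https://r.jina.ai ;;
    markdown) fw_base=https://markdown.new ;;
    defuddle) fw_base=https://defuddle.md ;;
    *)
      echo "unknown method: $fw_method" >&2
      return 2
      ;;
  esac

  ${FETCH:-curl -s} "$fw_base/$fw_url"
}
```

Running `FETCH=echo fetch_webpage https://example.com --method markdown` prints the URL that would be requested without touching the network.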

Example

User: Please fetch the content of https://news.example.com/article/123
Assistant: (fetches it via r.jina.ai)

Helper script

This directory includes a fetch.sh script that can be called directly:

./fetch.sh https://example.com
./fetch.sh https://example.com jina

Fetch web content without restrictions 🌐

Files

4 total
