Smart Web Fetch

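A smart web-fetching skill that replaces the built-in web_fetch and automatically uses the Jina Reader / markdown.new / defuddle.md cleaning services to obtain clean Markdown. It supports a multi-tier fallback strategy and substantially reduces token consumption. Agents should use this skill instead of web_fetch whenever they need to fetch web page content.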

MIT-0 · Free to use, modify, and redistribute. No attribution required.

⭐ 23 · 2.6k · 22 current installs · 24 all-time installs

by @Leochens

Security Scan

VirusTotal: Suspicious

OpenClaw: Suspicious (medium confidence)

✓ Purpose & Capability

The code and instructions match the stated purpose: replacing web_fetch with a pipeline that prefers r.jina.ai then markdown.new then defuddle.md, falling back to direct fetching. The scripts implement that multi-stage strategy and the README shows how to call them.

Instruction Scope

SKILL.md instructs agents to run the included Python scripts and even to ban the built-in web_fetch (openclaw.json deny). The scripts will fetch target URLs and/or forward the target to external cleaning services — this means user-provided URLs and fetched page content are transmitted to third-party endpoints. There is no guidance or restriction to avoid internal-only or sensitive URLs.

✓ Install Mechanism

No install spec; the skill is instruction + small Python scripts only. Nothing is downloaded from external URLs during install.

Credentials

The skill requests no credentials, which is coherent, but it makes network requests to third‑party cleaning services with the full target URL and/or content. That can leak sensitive query parameters or page content to those services. The scripts also disable SSL verification (ssl.CERT_NONE), weakening transport security and increasing risk of MITM when contacting resources.
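As an illustration of the transport-security point, here is a minimal sketch of a fetch helper that keeps certificate verification on instead of using ssl.CERT_NONE. The function names and the User-Agent string are hypothetical and are not taken from the skill's scripts.

```python
import ssl
import urllib.request


def make_verified_context() -> ssl.SSLContext:
    """Return a TLS context that validates certificates and hostnames."""
    ctx = ssl.create_default_context()
    # These are already the defaults; set explicitly to make the intent visible.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def fetch_text(url: str, timeout: float = 15.0) -> str:
    """Fetch a URL over verified TLS and decode the body as UTF-8."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "smart-web-fetch-sketch"}  # hypothetical UA
    )
    with urllib.request.urlopen(
        request, timeout=timeout, context=make_verified_context()
    ) as response:
        return response.read().decode("utf-8", errors="replace")
```

With this context, a request to a host presenting an invalid certificate raises an error rather than silently succeeding.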

✓ Persistence & Privilege

The skill declares always:false and user-invocable:true. It does suggest changing openclaw.json to deny the built-in web_fetch to force use of this skill, which is a configuration change with operational impact, but the skill does not request elevated agent privileges or automatic always-on inclusion.

What to consider before installing

This skill functions as advertised (fetch + clean via third-party services) but carries non-trivial privacy and network-security risks. Before installing, consider:

- Third-party exposure: the scripts send target URLs (and indirectly page content) to r.jina.ai, markdown.new, and defuddle.md. If you fetch pages containing secrets or internal URLs, that data may be exposed to those services.
- SSRF / internal resource risk: there is no allowlist/validation, so the agent could be asked to fetch internal IPs (e.g., metadata endpoints). Decide whether that is acceptable in your environment.
- Disabled SSL verification: the code disables TLS verification, increasing the chance of man-in-the-middle tampering when fetching resources.
- Operational impact: the README recommends denying the built-in web_fetch to force this skill; that prevents a safer local fetch fallback and could increase exposure.

If you still want to use it, mitigate risk by:

- only allowing this skill for non-sensitive public URLs;
- adding an allowlist or hostname/IP blocklist to the scripts;
- re-enabling proper SSL verification;
- auditing third‑party services' privacy policies;
- testing on non-sensitive pages first;
- avoiding the suggested global deny of web_fetch unless you accept the tradeoffs.
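The hostname/IP blocklist idea can be sketched as a pre-flight check that rejects URLs resolving to non-public addresses. This is an assumption-laden illustration: the function name is_public_http_url and the injectable resolve parameter are hypothetical, and a check like this does not by itself prevent DNS rebinding, because the later fetch resolves the name again.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_public_http_url(url: str, resolve=socket.getaddrinfo) -> bool:
    """Return True only for http(s) URLs whose host resolves to global addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    default_port = 443 if parts.scheme == "https" else 80
    try:
        infos = resolve(parts.hostname, parts.port or default_port)
    except (OSError, ValueError):
        return False  # unresolvable host or malformed port
    for info in infos:
        try:
            address = ipaddress.ip_address(info[4][0])
        except ValueError:
            return False  # unparseable address: refuse rather than guess
        # Rejects loopback, private, link-local (e.g. metadata endpoints), etc.
        if not address.is_global:
            return False
    return bool(infos)
```

A caller would run this before handing the URL to any cleaning service and refuse the request when it returns False.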

Current version: v1.0.0

latest: vk972p6cvycf1nqcqnp6z3y8sz582842a

License

MIT-0

Terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

Smart Web Fetch

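A smart web content fetching skill that fully replaces web_fetch and automatically obtains clean Markdown through cleaning services.

Core features

Full replacement for web_fetch: returns cleaned Markdown rather than raw HTML
Four-tier fallback strategy: Jina → markdown.new → defuddle.md → raw content
Token optimization: cleaned content uses 50-80% fewer tokens than the raw HTML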

Usage

Fetching web page content from the command line

# Fetch cleaned Markdown (plain-text output)
python3 {baseDir}/scripts/fetch.py "https://example.com/article"

# Fetch JSON output (includes metadata)
python3 {baseDir}/scripts/fetch.py "https://example.com/article" --json

Using it from an Agent

When a user needs the content of a web page:

User: "Please look up the content of https://example.com/article"

The Agent should:
1. Run: python3 ~/.openclaw/skills/smart-web-fetch/scripts/fetch.py "https://example.com/article"
2. Receive the cleaned Markdown content directly

JSON output format

{
  "success": true,
  "url": "https://r.jina.ai/http://example.com/article",
  "content": "# Article Title\n\nClean markdown content here...",
  "source": "jina",
  "error": null
}
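A caller consuming the --json output might parse it along these lines. This is a minimal sketch that assumes the five fields shown above are the whole schema; the FetchResult class and parse_fetch_output function are hypothetical names, not part of the skill.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchResult:
    success: bool
    url: str
    content: str
    source: str
    error: Optional[str]


def parse_fetch_output(raw: str) -> FetchResult:
    """Parse the JSON text printed by fetch.py --json into a typed record."""
    data = json.loads(raw)
    return FetchResult(
        success=bool(data.get("success", False)),
        url=data.get("url") or "",
        content=data.get("content") or "",
        source=data.get("source") or "",
        error=data.get("error"),  # null in JSON becomes None
    )
```

Checking result.success before using result.content keeps a failed fetch from being mistaken for an empty page.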

Fallback strategy

Jina Reader (preferred)
- URL: https://r.jina.ai/http://{target}
- Free, no API key required, good Chinese-language support
markdown.new (fallback)
- URL: https://markdown.new/{target}
defuddle.md (fallback)
- URL: https://defuddle.md/{target}
Raw content (last resort)
- Fetches the raw HTML directly
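The four tiers above can be sketched as a loop that tries each service in order and returns the first non-empty result. This is not the skill's actual fetch.py. It assumes the Jina tier receives the target with its scheme stripped (consistent with the JSON example) and that the other two services receive the full URL. The source labels other than "jina" and the injectable fetch callable are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def candidate_urls(url: str) -> List[Tuple[str, str]]:
    """Return (source, request_url) pairs in fallback order."""
    bare = url.split("://", 1)[-1]  # assumption: Jina takes the scheme-less target
    return [
        ("jina", f"https://r.jina.ai/http://{bare}"),
        ("markdown.new", f"https://markdown.new/{url}"),  # assumption: full URL
        ("defuddle.md", f"https://defuddle.md/{url}"),    # assumption: full URL
        ("raw", url),                                     # last resort: raw HTML
    ]


def fetch_with_fallback(url: str, fetch: Callable[[str], str]) -> dict:
    """Try each tier in order; return the first non-empty response."""
    last_error: Optional[str] = None
    for source, request_url in candidate_urls(url):
        try:
            content = fetch(request_url)
        except Exception as exc:  # any failure moves on to the next tier
            last_error = f"{source}: {exc}"
            continue
        if content.strip():
            return {"success": True, "url": request_url, "content": content,
                    "source": source, "error": None}
        last_error = f"{source}: empty response"
    return {"success": False, "url": url, "content": "",
            "source": "", "error": last_error}
```

Passing the fetch function in as a parameter keeps the ordering logic testable without network access.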

Recommended Agent configuration

To force agents to use this skill instead of web_fetch, configure openclaw.json as follows:

{
  "agents": {
    "list": [
      {
        "id": "your-agent",
        "tools": {
          "deny": ["web_fetch"]
        }
      }
    ]
  }
}

With this setting the Agent can no longer call the built-in web_fetch and can only fetch web content through this skill.
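Because this setting changes what an agent is able to do, it can help to check which agents in a configuration currently deny a tool. The sketch below assumes only the agents.list[].tools.deny structure shown above. The helper name agents_denying is hypothetical.

```python
from typing import List


def agents_denying(config: dict, tool: str = "web_fetch") -> List[str]:
    """Return the ids of agents whose tools.deny list contains the given tool."""
    denied = []
    for agent in config.get("agents", {}).get("list", []):
        if tool in agent.get("tools", {}).get("deny", []):
            denied.append(agent.get("id", "<unnamed>"))
    return denied
```

Loading openclaw.json with json.load and passing the resulting dict to this function lists the affected agents.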

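Advantages

🚀 50-80% token savings: strips ads, navigation bars, and other noise
🔄 Automatic fault tolerance: four-tier service fallback keeps fetching available
🆓 Zero cost: uses only free services
🔌 Plug and play: no API key needed
📝 Clean output: pure Markdown, no extra parsing required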

Files

3 total
