Smart Web Fetch Safe

Safe edition of the smart web content fetching skill. Local parsing plus optional remote cleaning; privacy-first and token-optimized.

MIT-0 · Free to use, modify, and redistribute. No attribution required.
by Jory (@vimself)
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description, required binary (python3), and declared deps (beautifulsoup4, requests) match the implemented behavior. The script implements local HTML cleaning and an optional remote cleaning path (Jina Reader) as described. Minor nit: SKILL.md emphasizes a "domain whitelist" but the implementation allows all domains when ALLOWED_DOMAINS is unset (documented in SKILL.md).
Instruction Scope
Runtime instructions and the script stay within the stated purpose: fetching a URL, locally cleaning HTML, or optionally calling a third-party cleaning endpoint. The only data sent to an external service in remote mode is a GET to r.jina.ai with the target URL encoded into the path (the remote service performs the fetch/clean). SKILL.md warns about third-party processing, but it says both URL and content are processed—implementation only issues a GET for the target URL (i.e., it does not POST local content).
Install Mechanism
Instruction-only skill with no install spec and a small Python script. Dependencies are standard pip packages (beautifulsoup4, requests) and no remote code is downloaded or extracted by an installer. Low installation risk.
Credentials
No required credentials or secrets are requested. SKILL.md documents optional env vars (ALLOWED_DOMAINS, MAX_CHARS, DEFAULT_MODE) which the script reads; the registry shows none as required, which is consistent. Caution: ALLOWED_DOMAINS is empty by default, which effectively allows fetching any domain unless the user sets a whitelist. This may be surprising given the emphasis on the whitelist feature.
Persistence & Privilege
The always flag is false, and there is no persistent installation or modification of other skills or system config. The skill does network I/O when used but does not request elevated privileges or permanent presence.
Assessment
This skill appears to do what it claims: local HTML parsing by default and an optional remote cleaning call to r.jina.ai. Before using:
(1) If you care about privacy, keep DEFAULT_MODE=local and install beautifulsoup4 so parsing stays local.
(2) If you enable remote (--remote or DEFAULT_MODE=remote), understand the target URL will be fetched/processed by a third party (r.jina.ai); do not send sensitive/private URLs.
(3) Consider setting ALLOWED_DOMAINS to a tight whitelist (e.g., example.com, wikipedia.org) because when ALLOWED_DOMAINS is unset the script allows all domains.
(4) No credentials are requested by the skill, and there is no installer that fetches arbitrary code, but review network behavior if you operate in a restricted environment.


Current version: v1.0.0


Runtime requirements

🔐 Clawdis
Bins: python3

SKILL.md

Smart Web Fetch Safe

A safe, privacy-first smart web content fetching skill that supports two modes: local parsing and remote cleaning.

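Core features

  • Local parsing by default: uses local HTML parsing, so data stays private
  • Optional remote cleaning: users can opt in to the Jina Reader remote service
  • Token optimization: automatically strips noise such as ads and navigation bars
  • Domain whitelist: configurable list of domains that may be fetched
  • Character limit: built-in maximum character count to avoid overly long output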
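
The noise removal and character limit listed above can be illustrated with a minimal sketch. The skill itself declares beautifulsoup4 as its parser; the version below swaps in the standard library's html.parser so it runs without extra packages. The tag list and the names NOISE_TAGS, TextExtractor, and clean_html are assumptions made for illustration, not the skill's actual code.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as noise in this sketch (an assumption,
# not the skill's actual list).
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class TextExtractor(HTMLParser):
    """Collect visible text while skipping everything inside noise tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a noise tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside noise tags.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def clean_html(html, max_chars=10000):
    """Return the visible text of `html`, truncated to `max_chars` characters."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()[:max_chars]
```

Calling clean_html(page_source, max_chars=5000) would return at most 5000 characters of text with script, style, and navigation content dropped. Truncating by plain slicing is the simplest choice; a real implementation might prefer to cut at a sentence or paragraph boundary.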

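Security features

⚠️ Privacy notice:

  • Local parsing mode: data stays entirely on the local machine
  • Remote cleaning mode: the URL and content are processed by the Jina AI service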
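
To make concrete what leaves the machine in remote mode: the scan notes on this page describe a GET to r.jina.ai with the target URL encoded into the path. A minimal sketch of building such a request URL follows. The function name build_reader_url and the exact set of characters left unescaped are assumptions; the skill's own encoding may differ.

```python
from urllib.parse import quote

# Base endpoint named in the scan notes; the target URL is appended to the path.
READER_BASE = "https://r.jina.ai/"


def build_reader_url(target_url):
    """Return the reader URL for `target_url` (hypothetical helper).

    Common URL punctuation is left intact so the result stays readable;
    other unsafe characters, such as spaces, are percent-encoded.
    """
    return READER_BASE + quote(target_url, safe=":/?&=#%")
```

Under this sketch only the target URL is sent to the third party, which then fetches the page itself. That is why private or tokenized URLs are best kept out of remote mode.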

Usage

Command line

# Local parsing mode (default, privacy-safe)
python3 skills/smart-web-fetch-safe/scripts/fetch.py "https://example.com/article"

# Remote cleaning mode (saves more tokens, but data passes through a third party)
python3 skills/smart-web-fetch-safe/scripts/fetch.py "https://example.com/article" --remote

# Set the maximum number of characters
python3 skills/smart-web-fetch-safe/scripts/fetch.py "https://example.com/article" --max-chars 5000

# Combined
python3 skills/smart-web-fetch-safe/scripts/fetch.py "https://example.com/article" --remote --max-chars 3000

JSON output

python3 skills/smart-web-fetch-safe/scripts/fetch.py "https://example.com/article" --json
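
The flags used in the commands above (--remote, --max-chars, --json, plus a positional URL) could be declared with argparse roughly as follows. This is a sketch of one plausible parser, not the contents of fetch.py; the help strings and the choice of None as the --max-chars default are assumptions.

```python
import argparse


def build_parser():
    """Build a parser for the flags shown in the usage examples (hypothetical)."""
    parser = argparse.ArgumentParser(description="Fetch and clean a web page.")
    parser.add_argument("url", help="Target URL to fetch")
    parser.add_argument("--remote", action="store_true",
                        help="Use remote cleaning instead of local parsing")
    # None here means "fall back to MAX_CHARS or the built-in default".
    parser.add_argument("--max-chars", type=int, default=None,
                        help="Maximum number of characters to return")
    parser.add_argument("--json", action="store_true",
                        help="Emit the result as JSON")
    return parser
```

With this sketch, build_parser().parse_args(["https://example.com/article", "--remote", "--max-chars", "3000"]) yields remote=True, max_chars=3000, and json=False.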

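Configuration

Environment variables (optional)

| Variable | Default | Description |
| --- | --- | --- |
| ALLOWED_DOMAINS | * | Domains that may be fetched, comma-separated |
| MAX_CHARS | 10000 | Maximum number of characters returned |
| DEFAULT_MODE | local | Default mode: local or remote |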
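
As an illustration of how these three variables and their documented defaults might be read, here is a minimal sketch. The function name load_config, the returned dictionary shape, and the fallback to local for an unrecognized mode are assumptions rather than the script's confirmed behavior.

```python
import os


def load_config(env=None):
    """Read the optional settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    raw_domains = env.get("ALLOWED_DOMAINS", "*").strip()
    if raw_domains in ("", "*"):
        allowed = None  # None means every domain is allowed
    else:
        allowed = [d.strip().lower() for d in raw_domains.split(",") if d.strip()]

    max_chars = int(env.get("MAX_CHARS", "10000"))

    mode = env.get("DEFAULT_MODE", "local").strip().lower()
    if mode not in ("local", "remote"):
        mode = "local"  # assumption: unknown values fall back to the private mode

    return {"allowed_domains": allowed, "max_chars": max_chars, "mode": mode}
```

Passing a plain dictionary instead of relying on os.environ keeps the function easy to test. Note that an unset or empty ALLOWED_DOMAINS yields no restriction at all, matching the caution in the scan notes.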

Whitelist example

export ALLOWED_DOMAINS="example.com,github.com,wikipedia.org"
export MAX_CHARS=5000
export DEFAULT_MODE=local
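
A whitelist like the one exported above might be enforced with a check along these lines. This is a hedged sketch: the function name is_allowed is hypothetical, and treating subdomains of a listed domain as allowed is an assumption the source does not confirm.

```python
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Return True if the URL's host is permitted by the whitelist.

    An empty or missing whitelist allows every domain, mirroring the
    documented behavior when ALLOWED_DOMAINS is unset.
    """
    if not allowed_domains:
        return True
    host = (urlparse(url).hostname or "").lower()
    # Exact match, or a subdomain of a listed domain (assumption of this sketch).
    return any(host == d or host.endswith("." + d) for d in allowed_domains)
```

Matching on the "." boundary rather than a bare suffix keeps look-alike hosts such as evilwikipedia.org from passing a wikipedia.org entry.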

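Mode comparison

| Feature | Local parsing (local) | Remote cleaning (remote) |
| --- | --- | --- |
| Privacy | ✅ Fully local | ⚠️ Data passes through a third party |
| Token optimization | ✅ Basic optimization | ✅ Deep optimization, 50-80% |
| Speed | Faster | Network-dependent |
| Dependencies | beautifulsoup4, requests | No extra dependencies |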

Installing dependencies

pip install beautifulsoup4 requests

Current status

In development.
