Web Extractor
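Uses jina.ai to extract clean text from a web page, then has the agent summarize it. Trigger phrases: extract web page, summarize news, extract article, get page content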
MIT-0 · Free to use, modify, and redistribute. No attribution required.
by @kukuxnd
Security Scan
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name/description match the instructions: the SKILL.md tells the agent to fetch a page via r.jina.ai and summarize the resulting markdown. No unrelated binaries, installs, or credentials are requested.
Instruction Scope
The instructions tell the agent to send the target page URL to an external service (https://r.jina.ai/...) with a curl GET request, save the result to /tmp, then read and summarize that file. This is within the stated function but has privacy and security implications that the skill does not address: arbitrary URLs (including intranet or cloud metadata endpoints) are handed to the remote service to fetch, and page contents are disclosed to a third party. The instructions also use a predictable /tmp filename, which can cause local information exposure or race conditions.
Install Mechanism
Instruction-only skill with no install spec and no code files — nothing is written to disk by an installer. Lowest install risk.
Credentials
The skill requests no environment variables, credentials, or config paths. There is no overbroad credential access declared.
Persistence & Privilege
The skill does not request permanent presence (always: false) and does not modify agent/system configs. Agent-autonomous invocation is allowed by default, which is expected and not by itself a red flag.
What to consider before installing
This skill behaves as advertised (it delegates extraction to r.jina.ai, then summarizes the returned markdown), but your requested URL is sent to a third-party service, which fetches and processes the page content. Before installing or using it, consider:
- Do not send sensitive, private, or internal URLs (intranets, private docs, or cloud metadata endpoints like 169.254.169.254) — doing so can leak secrets or enable SSRF via the remote extractor.
- Treat r.jina.ai as an external party: any content fetched for summarization will be disclosed to them. Verify you trust that service or host an extractor locally.
- The skill writes to predictable /tmp filenames; if you must use it, prefer changing the workflow to use a secure temporary filename (e.g., mktemp) to avoid collisions or exposure.
- If you need to summarize protected content, fetch the page locally (ensuring credentials are handled safely), sanitize/remove sensitive headers or query params, and run a local extraction/parsing step instead of sending the raw URL to a public extractor.
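The precautions above could be sketched in shell roughly as follows. The helper names (`is_public_http_url`, `strip_query`, `fetch_to_tempfile`) are hypothetical and not part of the skill, and the host filter is a rough textual check that assumes plain IPv4 or hostname URLs. It is not a complete SSRF defense.

```shell
#!/bin/sh
# Hypothetical helper: succeed only for http(s) URLs whose host is not an
# obviously internal address. Textual check only: no DNS resolution, no IPv6.
is_public_http_url() {
    case "$1" in
        http://*|https://*) ;;
        *) return 1 ;;
    esac
    host=${1#*://}       # drop the scheme
    host=${host%%/*}     # drop the path
    host=${host%%\?*}    # drop the query string
    host=${host%%#*}     # drop the fragment
    host=${host##*@}     # drop any userinfo
    host=${host%%:*}     # drop the port
    case "$host" in
        ""|localhost|*.local|*.internal) return 1 ;;
        127.*|10.*|192.168.*|169.254.*) return 1 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 1 ;;
    esac
    return 0
}

# Hypothetical helper: drop the query string and fragment, which may carry tokens.
strip_query() {
    url=${1%%#*}
    printf '%s\n' "${url%%\?*}"
}

# Hypothetical helper: fetch through r.jina.ai into an unpredictable temp file
# created by mktemp, and print that file's path on success.
fetch_to_tempfile() {
    is_public_http_url "$1" || { echo "refusing non-public URL: $1" >&2; return 1; }
    out=$(mktemp "${TMPDIR:-/tmp}/web-content.XXXXXX") || return 1
    curl -fsS "https://r.jina.ai/$1" > "$out" || { rm -f "$out"; return 1; }
    printf '%s\n' "$out"
}
```

A call such as `fetch_to_tempfile "$(strip_query "$url")"` would combine the checks. Stripping the query string can break pages that depend on it, so that step is a judgment call per URL.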
If you want a safer alternative, ask for a version that accepts raw HTML you provide (so you control what is sent externally) or for instructions to run a local HTML-to-text tool rather than delegating fetching to r.jina.ai.
Like a lobster shell, security has layers: review code before you run it.
Current version: v1.0.0
SKILL.md
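Web Content Extraction Skill
Uses r.jina.ai to extract clean text from web pages, filtering out junk code and keeping the substantive content.
Workflow
1. Extract the page content
# Fetch the page and save it as a .md file
curl -s "https://r.jina.ai/<URL>" > /tmp/web-content.md
2. Read and summarize
Have the agent read the generated .md file and summarize the key points.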
Usage example
User says: "Summarize this news story for me: https://www.bbc.com/news/tech..."
Run:
curl -s "https://r.jina.ai/https://www.bbc.com/news/technology-xxx" > /tmp/news.md
Then read the contents of /tmp/news.md, analyze them, and summarize.
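The two-step workflow could be wrapped in a small script along these lines. This is a minimal sketch: `reader_url` and `extract_page` are hypothetical names, and the 60-second timeout and the empty-file check are assumptions added for illustration, not behavior the skill specifies.

```shell
#!/bin/sh
# Build the reader URL by prefixing the target URL with the r.jina.ai endpoint,
# as in the curl commands above.
reader_url() {
    printf 'https://r.jina.ai/%s\n' "$1"
}

# Hypothetical wrapper: fetch a page as markdown and save it.
# $1 = target URL
# $2 = output path (optional; defaults to the /tmp file used in the workflow)
extract_page() {
    target=$1
    out=${2:-/tmp/web-content.md}
    # -f: fail on HTTP errors; -sS: no progress bar, but still print errors;
    # --max-time: overall time limit in seconds (assumed value)
    curl -fsS --max-time 60 "$(reader_url "$target")" > "$out" || return 1
    # An empty file means there is nothing for the agent to summarize
    [ -s "$out" ] || { echo "empty result for $target" >&2; return 1; }
    printf '%s\n' "$out"
}
```

For example, `extract_page "https://www.bbc.com/news/technology-xxx" /tmp/news.md` would print the output path on success, which the agent can then read and summarize.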
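Notes
- r.jina.ai filters out junk such as script tags, nav elements, and ad CSS
- The extracted output is clean text with little markup overhead, so it is easy for the AI to process
- Works with news sites, technical blogs, and article pages
- Files are saved to the /tmp/ directory by default; the path can be customized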
