Web Extractor

PassAudited by ClawScan on May 1, 2026.

Overview

This skill appears benign and purpose-aligned, but it sends requested webpage URLs/content to r.jina.ai and stores extracted text in temporary local files.

This skill is reasonable for summarizing public webpages. Before installing or using it, remember that URLs are processed by r.jina.ai and extracted text is saved under /tmp, so avoid private, authenticated, or confidential pages unless that sharing and local storage are acceptable.

Findings (3)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

A hostile webpage could try to influence the agent if the agent treats page text as instructions instead of source material.

Why it was flagged

The skill places arbitrary webpage text into the agent context for summarization. This is purpose-aligned, but webpage content can contain prompt-like text and should be handled as untrusted content.

Skill content
让 Agent 读取生成的 md 文件,总结核心观点。
Recommendation

Use the extracted text only as content to summarize; do not follow instructions found inside the fetched page unless the user explicitly asks.

What this means

The agent may make network requests and create local files when summarizing a webpage.

Why it was flagged

The documented workflow uses a shell command to fetch a user-provided URL through r.jina.ai and write the result to a local file. This matches the skill purpose and is not hidden or destructive.

Skill content
curl -s https://r.jina.ai/<URL> > /tmp/web-content.md
Recommendation

Confirm the URL and output path for sensitive tasks, and avoid using this workflow for private or authenticated pages unless that sharing is intended.

What this means

Private URLs, intranet links, or sensitive page contents could be exposed to the external extraction service if used with this skill.

Why it was flagged

The skill relies on the external r.jina.ai service to retrieve and transform webpage content. This is disclosed and central to the purpose, but it means the requested URL and resulting page access are handled by a third party.

Skill content
使用 r.jina.ai 提取网页干净文本
Recommendation

Use it primarily for public webpages, or choose an approved/self-hosted extraction method for sensitive content.