Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Url Reader

v0.1.1

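Intelligently reads the content of any URL. Supports major Chinese platforms including WeChat Official Accounts, Xiaohongshu, Toutiao, Douyin, Taobao, Tmall, JD.com, and Baidu, identifying the platform type automatically and extracting the core content. Content is saved as Markdown and images are downloaded to local disk automatically.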

2· 1.3k·16 current·18 all-time

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for justao/url-reader.

Install the skill "Url Reader" (justao/url-reader) from ClawHub.
Skill page: https://clawhub.ai/justao/url-reader
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Canonical install target

openclaw skills install justao/url-reader

ClawHub CLI


npx clawhub@latest install url-reader
Security Scan
VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill's declared purpose (read arbitrary URLs, extract core content, save Markdown and images) matches the included scripts: URL identification, Firecrawl/Jina/Playwright readers, and save_content. However the registry metadata claims no required environment variables or credentials while the code and SKILL.md document an optional FIRECRAWL_API_KEY and Playwright login-state use; that's an inconsistency. The code also requires installing Playwright/Chromium and optionally a Firecrawl client library — reasonable for the stated features but not reflected in the registry 'requires' section.
Instruction Scope
Runtime instructions and scripts direct the agent to: (1) call external reader services (Firecrawl, r.jina.ai), (2) launch Playwright/Chromium, (3) prompt the user to log in via a browser and save storage_state to data/wechat_auth.json, and (4) automatically save Markdown and download images to disk. These actions go beyond simple read-only queries: they persist authentication tokens (login state) and write files to disk using a hard-coded default path (/Users/ys/laoyang知识库/nickys/素材) shown in multiple places. The SKILL.md also instructs setting FIRECRAWL_API_KEY, which is not declared as a required env var in metadata.
Install Mechanism
There is no formal install spec, but SKILL.md instructs users to create a Python venv and pip install packages including 'firecrawl-py', 'requests' and 'playwright' and to run 'playwright install chromium' (which downloads browser binaries). This will write binaries and files to disk. The install sources are public package installs (pip) and Playwright's download — moderate risk and expected for this functionality; there are no unknown URL shorteners or arbitrary archive downloads in the install instructions.
Credentials
Registry metadata lists no required env vars, yet the code and docs use FIRECRAWL_API_KEY for the preferred Firecrawl strategy. The skill also creates and stores Playwright 'storage_state' (WeChat login tokens) under the skill's data directory, which is a sensitive credential. The number and type of secrets (API key + browser auth state) are proportionate to the feature set, but the omission from the declared requirements and the automatic local persistence of login state are concerning and should be made explicit to the user.
Persistence & Privilege
The skill does not request 'always: true' and does not change other skills' configs. However it persists data to disk in two places: a hard-coded default output directory in the author's home path and a local data/wechat_auth.json for saved browser auth. Persisting auth tokens is normal for a reader that needs logged-in sessions, but the hard-coded user-specific path and lack of an opt-out or configurable default is problematic. Autonomous invocation is allowed (default), which combined with network access and file writes increases blast radius but is expected for this kind of skill.
What to consider before installing
- The skill does what it claims (scrape many Chinese platforms and save Markdown/images) but is sloppy about metadata: it uses an optional FIRECRAWL_API_KEY (documented in SKILL.md) and Playwright, even though the registry lists no env vars. Treat the Firecrawl API key as sensitive.
- The skill will write files to disk: it uses a hard-coded default save directory (/Users/ys/laoyang知识库/nickys/素材). Edit DEFAULT_OUTPUT_DIR in scripts/save_content.py and url_reader.py before use to point to a directory you control, or run the scripts in a confined/sandboxed environment.
- For sites that require login (WeChat, Taobao, etc.), the skill uses Playwright to launch a browser and will save browser storage_state (cookies/session tokens) to data/wechat_auth.json inside the skill directory. Those files contain authentication data: review and store them securely, or avoid the Playwright login flow if you don't want to persist credentials.
- Installing Playwright will download Chromium binaries (playwright install chromium). Only proceed if you are comfortable with that, and run the install in a controlled environment (virtualenv, container, or VM).
- The skill contacts external services: Firecrawl (requires an API key, has paid tiers) and r.jina.ai (free). If you do not trust Firecrawl, do not set its API key; the skill will fall back to Jina/Playwright, with degraded behavior.
- Because the repository owner and homepage are unknown, exercise extra caution. Recommended steps before installing: (1) edit DEFAULT_OUTPUT_DIR to a safe location, (2) confirm or remove automatic saving of wechat_auth.json if you do not want local credential persistence, (3) only provide FIRECRAWL_API_KEY if you trust the service and understand billing, and (4) run the tool inside a sandbox (container or VM) until you are comfortable with its behavior.
If the author updated the package metadata to declare FIRECRAWL_API_KEY as an optional env var, and replaced the hard-coded output path with a configurable default or a documented prompt, my assessment would move toward 'benign'.

Like a lobster shell, security has layers — review code before you run it.

latest vk97c29pbganaa29jcmn4rkxm858129gg
1.3k downloads
2 stars
2 versions
Updated 7h ago
v0.1.1
MIT-0

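URL Reader - Intelligent Web Content Reader

Read the content of any URL in one step: the platform type is identified automatically, the best reading strategy is selected, and the content and images are saved locally.

Default save directory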

/Users/ys/laoyang知识库/nickys/素材/

Save format:

素材/
└── 2026-01-30_<article title>/
    ├── content.md      # Markdown content
    ├── img_01.webp     # Image 1
    ├── img_02.webp     # Image 2
    └── ...
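
The folder layout above can be sketched as follows. This is a minimal illustration, not the skill's actual `save_content.py`: the function names `article_dir_name` and `save_article`, the character-replacement rule, and the 60-character title limit are all assumptions made for the example.

```python
import re
from datetime import date
from pathlib import Path


def article_dir_name(title: str, day: date, max_len: int = 60) -> str:
    """Build '<YYYY-MM-DD>_<title>', replacing characters unsafe in folder names."""
    # Hypothetical sanitising rule: collapse path separators, reserved
    # characters, and whitespace into a single underscore.
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return f"{day.isoformat()}_{cleaned[:max_len] or 'untitled'}"


def save_article(base_dir: Path, title: str, markdown: str, day: date) -> Path:
    """Create <base_dir>/<date>_<title>/content.md and return the folder path."""
    folder = base_dir / article_dir_name(title, day)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "content.md").write_text(markdown, encoding="utf-8")
    return folder
```

Passing `base_dir` explicitly, rather than relying on a hard-coded default, keeps the output location under the caller's control.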

Core technical approach

Three-tier reading strategy (automatic fallback)

┌─────────────────────────────────────────────────────────────────┐
│                     URL Reader architecture                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  User-supplied URL                                              │
│       ↓                                                         │
│  ┌─────────────────────┐                                        │
│  │ Platform identifier │ → WeChat, Xiaohongshu, Taobao, etc.    │
│  └─────────────────────┘                                        │
│       ↓                                                         │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │                      Strategy selector                      ││
│  │  ┌───────────────┐   ┌───────────────┐   ┌───────────────┐  ││
│  │  │ Strategy 1    │   │ Strategy 2    │   │ Strategy 3    │  ││
│  │  │ Firecrawl     │ → │ Jina          │ → │ Playwright    │  ││
│  │  │ (preferred)   │   │ (fallback)    │   │ (last resort) │  ││
│  │  └───────────────┘   └───────────────┘   └───────────────┘  ││
│  └─────────────────────────────────────────────────────────────┘│
│       ↓                                                         │
│  ┌─────────────────────┐                                        │
│  │ Content extractor   │ → title, body, author, date, etc.      │
│  └─────────────────────┘                                        │
│       ↓                                                         │
│  ┌─────────────────────┐                                        │
│  │ Output formatter    │ → Markdown                             │
│  └─────────────────────┘                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

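Strategy 1: Firecrawl API (preferred)

Features

  • AI-driven web scraping
  • Handles JavaScript rendering automatically
  • Bypasses anti-scraping mechanisms automatically
  • Returns clean Markdown directly
  • Supports 96% of websites

API call

from firecrawl import Firecrawl

# Replace the placeholder with your own Firecrawl API key.
app = Firecrawl(api_key="fc-YOUR_API_KEY")
url = "https://example.com/article"  # the page to read
result = app.scrape(url, formats=["markdown"])

Pricing

  • Free: 500 pages/month
  • Paid: usage-based billing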

Strategy 2: Jina Reader API (fallback)

Features

  • Completely free
  • No API key required
  • Supports dynamic rendering
  • Returns Markdown

Usage

https://r.jina.ai/{original URL}
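
The URL-prefix convention above might be used from Python roughly as follows. This is a sketch under stated assumptions: the helper names `jina_reader_url` and `read_with_jina`, the `User-Agent` value, and the 30-second timeout are illustrative and not taken from the skill's `jina_reader.py`. It uses only the standard library so it runs without extra dependencies.

```python
from urllib.request import Request, urlopen

JINA_PREFIX = "https://r.jina.ai/"


def jina_reader_url(url: str) -> str:
    """Prefix the original URL with the Jina Reader endpoint."""
    return JINA_PREFIX + url.strip()


def read_with_jina(url: str, timeout: float = 30.0) -> str:
    """Fetch the page through Jina Reader and return the response body as text."""
    # The User-Agent string is an arbitrary placeholder for this sketch.
    request = Request(jina_reader_url(url), headers={"User-Agent": "url-reader-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `read_with_jina("https://example.com/article")` performs a network request, so it belongs in a sandboxed environment if the target URL is untrusted.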

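Strategy 3: Playwright browser automation (last resort)

Features

  • Keeps the login state between runs
  • Can handle any website
  • Requires a manual login the first time

When to use it

  • WeChat Official Accounts (login required)
  • Platforms that require login
  • When the first two strategies both fail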

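Usage

Option 1: Ask in conversation

User: Read this link for me https://mp.weixin.qq.com/s/xxxxx
User: Take a look at this Xiaohongshu post https://www.xiaohongshu.com/explore/xxxxx
User: Read this web page https://example.com/article

Option 2: Command-line invocation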

/url-reader https://example.com/article

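Supported platforms

| Platform | Domain | Recommended strategy | Notes |
|---|---|---|---|
| WeChat Official Accounts | mp.weixin.qq.com | Firecrawl → Playwright | May require login |
| Xiaohongshu | xiaohongshu.com | Firecrawl → Jina | Short links must be resolved |
| Toutiao | toutiao.com | Firecrawl → Jina | - |
| Douyin | douyin.com | Firecrawl | Extracts the video description |
| Taobao | taobao.com | Firecrawl → Playwright | May require login |
| Tmall | tmall.com | Firecrawl → Playwright | May require login |
| JD.com | jd.com | Firecrawl → Jina | - |
| Baidu | baidu.com | Firecrawl → Jina | - |
| Zhihu | zhihu.com | Firecrawl → Jina | - |
| Weibo | weibo.com | Firecrawl → Playwright | May require login |
| Bilibili | bilibili.com | Firecrawl → Jina | - |
| Generic websites | * | Firecrawl → Jina | - |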
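
One way to turn the platform table into code is a domain-suffix lookup. The sketch below is an assumption about how such an identifier could work, not a copy of the skill's `url_identifier.py`; the `PLATFORMS` mapping, the `identify_platform` name, and the lowercase strategy labels are hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlparse

# Domain -> (platform name, ordered strategies), transcribed from the table.
PLATFORMS: Dict[str, Tuple[str, List[str]]] = {
    "mp.weixin.qq.com": ("WeChat Official Accounts", ["firecrawl", "playwright"]),
    "xiaohongshu.com": ("Xiaohongshu", ["firecrawl", "jina"]),
    "toutiao.com": ("Toutiao", ["firecrawl", "jina"]),
    "douyin.com": ("Douyin", ["firecrawl"]),
    "taobao.com": ("Taobao", ["firecrawl", "playwright"]),
    "tmall.com": ("Tmall", ["firecrawl", "playwright"]),
    "jd.com": ("JD.com", ["firecrawl", "jina"]),
    "baidu.com": ("Baidu", ["firecrawl", "jina"]),
    "zhihu.com": ("Zhihu", ["firecrawl", "jina"]),
    "weibo.com": ("Weibo", ["firecrawl", "playwright"]),
    "bilibili.com": ("Bilibili", ["firecrawl", "jina"]),
}
GENERIC: Tuple[str, List[str]] = ("Generic website", ["firecrawl", "jina"])


def identify_platform(url: str) -> Tuple[str, List[str]]:
    """Return (platform name, strategy order) for a URL, or the generic entry."""
    host = (urlparse(url).hostname or "").lower()
    for domain, info in PLATFORMS.items():
        # Match the domain itself or any subdomain, but not look-alike hosts.
        if host == domain or host.endswith("." + domain):
            return info
    return GENERIC
```

Matching on `"." + domain` rather than a plain suffix keeps a host such as `notjd.com` from being treated as `jd.com`.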

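Workflow

1. Receive the URL
2. Identify the platform type
3. Select a reading strategy:
   ├─ Try the Firecrawl API
   │   ├─ Success → return the content
   │   └─ Failure → continue
   ├─ Try Jina Reader
   │   ├─ Success → return the content
   │   └─ Failure → continue
   └─ Try Playwright (requires a saved login state)
       ├─ Login state present → read the content
       └─ No login state → prompt the user to set it up
4. Extract the core content
5. Format the output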
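
Step 3 of the workflow is a fallback chain: each strategy is tried in order, and the first one that returns non-empty content wins. A minimal sketch, assuming each strategy is exposed as a plain callable that takes a URL and returns Markdown (the `read_with_fallback` name and the `Reader` alias are hypothetical):

```python
from typing import Callable, List, Sequence, Tuple

Reader = Callable[[str], str]


def read_with_fallback(url: str, readers: Sequence[Tuple[str, Reader]]) -> Tuple[str, str]:
    """Try each (name, reader) in order; return (name, content) of the first success."""
    errors: List[str] = []
    for name, reader in readers:
        try:
            content = reader(url)
        except Exception as exc:  # any failure just moves on to the next strategy
            errors.append(f"{name}: {exc}")
            continue
        if content and content.strip():
            return name, content
        errors.append(f"{name}: empty content")
    raise RuntimeError("all strategies failed: " + "; ".join(errors))
```

Collecting the per-strategy errors makes the final failure message explain why every tier was skipped, which helps when deciding whether a login state needs to be set up.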

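Output format

# [Article title]

**Source**: [platform name]
**Author**: [author name]
**Published**: [time]
**Original link**: [URL]

---

[Body content]

---

**Engagement** (if available):
- Reads/plays: xxx
- Likes: xxx
- Comments: xxx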
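
The template above could be filled in by a small rendering function. This is an illustrative sketch only: `format_article` and its parameters are hypothetical names, and the English labels are a rendering of the template's field names rather than strings confirmed from the skill's code.

```python
from typing import Mapping, Optional


def format_article(
    title: str,
    source: str,
    author: str,
    published: str,
    url: str,
    body: str,
    stats: Optional[Mapping[str, str]] = None,
) -> str:
    """Render extracted fields into the Markdown layout of the template."""
    lines = [
        f"# {title}",
        "",
        f"**Source**: {source}",
        f"**Author**: {author}",
        f"**Published**: {published}",
        f"**Original link**: {url}",
        "",
        "---",
        "",
        body.strip(),
    ]
    if stats:  # the engagement block is only emitted when data is available
        lines += ["", "---", "", "**Engagement**:"]
        lines += [f"- {label}: {value}" for label, value in stats.items()]
    return "\n".join(lines) + "\n"
```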

Configuration

Firecrawl API key setup

  1. Register an account at https://www.firecrawl.dev/
  2. Obtain an API key
  3. Set the environment variable:
    export FIRECRAWL_API_KEY="fc-YOUR_API_KEY"
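
Because the key is optional, a reader can decide at run time whether the Firecrawl tier is usable. The following sketch assumes that an unset or blank `FIRECRAWL_API_KEY` simply removes Firecrawl from the strategy list; the `available_strategies` function is hypothetical and not taken from the skill.

```python
import os
from typing import List, Mapping


def available_strategies(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the strategy order, skipping Firecrawl when no API key is set."""
    strategies: List[str] = []
    if env.get("FIRECRAWL_API_KEY", "").strip():
        strategies.append("firecrawl")
    # Jina needs no key; Playwright is always kept as the last resort.
    strategies += ["jina", "playwright"]
    return strategies
```

Taking the environment as a parameter keeps the key out of source files and makes the function easy to exercise with a plain dictionary.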
    

Playwright login state setup (optional)

For platforms that require login (such as WeChat Official Accounts):

cd ~/.claude/skills/url-reader
source .venv/bin/activate
python scripts/wechat_reader.py setup
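
What a setup step like this typically does with Playwright is open a visible browser, wait for a manual login, and write the session to a `storage_state` file. The sketch below shows that general pattern and is not the contents of `wechat_reader.py`: the function names are hypothetical, and the `data/wechat_auth.json` path is taken from the directory layout in this document. Playwright is imported lazily because it is an optional dependency.

```python
import json
import os
from pathlib import Path

AUTH_FILE = Path("data/wechat_auth.json")  # path as listed in the directory layout


def looks_like_storage_state(raw: str) -> bool:
    """Return True if the text is JSON with a non-empty 'cookies' list."""
    try:
        state = json.loads(raw)
    except ValueError:
        return False
    return isinstance(state, dict) and bool(state.get("cookies"))


def has_login_state(path: Path = AUTH_FILE) -> bool:
    """Check whether a usable saved login state exists at the given path."""
    try:
        return looks_like_storage_state(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False


def save_login_state(login_url: str, path: Path = AUTH_FILE) -> None:
    """Open a visible browser, wait for a manual login, then save the session."""
    from playwright.sync_api import sync_playwright  # optional dependency

    path.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context()
        context.new_page().goto(login_url)
        input("Log in in the browser window, then press Enter here... ")
        context.storage_state(path=str(path))
        browser.close()
    os.chmod(path, 0o600)  # the file holds session cookies; restrict it to the owner


def read_logged_in(url: str, path: Path = AUTH_FILE) -> str:
    """Load the saved session into a headless browser and return the page HTML."""
    from playwright.sync_api import sync_playwright  # optional dependency

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(storage_state=str(path))
        page = context.new_page()
        page.goto(url, wait_until="domcontentloaded")
        html = page.content()
        browser.close()
    return html
```

The saved file is equivalent to a logged-in session, so it should be treated like a password: keep it out of version control and delete it when it is no longer needed.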

Directory structure

url-reader/
├── skill.md              # This document
├── metadata.json         # Metadata
├── scripts/
│   ├── url_reader.py     # Main reader (combines the three strategies)
│   ├── firecrawl_reader.py   # Firecrawl strategy
│   ├── jina_reader.py        # Jina strategy
│   ├── wechat_reader.py      # Playwright strategy (WeChat)
│   └── url_identifier.py     # URL platform identifier
└── data/
    └── wechat_auth.json  # WeChat login state (generated automatically)

Installing dependencies

cd ~/.claude/skills/url-reader
python3 -m venv .venv
source .venv/bin/activate

# Core dependencies
pip install firecrawl-py requests

# Playwright (optional, for platforms that require login)
pip install playwright
playwright install chromium

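FAQ

Q: Why does reading fail for some websites?

A: Possible causes:

  1. The site has strong anti-scraping measures → try Playwright
  2. Login is required → set up the login state
  3. The content has been deleted → it cannot be read

Q: What if the Firecrawl free quota runs out?

A:

  1. The reader falls back to Jina Reader automatically (free)
  2. Or upgrade to a paid Firecrawl plan

Q: Why do WeChat Official Account articles always fail to load?

A: WeChat has the strictest anti-scraping measures. Suggestions:

  1. Use Playwright with a saved login state
  2. Or copy the content manually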

Version history

  • v2.0: Integrated the three-tier Firecrawl + Jina + Playwright strategy
  • v1.1: Added Playwright browser automation
  • v1.0: Basic functionality
