Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

web-reader-pro

v1.0.0

Advanced web content extraction skill for OpenClaw using a multi-tier fallback strategy (Jina → Scrapling → WebFetch) with intelligent routing, caching, and quality assessment.

2 stars · 113 downloads · 0 current · 0 all-time
by Jialin (@0xcjl)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for 0xcjl/web-reader-pro.

Prompt Preview: Install & Setup
Install the skill "web-reader-pro" (0xcjl/web-reader-pro) from ClawHub.
Skill page: https://clawhub.ai/0xcjl/web-reader-pro
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install web-reader-pro

ClawHub CLI


npx clawhub@latest install web-reader-pro
Security Scan

VirusTotal: Suspicious
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The name and SKILL.md describe a web extraction tool (Jina → Scrapling → WebFetch), which aligns with the included Python code and an install script for Scrapling. However, the registry metadata claims 'required env vars: none' while SKILL.md (and the code) reference sensitive or important environment variables (e.g., JINA_API_KEY, WEB_READER_CACHE_DIR, WEB_READER_LEARNING_DB, WEB_READER_JINA_QUOTA). This metadata mismatch is an incoherence that reduces trust and should be corrected or explained.
Instruction Scope
SKILL.md instructs installing dependencies (pip), running the included install_scrapling.sh (which uses npm/npx), and creating/using persistent local paths (~/.openclaw/*) for cache, quota, and learned routes. These behaviors are consistent with the stated purpose (cache, persistent domain routing). There are no instructions to read unrelated system secrets or arbitrary files, but the code will write/read files under the user's home directory and may create a wrapper in ~/.local/bin.
Install Mechanism
There is no platform install spec, but the repository includes scripts/install_scrapling.sh which performs global npm installs and creates a wrapper that invokes 'npx --yes scrapling'. Using npx --yes executes code fetched from the npm registry at runtime, which is a supply-chain/execution risk. The installer also attempts global npm installs (npm install -g), which modifies the system-wide node/npm environment. These install steps are expected for a Node-based scraper but are higher risk than pure Python deps and should be reviewed before execution.
Credentials
The skill requires a Jina API key for Tier 1 (JINA_API_KEY) and defines other environment variables for cache and quota. Requesting a service API key for the tiered Jina integration is proportionate. However, the registry metadata lists no required env vars while the SKILL.md and code expect them — this mismatch is problematic. No unrelated credentials appear requested, but the presence of JINA_API_KEY (sensitive) means users should confirm the skill's network calls and where data is sent.
Persistence & Privilege
The skill persists state locally (cache, jina_quota.json, domain routing JSON) under ~/.openclaw which is consistent with its learning/caching features. It also installs a wrapper into ~/.local/bin (if using the provided installer) and suggests adding that path to shell rc files. The skill does not request 'always: true' or global agent modifications beyond its own files. Persisting local data is expected, but users should be aware of files created in their home directory.
What to consider before installing
Key points to consider before installing or running this skill:

  • Metadata mismatch: The registry claims no required environment variables, but SKILL.md and the code expect JINA_API_KEY and other WEB_READER_* vars. Treat SKILL.md as authoritative unless the publisher clarifies the registry record.
  • Sensitive credential: JINA_API_KEY is required for Tier 1. Only provide that key if you trust the skill and have reviewed the code paths that send data to Jina's API.
  • Installer risk (npm/npx): The included scripts install npm packages globally and create a wrapper that runs 'npx --yes scrapling'. npx downloads and executes packages from the npm registry at runtime, which can run arbitrary code. If you don't trust the upstream npm package or the author, avoid running the installer, or run it in a contained environment (VM/container) and review the installed package contents first.
  • Persistent local files: The skill writes cache, quota counters, and learned routing JSON under ~/.openclaw. Review and/or sandbox these files if you are concerned about persisted data.
  • Code review recommended: Although behavior is broadly consistent with a web extractor, review scripts/web_reader_pro.py for hardcoded endpoints, logging of sensitive data, telemetry, or unexpected network calls before supplying secrets.
  • Safer options: If you want to test, run inside an isolated environment (container or throwaway VM), do not provide production API keys (use test keys), and inspect network traffic or code behavior before enabling on a production agent.

If you want, I can: (1) scan the remainder of web_reader_pro.py for network endpoints or hardcoded URLs, (2) point out the exact paths/files the skill will create, or (3) produce a safe installation checklist to minimize risk.

Like a lobster shell, security has layers — review code before you run it.

latest: vk973qjs5cdg374f2g7kc4jdjpd83hpzn
113 downloads
2 stars
1 version
Updated 1mo ago
v1.0.0
MIT-0

Web Reader Pro - OpenClaw Skill

Overview

Web Reader Pro is an advanced web content extraction skill for OpenClaw that uses a multi-tier fallback strategy with intelligent routing, caching, and quality assessment.

Features

1. Three-Tier Fallback Strategy

  • Tier 1: Jina Reader API - Fast, reliable, best for most websites
  • Tier 2: Scrapling + Playwright - Dynamic content rendering for JS-heavy sites
  • Tier 3: WebFetch Fallback - Basic extraction for simple pages

2. Jina Quota Monitoring

  • Tracks API call count with persistent counter
  • Warning alerts when approaching quota limits
  • Automatic fallback to lower-tier methods when quota exhausted
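The quota-monitoring behavior above can be sketched as a small persistent counter. This is an illustrative reconstruction, not the skill's actual code: the function name, file location, and the 90% warning threshold are assumptions; only the 100000 default (from WEB_READER_JINA_QUOTA) and the jina_quota.json file name come from the listing.

```python
import json
from pathlib import Path

# Assumed location; the scan only says state lives under ~/.openclaw
QUOTA_FILE = Path.home() / ".openclaw" / "data" / "web-reader-pro" / "jina_quota.json"
QUOTA_LIMIT = 100000      # documented WEB_READER_JINA_QUOTA default
WARN_THRESHOLD = 0.9      # illustrative: warn at 90% usage

def record_jina_call(quota_file: Path = QUOTA_FILE, limit: int = QUOTA_LIMIT) -> dict:
    """Increment the persistent call counter and report quota status."""
    state = {"count": 0}
    if quota_file.exists():
        state = json.loads(quota_file.read_text())
    state["count"] += 1
    quota_file.parent.mkdir(parents=True, exist_ok=True)
    quota_file.write_text(json.dumps(state))
    used = state["count"] / limit
    return {
        "count": state["count"],
        "limit": limit,
        "warning": used >= WARN_THRESHOLD,    # approaching the quota
        "exhausted": state["count"] >= limit, # caller should drop to a lower tier
    }
```

When `exhausted` is true, a caller would skip Tier 1 and route directly to Scrapling or WebFetch, matching the automatic-fallback behavior described above.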

3. Smart Cache Layer

  • Short-term caching (configurable TTL, default 1 hour)
  • Cache key based on URL hash
  • Reduces redundant API calls
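A minimal sketch of the cache layer described above. The README only says the key is a URL hash with a configurable TTL (default 1 hour); the SHA-256 choice, file layout, and function names here are assumptions for illustration.

```python
import hashlib
import json
import time
from pathlib import Path

# Documented default cache location (WEB_READER_CACHE_DIR)
CACHE_DIR = Path.home() / ".openclaw" / "cache" / "web-reader-pro"

def cache_key(url: str) -> str:
    """Derive a filesystem-safe cache key from the URL (algorithm assumed)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

def cache_get(url: str, ttl: int = 3600, cache_dir: Path = CACHE_DIR):
    """Return the cached result for a URL, or None if missing or expired."""
    path = cache_dir / f"{cache_key(url)}.json"
    if not path.exists():
        return None
    entry = json.loads(path.read_text())
    if time.time() - entry["saved_at"] > ttl:  # past the TTL window
        return None
    return entry["result"]

def cache_put(url: str, result: dict, cache_dir: Path = CACHE_DIR) -> None:
    """Store an extraction result alongside its save timestamp."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(url)}.json"
    path.write_text(json.dumps({"saved_at": time.time(), "result": result}))
```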

4. Extraction Quality Scoring

  • Scores based on: word count, title detection, content density
  • Minimum quality threshold (default: 200 words + valid title)
  • Auto-escalation to next tier if quality below threshold
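The scoring signals above (word count, title detection, content density) might combine roughly as follows. The weights and the 0-100 scale are assumptions; only the minimum threshold of 200 words plus a valid title is documented.

```python
def quality_score(title: str, content: str, min_words: int = 200) -> int:
    """Score an extraction 0-100 from word count, title presence, and density.
    Weights (60/20/20) are illustrative, not the skill's actual formula."""
    words = content.split()
    word_score = min(len(words) / min_words, 1.0) * 60        # word count, capped
    title_score = 20 if title.strip() else 0                  # valid title detected
    lines = [ln for ln in content.splitlines() if ln.strip()]
    density = len(words) / max(len(lines), 1)                 # words per non-empty line
    density_score = min(density / 10, 1.0) * 20               # content density
    return round(word_score + title_score + density_score)

def meets_threshold(title: str, content: str, min_words: int = 200) -> bool:
    """Documented minimum bar: enough words plus a valid title."""
    return len(content.split()) >= min_words and bool(title.strip())
```

If `meets_threshold` fails, the reader would escalate to the next tier, as described above.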

5. Domain-Level Routing Learning

  • Learns optimal extraction tier per domain
  • Persists learned routes in local JSON database
  • Adapts based on historical success rates
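Domain-level learning could be persisted roughly as below. The routes.json path matches the documented WEB_READER_LEARNING_DB default, but the per-tier ok/fail schema and function names are assumptions, not the skill's actual format.

```python
import json
from pathlib import Path
from urllib.parse import urlparse

# Documented default learning database path (WEB_READER_LEARNING_DB)
ROUTES_DB = Path.home() / ".openclaw" / "data" / "web-reader-pro" / "routes.json"

def record_outcome(url: str, tier: str, success: bool, db: Path = ROUTES_DB) -> None:
    """Update per-domain success counts for a tier after an extraction attempt."""
    domain = urlparse(url).netloc
    routes = json.loads(db.read_text()) if db.exists() else {}
    stats = routes.setdefault(domain, {})
    tally = stats.setdefault(tier, {"ok": 0, "fail": 0})
    tally["ok" if success else "fail"] += 1
    db.parent.mkdir(parents=True, exist_ok=True)
    db.write_text(json.dumps(routes))

def preferred_tier(url: str, db: Path = ROUTES_DB):
    """Pick the tier with the best historical success rate for the domain."""
    domain = urlparse(url).netloc
    routes = json.loads(db.read_text()) if db.exists() else {}
    stats = routes.get(domain)
    if not stats:
        return None  # no history yet: fall back to the default tier order

    def rate(tier):
        ok, fail = stats[tier]["ok"], stats[tier]["fail"]
        return ok / max(ok + fail, 1)

    return max(stats, key=rate)
```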

6. Retry with Exponential Backoff

  • Configurable max retries per tier (default: 3)
  • Exponential backoff: 1s, 2s, 4s, 8s...
  • Respects rate limits and transient failures
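The retry schedule above (1s, 2s, 4s, 8s, ...) is a standard exponential backoff; a minimal sketch, with the wrapper name and exception handling assumed:

```python
import time

def fetch_with_retry(fetch, url: str, max_retries: int = 3, base_delay: float = 1.0):
    """Call fetch(url), retrying on failure with delays of 1s, 2s, 4s, ..."""
    last_exc = None
    for attempt in range(max_retries + 1):
        try:
            return fetch(url)
        except Exception as exc:  # transient failure or rate limit
            last_exc = exc
            if attempt < max_retries:
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise last_exc  # all attempts for this tier exhausted
```

After the final retry fails, the reader would escalate to the next tier rather than keep hammering the same one.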

Installation

# Install dependencies
pip install -r requirements.txt

# Install Scrapling (requires Node.js)
./scripts/install_scrapling.sh

# Or install Scrapling manually
npm install -g @scrapinghub/scrapling

Usage

Basic Usage

from scripts.web_reader_pro import WebReaderPro

reader = WebReaderPro()
result = reader.fetch("https://example.com")
print(result['title'])
print(result['content'])

Advanced Configuration

reader = WebReaderPro(
    jina_api_key="your-jina-key",      # Optional: set via env JINA_API_KEY
    cache_ttl=3600,                      # Cache TTL in seconds (default: 3600)
    quality_threshold=200,               # Min word count for quality (default: 200)
    max_retries=3,                       # Max retries per tier (default: 3)
    enable_learning=True,                # Enable domain learning (default: True)
    scrapling_path="/usr/local/bin/scrapling"  # Path to scrapling binary
)

Result Format

{
    "title": "Page Title",
    "content": "Extracted content in markdown...",
    "url": "https://example.com",
    "tier_used": "jina|scrapling|webfetch",
    "quality_score": 85,
    "cached": False,
    "domain_learned_tier": "jina",
    "extracted_at": "2024-01-01T00:00:00Z"
}

Environment Variables

| Variable | Description | Default |
|---|---|---|
| JINA_API_KEY | Jina Reader API key | Required for Tier 1 |
| WEB_READER_CACHE_DIR | Cache directory path | ~/.openclaw/cache/web-reader-pro/ |
| WEB_READER_LEARNING_DB | Learning database path | ~/.openclaw/data/web-reader-pro/routes.json |
| WEB_READER_JINA_QUOTA | Jina quota limit | 100000 |
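A sketch of how these variables might be resolved at startup, using only the names and defaults documented above; the `load_config` helper itself is illustrative, not part of the skill's API.

```python
import os
from pathlib import Path

def load_config() -> dict:
    """Resolve configuration from the environment, with the documented defaults."""
    home = Path.home()
    return {
        # No key means Tier 1 (Jina) is unavailable and lower tiers are used
        "jina_api_key": os.environ.get("JINA_API_KEY"),
        "cache_dir": Path(os.environ.get(
            "WEB_READER_CACHE_DIR",
            home / ".openclaw" / "cache" / "web-reader-pro")),
        "learning_db": Path(os.environ.get(
            "WEB_READER_LEARNING_DB",
            home / ".openclaw" / "data" / "web-reader-pro" / "routes.json")),
        "jina_quota": int(os.environ.get("WEB_READER_JINA_QUOTA", "100000")),
    }
```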

API Reference

WebReaderPro.fetch(url, force_refresh=False)

Fetch and extract content from a URL.

Parameters:

  • url (str): Target URL
  • force_refresh (bool): Bypass cache if True

Returns: Dict with title, content, metadata

WebReaderPro.fetch_with_tier(url, preferred_tier)

Fetch using a specific tier (bypassing automatic selection).

Parameters:

  • url (str): Target URL
  • preferred_tier (str): "jina", "scrapling", or "webfetch"

WebReaderPro.get_jina_status()

Get current Jina API quota usage.

Returns: Dict with count, limit, percentage, warnings

WebReaderPro.clear_cache(url=None)

Clear cache for specific URL or all URLs.

Parameters:

  • url (str, optional): Specific URL to clear, or None for all

WebReaderPro.get_domain_routes()

Get learned domain-to-tier mappings.

Returns: Dict of domain -> preferred tier

Tier Comparison

| Tier | Speed | JS Rendering | Best For | Cost |
|---|---|---|---|---|
| Jina | Fast | No | Static pages, articles | API calls |
| Scrapling | Medium | Yes | SPAs, dynamic content | CPU |
| WebFetch | Fastest | No | Simple pages, fallbacks | Free |

License

MIT
