HTML OCR

Security checks across static analysis, malware telemetry, and agentic risk

Overview

This is a coherent HTML OCR skill: its behavior matches its stated purpose. The main points users should be aware of are that it installs an external MinerU CLI and uses a MinerU API token to have the MinerU service process local HTML content.

This skill appears safe for its stated purpose, but install the MinerU CLI from a trusted source, use a revocable MinerU token, and avoid processing private or regulated HTML content unless you are comfortable with MinerU handling that data.

Static analysis

No static analysis findings were reported for this release.

VirusTotal

VirusTotal findings are pending for this skill version.

Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

#ASI04: Agentic Supply Chain Vulnerabilities

Low

What this means

Installing the skill may add a global third-party command-line tool to the user's environment.

Why it was flagged

The skill depends on an externally installed CLI. It is installed either with npm, where no version is specified so npm resolves the package's current 'latest' tag, or with an unpinned Go '@latest' install. This is central to the skill's purpose, but users must trust both the package source and whichever version it resolves to.

Skill content

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Recommendation

Install only from trusted package sources, consider pinning a known-good version where possible, and review the MinerU package provenance before use.
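One way to act on the pinning advice is to build the install commands from explicit versions instead of relying on npm's default tag or Go's '@latest'. The sketch below only prints the commands so they can be reviewed before anything runs. The function name pinned_mineru_install_cmds is hypothetical, and the versions are whatever releases you have actually reviewed; none are implied here.

```shell
# Minimal sketch (hypothetical helper): print pinned install commands for the
# MinerU CLI instead of using npm's default "latest" resolution or Go's
# "@latest" query. Nothing is installed; the output is for review.
pinned_mineru_install_cmds() {
  npm_version="${1:?usage: pinned_mineru_install_cmds NPM_VERSION GO_VERSION}"
  go_version="${2:?usage: pinned_mineru_install_cmds NPM_VERSION GO_VERSION}"

  # npm installs an exact published version when given name@version.
  printf 'npm install -g mineru-open-api@%s\n' "$npm_version"

  # go install accepts a version query (a semver tag such as vX.Y.Z, or a
  # commit hash) in place of @latest.
  printf 'go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@%s\n' "$go_version"
}
```

For example, pinned_mineru_install_cmds 1.0.0 v1.0.0 prints both commands with those placeholder versions. The versions actually published to npm can be listed with npm view mineru-open-api versions before choosing one.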

#ASI03: Identity and Privilege Abuse

Low

What this means

The agent or CLI can use the configured MinerU token while performing OCR tasks.

Why it was flagged

The skill requires a MinerU API token. This is expected for a MinerU OCR integration and is declared in the artifact, but it still grants access to a third-party service account or quota.

Skill content

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Recommendation

Use a dedicated or revocable MinerU token, avoid exposing it in shared logs or prompts, and revoke it if the skill is no longer used.
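A minimal sketch of keeping the token out of shell history and shared logs: store it in a file readable only by you (for example after chmod 600) and expose it as MINERU_TOKEN only to the single command that needs it. The helper name with_mineru_token and the token-file path in the usage line are hypothetical. The sketch relies only on the MINERU_TOKEN environment variable that the skill documents.

```shell
# Minimal sketch (hypothetical helper): run one command with MINERU_TOKEN read
# from a file, so the token is not typed on the command line, not recorded in
# shell history, and not exported to the rest of the session.
with_mineru_token() {
  token_file="${1:?usage: with_mineru_token TOKEN_FILE COMMAND [ARGS...]}"
  shift

  # Without a command, the prefix assignment below would set the token in
  # the current shell, so refuse instead.
  if [ "$#" -eq 0 ]; then
    echo "with_mineru_token: no command given" >&2
    return 1
  fi

  if [ ! -r "$token_file" ]; then
    echo "with_mineru_token: cannot read token file: $token_file" >&2
    return 1
  fi

  # A prefix assignment places MINERU_TOKEN in the environment of this one
  # command only; command substitution strips the file's trailing newline.
  MINERU_TOKEN="$(cat "$token_file")" "$@"
}
```

Usage, with a hypothetical token path: with_mineru_token ~/.config/mineru/token mineru-open-api extract page.html --ocr -o ./out/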

#ASI07: Insecure Inter-Agent Communication

Low

What this means

Private HTML files or embedded images could be submitted for OCR processing through MinerU when the command is used.

Why it was flagged

The documented workflow uses the MinerU open API CLI with a token to process a local HTML file. This is consistent with the skill's OCR purpose, but local HTML pages and embedded images may contain sensitive content that is processed by the external MinerU service.

Skill content

mineru-open-api extract page.html --ocr -o ./out/

Recommendation

Do not use this skill on confidential pages unless MinerU's data handling, retention, and privacy terms are acceptable for that content.
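Before running the extract command on a page, it can help to see how many images the page contains, since those may be processed along with the text. The rough sketch below counts img tags and inline data:image URIs with grep. The helper name html_ocr_preflight is hypothetical, and regex matching is not an HTML parser, so treat the counts as a hint rather than an inventory.

```shell
# Rough pre-flight sketch (hypothetical helper): report how many <img> tags and
# inline data:image URIs a local HTML file contains before it is sent for OCR.
# Approximate only: <img> tags split across lines, CSS url() references, and
# script-loaded images are not counted as img_tags. grep -o is a GNU/BSD
# extension rather than POSIX.
html_ocr_preflight() {
  html_file="${1:?usage: html_ocr_preflight FILE.html}"

  if [ ! -r "$html_file" ]; then
    echo "html_ocr_preflight: cannot read $html_file" >&2
    return 1
  fi

  # -o prints each match on its own line, so wc -l counts matches, not lines.
  img_tags=$(grep -Eoi '<img[^>]*>' "$html_file" | wc -l | tr -d ' ')
  data_images=$(grep -Eoi 'data:image/[a-z0-9+.-]+' "$html_file" | wc -l | tr -d ' ')

  printf 'img_tags=%s\ninline_data_images=%s\n' "$img_tags" "$data_images"
}
```

If the counts are higher than expected for a page you consider sensitive, review the page before running mineru-open-api extract on it.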