HTML OCR
Security checks across static analysis, malware telemetry, and agentic risk
Overview
This is a coherent HTML OCR skill. The main points for users are that it installs an external MinerU CLI and uses a MinerU API token to process local HTML content.
This skill appears safe for its stated purpose, but install the MinerU CLI from a trusted source, use a revocable MinerU token, and avoid processing private or regulated HTML content unless you are comfortable with MinerU handling that data.
Static analysis
No static analysis findings were reported for this release.
VirusTotal
VirusTotal findings are pending for this skill version.
Risk analysis
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Installing the skill may add a global third-party command-line tool to the user's environment.
The skill depends on externally installed CLI packages, including an unpinned Go '@latest' install and npm's default latest resolution. This is central to the skill's purpose, but users should trust the package source and version.
npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
Install only from trusted package sources, consider pinning a known-good version where possible, and review the MinerU package provenance before use.
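The pinning advice above could look roughly like the following. This is a minimal sketch, not part of the skill: the helper name `print_pinned_install` is hypothetical, and no specific MinerU CLI version is implied. It only prints the command so it can be reviewed before anything is installed, relying on npm's standard `package@version` syntax.

```shell
# Hypothetical helper (not part of the skill): prints a pinned npm install
# command for review instead of running it.
print_pinned_install() {
  version="$1"   # a release you have reviewed; no default is assumed
  if [ -z "$version" ]; then
    echo "error: pass an explicit, reviewed version" >&2
    return 1
  fi
  # npm accepts package@version to install an exact release.
  echo "npm install -g mineru-open-api@${version}"
}

# For the Go route, the same idea is to replace "@latest" with an explicit
# version query (Go expects a "v" prefix, e.g. @vX.Y.Z). Which tags exist for
# this module is not stated in the reviewed artifact, so check before pinning.
```

Running `npm view mineru-open-api version` beforehand shows the currently published npm release, which helps when choosing the version to pin.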
The agent or CLI can use the configured MinerU token while performing OCR tasks.
The skill requires a MinerU API token. This is expected for a MinerU OCR integration and is declared in the artifact, but it still grants access to a third-party service account or quota.
Token required:
mineru-open-api auth                # Interactive token setup
export MINERU_TOKEN="your-token"    # Or via environment variable
Use a dedicated or revocable MinerU token, avoid exposing it in shared logs or prompts, and revoke it if the skill is no longer used.
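One way to keep the token out of shared logs is to mask it before it is ever printed. The sketch below assumes only the `MINERU_TOKEN` variable named in the artifact; the `mask_token` helper is hypothetical and does not come from the MinerU CLI.

```shell
# Hypothetical helper: prints a masked form of a token so that only the
# last 4 characters appear in logs. Tokens of 4 characters or fewer are
# fully masked.
mask_token() {
  _len=${#1}
  if [ "$_len" -le 4 ]; then
    printf '****\n'
  else
    printf '****%s\n' "$(printf '%s' "$1" | tail -c 4)"
  fi
}

# Example usage (commented out so nothing runs on sourcing):
#   echo "Using MinerU token $(mask_token "$MINERU_TOKEN")"
#   unset MINERU_TOKEN   # drop the token from the shell when finished
```

Masking only limits accidental exposure in output. Revoking the token with MinerU is still the step that removes access.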
Private HTML files or embedded images could be submitted for OCR processing through MinerU when the command is used.
The documented workflow uses the MinerU open API CLI with a token to process a local HTML file. This is consistent with the skill's OCR purpose, but local HTML pages and embedded images may contain sensitive content that is processed by the external MinerU service.
mineru-open-api extract page.html --ocr -o ./out/
Do not use this skill on confidential pages unless MinerU's data handling, retention, and privacy terms are acceptable for that content.
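Before submitting a page, a quick local check can show whether it carries inline images that would travel with the HTML. The sketch below is a rough heuristic under stated assumptions: the `count_embedded_images` helper is hypothetical, it only counts `src` attributes that start with a `data:image/` URI, and it says nothing about how sensitive the content is.

```shell
# Hypothetical helper: counts inline data-URI images in an HTML file.
# The "." after "src=" matches either quote character.
count_embedded_images() {
  grep -oiE 'src=.data:image/' "$1" | wc -l | tr -d ' '
}

# Example usage (commented out so nothing is uploaded on sourcing):
#   n=$(count_embedded_images page.html)
#   echo "page.html contains $n inline image(s)"
#   # Only after reviewing the page:
#   # mineru-open-api extract page.html --ocr -o ./out/
```

A count of zero does not mean the page is safe to share. Text content and externally referenced images still need a manual review.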
