HTML Extract
v0.4.0
Extract content from HTML pages and files using MinerU. Converts HTML to clean, structured Markdown preserving headings, lists, tables, and text hierarchy. F...
MIT-0
Security Scan
OpenClaw
Benign
high confidence
Purpose & Capability
The name/description (HTML extraction via MinerU) align with the declared runtime requirement (mineru-open-api) and the single required env var (MINERU_TOKEN). Requiring a MinerU CLI and token is expected for this functionality.
Instruction Scope
SKILL.md contains explicit commands using mineru-open-api (extract, crawl) and only references local HTML files, URLs, and the MINERU_TOKEN. It does not instruct reading unrelated system files, other environment variables, or exfiltrating data to unexpected endpoints.
Install Mechanism
Installers are npm (mineru-open-api) and go install from the GitHub repo — these are standard package sources. Installing third-party packages runs remote code at install/runtime, so verify the npm package and GitHub repository are the legitimate MinerU project before installing.
Credentials
Only one credential (MINERU_TOKEN) is required and is declared as primaryEnv. This is proportionate to a CLI that calls a remote MinerU API. No unrelated secrets or broad filesystem config paths are requested.
Persistence & Privilege
The skill does not request always:true or other elevated persistence. It is user-invocable and allows normal autonomous invocation, which is the platform default and reasonable for this capability.
Assessment
This skill is internally consistent with its stated purpose, but you should verify the mineru-open-api package before installing: check the npm package page and the GitHub repo linked from the MinerU homepage (https://mineru.net / https://github.com/opendatalab). Treat MINERU_TOKEN as a secret (do not reuse highly privileged credentials), create a token with least privilege if possible, and rotate it if you later stop using the skill. If you're cautious, install the CLI in an isolated environment (container or VM) and inspect its behavior (the requests it makes) before using it with sensitive data.
Like a lobster shell, security has layers: review code before you run it.
latest
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
Runtime requirements
📄 Clawdis
Bins: mineru-open-api
Env: MINERU_TOKEN
Primary env: MINERU_TOKEN
Install
Install via npm
Bins: mineru-open-api
npm i -g mineru-open-api
Install via go install
Bins: mineru-open-api
SKILL.md
HTML Extract
Extract text and content from local HTML files or remote HTML URLs to Markdown using MinerU. For live web page URLs, use mineru-open-api crawl.
Install
npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
Quick Start
# Extract from a local HTML file (requires token)
mineru-open-api extract page.html -o ./out/
# Extract from a remote HTML URL (requires token)
mineru-open-api extract https://example.com/page.html -o ./out/
# Extract web page content via crawl (requires token)
mineru-open-api crawl https://example.com/article -o ./out/
# With language hint
mineru-open-api extract page.html --language en -o ./out/
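To convert several local files in one pass, the extract command above can be wrapped in a loop. The following is a minimal sketch, not part of the skill itself: the function name extract_all and its two directory arguments are hypothetical, and it assumes only the documented `extract <file> -o <dir>` form.

```shell
# Hypothetical helper (not shipped with mineru-open-api):
# run `extract` on every .html file in a source directory.
extract_all() {
  src_dir=$1
  out_dir=$2
  # Create the output directory up front, in case the CLI does not.
  mkdir -p "$out_dir"
  for f in "$src_dir"/*.html; do
    # Skip the literal pattern when the glob matches nothing.
    [ -e "$f" ] || continue
    # Report a failed file on stderr and keep going with the rest.
    mineru-open-api extract "$f" -o "$out_dir" || echo "failed: $f" >&2
  done
}
```

Calling `extract_all ./pages ./out` would then process each page in turn; MINERU_TOKEN still has to be set, since extract requires a token.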
Authentication
Token required:
mineru-open-api auth # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable
Create a token at: https://mineru.net/apiManage/token
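Because every command here needs a token, a script may want to fail early when MINERU_TOKEN is missing. A minimal sketch, assuming the token is supplied through the environment variable rather than the interactive setup; the function name require_mineru_token is hypothetical:

```shell
# Hypothetical guard: stop before calling the CLI when no token is exported.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it or run 'mineru-open-api auth'" >&2
    return 1
  fi
}
```

A typical use would be `require_mineru_token && mineru-open-api extract page.html -o ./out/`. Note that this check only looks at the environment variable, so it would also reject a setup configured solely through `mineru-open-api auth`.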
Capabilities
- Supported input: local .html file or remote HTML URL
- HTML requires extract (token required); not supported by flash-extract
- For live web pages, use mineru-open-api crawl <URL> (also requires token)
- Language hint with --language (default: ch, use en for English)
Notes
- HTML is NOT supported by flash-extract; always use extract or crawl
- Output goes to stdout by default; use -o <dir> to save to a file or directory
- All progress/status messages go to stderr; document content goes to stdout
- MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
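Since document content goes to stdout and progress messages go to stderr, the two streams can be captured separately with ordinary shell redirection. A minimal sketch under that assumption; the wrapper name extract_to_markdown and the default log name extract.log are hypothetical:

```shell
# Hypothetical wrapper: Markdown (stdout) goes to one file,
# progress/status messages (stderr) go to a separate log.
extract_to_markdown() {
  input=$1
  output=$2
  log=${3:-extract.log}
  mineru-open-api extract "$input" > "$output" 2> "$log"
}
```

For example, `extract_to_markdown page.html page.md` would leave the converted document in page.md and any status output in extract.log.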
