HTML to Text

v0.4.0

Convert HTML to plain readable text using MinerU. Strips HTML markup and extracts clean text content from web pages and HTML files. Features: HTML to text co...

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The skill is an instruction-only wrapper to run the mineru-open-api CLI to extract text from HTML/URLs. Requiring the mineru-open-api binary and MINERU_TOKEN is consistent with that purpose.
Instruction Scope
SKILL.md only instructs using mineru-open-api commands (extract, crawl, auth), creating/setting MINERU_TOKEN, and saving outputs. It does not ask the agent to read unrelated files, other env vars, or exfiltrate data to unexpected endpoints.
Install Mechanism
Installers are standard package flows: an npm package and go install from a GitHub repo. No arbitrary downloads, URL shorteners, or unknown extract steps are used.
Credentials
Only MINERU_TOKEN is required and declared as the primary credential. That single token is proportional to a CLI that authenticates to MinerU's API.
Persistence & Privilege
The skill's always setting is false, and it does not request system-wide changes or access to other skills' config. Autonomous invocation is allowed (the platform default), which is not excessive for this integration.
Assessment
This skill is coherent: it simply wraps the MinerU CLI and needs a MinerU API token. Before installing, verify the mineru-open-api package source (the npm package name and the GitHub repo) to ensure it is the official project, obtain your MINERU_TOKEN only from the official mineru.net site, and avoid pasting that token into untrusted places. Installing globally (-g) adds a system-wide binary; use a container or a user-level npm prefix if you prefer isolation. If you plan to allow the agent to call this skill autonomously, be aware that it can run mineru-open-api commands whenever invoked, so make sure you trust the agent and the token's permissions.


latestvk977y0fzsypb323phyx878zm65845dxb

Runtime requirements

📄 Clawdis
Bins: mineru-open-api
Env: MINERU_TOKEN
Primary env: MINERU_TOKEN

Install

Install via npm
Bins: mineru-open-api
npm i -g mineru-open-api
Install via go install
Bins: mineru-open-api
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

SKILL.md

HTML to Text

Extract plain readable text from HTML files or web pages using MinerU. MinerU outputs Markdown as the closest format to plain text.

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Extract text from a local HTML file (requires token)
mineru-open-api extract page.html -o ./out/

# Extract text from a web page (requires token)
mineru-open-api crawl https://example.com/article

# JSON output contains text fields (requires token)
mineru-open-api extract page.html -f json -o ./out/

Authentication

Token required:

mineru-open-api auth             # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
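
Before running extract or crawl, a script can confirm that the binary and the token variable are both present. The following is a minimal sketch: the function name check_mineru_env is hypothetical, and it only covers the environment-variable route, so a token saved by the interactive mineru-open-api auth command would not be visible to this check if it is stored elsewhere.

```shell
# Hypothetical helper: verify the CLI binary and token variable before use.
check_mineru_env() {
  rc=0
  # The CLI must be installed and reachable on PATH.
  if ! command -v mineru-open-api >/dev/null 2>&1; then
    echo "mineru-open-api not found on PATH" >&2
    rc=1
  fi
  # The token must be exported as MINERU_TOKEN (env-var route only).
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set" >&2
    rc=1
  fi
  return "$rc"
}

# Usage: only call the CLI when the check passes.
# check_mineru_env && mineru-open-api crawl https://example.com/article
```

The function reports every missing requirement instead of stopping at the first one, so a single run shows everything that needs fixing.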

Capabilities

  • Supported input: local .html file or web page URL
  • HTML requires extract or crawl (token required) — not supported by flash-extract
  • MinerU does not have a -f text option; Markdown is the closest plain-text output
  • For truly plain text: use extract -f json and read the text fields from JSON output
  • Language hint with --language (default: ch, use en for English)
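
One way to pull the text fields out of the JSON output is a recursive filter with jq, a separate tool that is not part of MinerU. This is a sketch under assumptions: the source does not describe the JSON schema, so the key name "text", the helper name json_text_fields, and the output file name are all hypothetical and should be checked against real output.

```shell
# Hypothetical helper: print every string stored under a "text" key,
# at any nesting depth, one per line. Requires jq.
json_text_fields() {
  jq -r '.. | objects | (.text // empty) | strings' "$1"
}

# Usage (the file name under ./out/ is an assumption):
# mineru-open-api extract page.html -f json -o ./out/
# json_text_fields ./out/page.json > page.txt
```

The recursive descent avoids hard-coding a path through the document, at the cost of also matching any unrelated key that happens to be named "text".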

Notes

  • MinerU has no -f text format; use Markdown output or -f json for text fields
  • HTML is NOT supported by flash-extract
  • Output goes to stdout by default; use -o <dir> to save to a file or directory
  • All progress/status messages go to stderr; document content goes to stdout
  • MinerU is open-source by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
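
Because document content and progress messages use different streams, ordinary shell redirection can keep them apart. The sketch below wraps that pattern in a small function; the name capture and the file names are hypothetical, and the redirection itself is standard shell behavior, not a MinerU feature.

```shell
# Hypothetical helper: run a command, saving stdout and stderr separately.
# capture <content-file> <log-file> <command> [args...]
capture() {
  content_file=$1
  log_file=$2
  shift 2
  # stdout (document content) and stderr (progress/status) go to separate files.
  "$@" >"$content_file" 2>"$log_file"
}

# Usage (requires a configured token):
# capture article.md crawl.log mineru-open-api crawl https://example.com/article
```

The function returns the wrapped command's exit status, so callers can still detect a failed extraction and inspect the log file.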
