Install
openclaw skills install @mzlzyca/html-to-html

Clean and restructure HTML documents using MinerU. Takes messy or complex HTML and produces clean, well-formatted HTML output with the core structure preserved.

Features: HTML cleanup and restructuring; removes unnecessary markup and noise; preserves core content structure; produces clean HTML from cluttered web pages.

Use when you need to: clean up messy HTML, restructure an HTML document, convert complex HTML to clean HTML, or sanitize HTML content.

Use when asked: 'how do I clean this HTML', 'make this HTML cleaner', 'I want clean HTML from this page', 'can my agent clean up HTML', 'is there a skill for HTML cleanup', 'restructure this messy HTML'.

Built on MinerU by OpenDataLab (Shanghai AI Lab), an open-source document intelligence engine. Suited to web developers, content migration teams, and anyone who needs to clean up HTML from legacy systems, CMS exports, or messy web scraping results.

Fetch a remote web page or local HTML file and convert it to clean structured HTML using MinerU. Strips noise and preserves semantic content.
npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
# Crawl a web page and output clean HTML (requires token)
mineru-open-api crawl https://example.com/article -f html -o ./out/
# Re-extract a local HTML file to clean HTML (requires token)
mineru-open-api extract page.html -f html -o ./out/
# Batch crawl multiple URLs to HTML (requires token)
mineru-open-api crawl url1 url2 -f html -o ./pages/
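For longer URL lists, the batch form above can be driven from a text file. The following is a minimal sketch, not part of the tool itself: the function name crawl_url_list, the MINERU_CLI variable, and the one-URL-per-line file layout are assumptions made for illustration. It uses only the crawl invocation shown above.

```shell
#!/usr/bin/env bash
# Sketch: crawl every URL listed in a text file (one per line) to clean HTML.
# MINERU_CLI and crawl_url_list are hypothetical names, not part of mineru-open-api.
# Set MINERU_CLI=echo to print the command instead of running it (dry run).
MINERU_CLI="${MINERU_CLI:-mineru-open-api}"

crawl_url_list() {
  local list_file="$1" out_dir="$2"
  local urls=() line
  # Read the list, tolerating a missing trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;   # skip blank lines and comment lines
    esac
    urls+=("$line")
  done < "$list_file"

  if [ "${#urls[@]}" -eq 0 ]; then
    echo "no URLs found in $list_file" >&2
    return 1
  fi

  mkdir -p "$out_dir"
  # Same shape as the batch example: crawl url1 url2 -f html -o <dir>
  "$MINERU_CLI" crawl "${urls[@]}" -f html -o "$out_dir"
}
```

Calling crawl_url_list urls.txt ./pages/ would then pass every listed URL to a single crawl invocation; a token is still required.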
Token required:
mineru-open-api auth # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable
Create token at: https://mineru.net/apiManage/token
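In scripts it can help to fail early when no token is configured. The sketch below assumes the environment-variable route shown above; it does not detect a token saved through the interactive auth command, and the function name require_mineru_token is hypothetical.

```shell
# Sketch: stop early when MINERU_TOKEN is unset or empty.
# Only covers the environment-variable route; a token stored by
# `mineru-open-api auth` is not checked here.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; create one at https://mineru.net/apiManage/token" >&2
    return 1
  fi
}
```

A script could call require_mineru_token || exit 1 before any crawl or extract step.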
HTML output (-f html) is available through crawl -f html and extract -f html. It requires a token and is not available in flash-extract.
crawl supports output formats: md, html, json
extract supports output formats: md, html, latex, docx, json
Use -o <dir> to save output to a file or directory.
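The format lists above can be encoded as a small pre-flight check so that a script rejects an unsupported combination before calling the CLI. This is a sketch under the assumption that the lists above are complete; supports_format is a hypothetical helper, and flash-extract is treated as unknown because its formats are not listed here.

```shell
# Sketch: check a subcommand/format pair against the lists given above.
# Returns 0 if supported, 1 if not, 2 if the subcommand is not covered by this sketch.
supports_format() {
  local cmd="$1" fmt="$2" allowed
  case "$cmd" in
    crawl)   allowed="md html json" ;;
    extract) allowed="md html latex docx json" ;;
    *)       return 2 ;;   # e.g. flash-extract: formats not listed in this document
  esac
  # Pad with spaces so only whole-word matches count.
  case " $allowed " in
    *" $fmt "*) return 0 ;;
    *)          return 1 ;;
  esac
}
```

For example, supports_format crawl html succeeds, while supports_format crawl docx fails because docx is listed only for extract.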