Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

ade-mineru-api-skills

MinerU document extraction CLI that converts PDFs, images, and web pages into Markdown, HTML, LaTeX, or DOCX via the MinerU API. Supports single/batch extrac...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 36 · 0 current installs · 0 all-time installs
Security Scan
VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill claims to wrap the MinerU document-extraction CLI and the SKILL.md only references the 'mineru' binary and MinerU API token. Requiring the mineru binary and showing usage of mineru commands is coherent with the stated purpose.
Instruction Scope
Runtime instructions focus on using the mineru CLI and configuring MINERU_TOKEN; they do not ask the agent to access unrelated files or credentials. However the Quick install instructs running curl to download a binary and moving it into /usr/local/bin (requiring sudo), which is an installation action outside pure runtime usage and carries risk if the binary origin is not trusted.
Install Mechanism
The install instructions (and metadata) use direct downloads from https://webpub.shlab.tech — a non-standard/unfamiliar host — producing a native executable that the user is instructed to place in /usr/local/bin. There is no checksum, signature, GitHub/GitLab release, or other provenance provided. Downloading and installing unsigned binaries from a personal or unknown CDN is high risk.
Credentials
The skill requests no environment variables in metadata. The SKILL.md documents the expected MINERU_TOKEN for authentication and the usual resolution order (flag > MINERU_TOKEN env > ~/.mineru/config.yaml). Requesting an API token for a CLI that calls a remote API is proportionate. Users should note that the token may be stored locally (~/.mineru/config.yaml) and is sensitive.
Persistence & Privilege
The skill is not marked always:true and does not request persistent platform-level privileges. The only persistence-like action it documents is installing a binary into PATH, which is normal for a CLI wrapper but increases impact if the binary is malicious.
What to consider before installing
This skill appears to be a wrapper for a CLI and behaves consistently with that purpose, but it asks you to download and install a native binary from an unfamiliar host without checks. Before installing: (1) prefer an official release (GitHub/GitLab or your distribution's package manager) or a containerized image; (2) ask the publisher for a checksum or GPG signature and verify it matches the downloaded binary; (3) avoid running curl | sudo directly — download to a temporary location first, inspect, and run in a sandbox or VM; (4) limit where you run the binary (not as root) and review network activity if possible; (5) treat MINERU_TOKEN as sensitive and rotate it if you test the binary and suspect anything. If the publisher cannot provide verifiable provenance (official homepage, signed release, or checksums), consider the install too risky.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.0
latest vk97fq83sxsnqm5azv417pjw421831y3g

Runtime requirements

📄 Clawdis
Bins: mineru

SKILL.md

Document Extraction with mineru

Installation

Quick install (recommended)

Auto-detect the OS and architecture, download the matching binary from the CDN, and move it into PATH (writing to /usr/local/bin normally requires sudo):

curl -fsSL "https://webpub.shlab.tech/MinerU/test/ade-mineru-api-cli/mineru-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m | sed 's/x86_64/amd64/;s/aarch64/arm64/')" -o mineru && chmod +x mineru && sudo mv mineru /usr/local/bin/
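
The name mapping inside that one-liner can be pulled out into a function so that unsupported platforms fail early instead of producing a 404. This is a minimal sketch; `mineru_asset_name` is a hypothetical helper, and the list of valid names is assumed from the four binaries shown in the manual download table.

```shell
# Map `uname -s` / `uname -m` output to the binary name used on the CDN.
# With no arguments it inspects the current machine.
mineru_asset_name() {
  os=$(printf '%s' "${1:-$(uname -s)}" | tr '[:upper:]' '[:lower:]')
  arch=$(printf '%s' "${2:-$(uname -m)}" | sed 's/x86_64/amd64/;s/aarch64/arm64/')
  case "$os-$arch" in
    darwin-arm64|darwin-amd64|linux-amd64|linux-arm64)
      printf 'mineru-%s-%s\n' "$os" "$arch" ;;
    *)
      # Assumption: only the four documented platforms are published.
      echo "no published binary for $os/$arch" >&2
      return 1 ;;
  esac
}

# Usage:
#   asset=$(mineru_asset_name) && \
#     curl -fsSL "https://webpub.shlab.tech/MinerU/test/ade-mineru-api-cli/$asset" -o mineru
```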

Manual download

Pick the binary for your platform:

Platform | Command
--- | ---
macOS Apple Silicon | curl -fsSL https://webpub.shlab.tech/MinerU/test/ade-mineru-api-cli/mineru-darwin-arm64 -o mineru
macOS Intel | curl -fsSL https://webpub.shlab.tech/MinerU/test/ade-mineru-api-cli/mineru-darwin-amd64 -o mineru
Linux x86_64 | curl -fsSL https://webpub.shlab.tech/MinerU/test/ade-mineru-api-cli/mineru-linux-amd64 -o mineru
Linux ARM64 | curl -fsSL https://webpub.shlab.tech/MinerU/test/ade-mineru-api-cli/mineru-linux-arm64 -o mineru

Then move into PATH:

chmod +x mineru
sudo mv mineru /usr/local/bin/
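
If you can obtain a trusted SHA-256 for the binary, it is worth checking the download before moving it into PATH. The sketch below assumes such a checksum exists; the security scan above notes that none is currently published, so `EXPECTED_SHA256` is a placeholder and `verify_sha256` is a hypothetical helper.

```shell
# verify_sha256 FILE EXPECTED_HEX: succeed only if the file's SHA-256 matches.
verify_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$1" | awk '{print $1}')
  else
    actual=$(shasum -a 256 "$1" | awk '{print $1}')   # macOS fallback
  fi
  [ "$actual" = "$2" ]
}

# Usage (EXPECTED_SHA256 must come from a source you trust):
#   verify_sha256 ./mineru "$EXPECTED_SHA256" \
#     && chmod +x mineru && sudo mv mineru /usr/local/bin/
```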

Verify installation

mineru version

Authentication

Before using, configure your API token (get one from https://mineru.net):

mineru auth                    # Interactive token setup
export MINERU_TOKEN="your-token"  # Or set via environment variable

Token resolution order: --token flag > MINERU_TOKEN env > ~/.mineru/config.yaml.
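
The resolution order can be mirrored in a few lines of shell, which is handy when debugging which token a script will pick up. This is only an illustration of the stated precedence, not the CLI's own logic: `token_source` and the `MINERU_CONFIG` override are hypothetical, and the sketch checks only that the config file exists, not that it contains a token. `mineru auth --show` remains the authoritative answer.

```shell
# token_source FLAG_VALUE: print which source would supply the token.
# MINERU_CONFIG is a made-up override that exists only to make this testable.
token_source() {
  config="${MINERU_CONFIG:-$HOME/.mineru/config.yaml}"
  if [ -n "$1" ]; then
    echo "flag"        # --token always wins
  elif [ -n "${MINERU_TOKEN:-}" ]; then
    echo "env"         # then the environment variable
  elif [ -f "$config" ]; then
    echo "config"      # finally the file written by `mineru auth`
  else
    echo "none"
    return 1
  fi
}
```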

Supported input formats

The extract command accepts the following input types:

  • PDF (.pdf) — primary use case, supports scanned and digital PDFs
  • Images (.png, .jpg, .jpeg, .webp, .gif, .bmp) — use --ocr for best results on scanned content
  • DOCX (.docx) — Microsoft Word documents
  • URLs — remote files are downloaded automatically

The crawl command accepts any HTTP/HTTPS URL and extracts web page content.

Default behavior

  • Table recognition: ON by default. Tables in documents are extracted and converted to Markdown tables. Use --no-table to disable.
  • Formula recognition: ON by default. Mathematical formulas are extracted as LaTeX. Use --no-formula to disable.
  • Language: defaults to ch (Chinese). Use --language en for English documents.
  • Model: auto-selected. Use --model vlm for complex layouts, --model pipeline for speed.

Quick start

mineru extract report.pdf                    # PDF → Markdown to stdout
mineru extract report.pdf -o ./out/          # Save to directory
mineru extract report.pdf -f md,docx -o ./out/   # Multiple formats (requires -o)
mineru crawl https://example.com/article     # Web page → Markdown

Core workflow

  1. Authenticate: mineru auth or set MINERU_TOKEN
  2. Extract: mineru extract <file-or-url> for documents
  3. Crawl: mineru crawl <url> for web pages
  4. Check results: output goes to stdout (default) or -o directory
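
The four steps above can be strung together in a small wrapper. This is a sketch under assumptions: `pick_command` and `convert_document` are hypothetical helpers, and the rule "URLs ending in .pdf or .docx go to extract, other URLs go to crawl" is a heuristic of this example, not documented CLI behavior.

```shell
# pick_command INPUT: choose between the extract and crawl subcommands.
pick_command() {
  case "$1" in
    http://*.pdf|https://*.pdf|http://*.docx|https://*.docx) echo extract ;;
    http://*|https://*) echo crawl ;;
    *) echo extract ;;   # local paths are always documents
  esac
}

# convert_document INPUT [OUT_DIR]: verify the token, then run the chosen command.
convert_document() {
  mineru auth --verify >&2 || return
  mineru "$(pick_command "$1")" "$1" -o "${2:-./out/}"
}

# Usage:
#   convert_document report.pdf ./out/
#   convert_document https://example.com/article ./pages/
```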

Commands

extract — Document extraction

Convert PDFs, images, and other documents to Markdown or other formats.

mineru extract report.pdf                         # Markdown to stdout
mineru extract report.pdf -f html                 # HTML to stdout
mineru extract report.pdf -o ./out/               # Save to directory
mineru extract report.pdf -o ./out/ -f md,docx    # Multiple formats
mineru extract *.pdf -o ./results/                # Batch extract
mineru extract --list files.txt -o ./results/     # Batch from file list
mineru extract https://example.com/doc.pdf        # Extract from URL
cat doc.pdf | mineru extract --stdin -o ./out/    # From stdin

extract flags

Flag | Short | Default | Description
--- | --- | --- | ---
--output | -o | (stdout) | Output path (file or directory)
--format | -f | md | Output formats: md, json, html, latex, docx (comma-separated)
--model | | (auto) | Model: vlm, pipeline, html
--ocr | | false | Enable OCR for scanned documents
--no-formula | | false | Disable formula recognition
--no-table | | false | Disable table recognition
--language | | ch | Document language
--pages | | (all) | Page range, e.g. 1-10,15
--timeout | | 300/1800 | Timeout in seconds (single/batch)
--list | | | Read input list from file (one path per line)
--stdin-list | | false | Read input list from stdin
--stdin | | false | Read file content from stdin
--stdin-name | | stdin.pdf | Filename hint for stdin mode
--concurrency | | 0 | Batch concurrency (0 = server default)
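
When a page range comes from user input, a quick syntax check before calling the API avoids a round trip that ends in a usage error. The grammar below (comma-separated page numbers or `N-M` ranges) is inferred from the single example `1-10,15`; the CLI may accept more or less than this, and `valid_pages` is a hypothetical helper.

```shell
# valid_pages SPEC: succeed if SPEC looks like "1-10,15".
valid_pages() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(-[0-9]+)?(,[0-9]+(-[0-9]+)?)*$'
}

# Usage:
#   valid_pages "$pages" && mineru extract report.pdf --pages "$pages" -o ./out/
```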

crawl — Web page extraction

Fetch web pages and convert to Markdown.

mineru crawl https://example.com/article              # Markdown to stdout
mineru crawl https://example.com/article -f html      # HTML to stdout
mineru crawl https://example.com/article -o ./out/    # Save to directory
mineru crawl url1 url2 -o ./pages/                     # Batch crawl
mineru crawl --list urls.txt -o ./pages/               # Batch from file list

crawl flags

Flag | Short | Default | Description
--- | --- | --- | ---
--output | -o | (stdout) | Output path
--format | -f | md | Output formats: md, json, html (comma-separated)
--timeout | | 300/1800 | Timeout in seconds (single/batch)
--list | | | Read URL list from file (one per line)
--stdin-list | | false | Read URL list from stdin
--concurrency | | 0 | Batch concurrency

auth — Authentication management

mineru auth              # Interactive token setup
mineru auth --verify     # Verify current token is valid
mineru auth --show       # Show current token source and masked value

status — Async task status

Query the status of a previously submitted extraction task.

mineru status <task-id>                      # Check status once
mineru status <task-id> --wait               # Wait for completion
mineru status <task-id> --wait -o ./out/     # Wait and download results
mineru status <task-id> --wait --timeout 600 # Custom timeout

status flags

Flag | Short | Default | Description
--- | --- | --- | ---
--wait | | false | Wait for task completion
--output | -o | | Download results to directory when done
--timeout | | 300 | Max wait time in seconds

version — Version info

mineru version    # Show version, commit, build date, Go version, OS/arch

Global flags

These flags apply to all commands:

Flag | Short | Description
--- | --- | ---
--token | | API token (overrides env and config)
--base-url | | API base URL (for private deployments)
--verbose | -v | Verbose mode, print HTTP details

Output behavior

  • No -o flag: result goes to stdout; status/progress messages go to stderr
  • With -o flag: result saved to file/directory; progress messages on stderr
  • Batch mode: requires -o to specify output directory
  • Binary formats (docx): cannot output to stdout, must use -o
  • Markdown output includes extracted images saved alongside the .md file
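
Because content and progress travel on separate streams, they can be captured independently with ordinary redirection. The sketch below writes to a temporary name first so that a failed run does not leave a truncated file behind; `extract_to_file` and the `.partial` suffix are choices of this example, not CLI features.

```shell
# extract_to_file INPUT DEST LOG: document content to DEST, progress to LOG.
extract_to_file() {
  tmp="$2.partial"
  if mineru extract "$1" > "$tmp" 2>> "$3"; then
    mv "$tmp" "$2"          # publish only complete output
  else
    status=$?
    rm -f "$tmp"            # drop the partial file, keep the log
    return "$status"
  fi
}

# Usage:
#   extract_to_file report.pdf report.md extract.log
```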

Examples

Single PDF extraction

mineru extract report.pdf -o ./output/
# Output: ./output/report.md + ./output/images/

Extract with OCR and specific pages

mineru extract scanned.pdf --ocr --pages "1-5" -o ./out/

Multi-format output

mineru extract paper.pdf -f md,html,docx -o ./out/
# Output: ./out/paper.md, ./out/paper.html, ./out/paper.docx

Batch processing from file list

# files.txt contains one path per line
mineru extract --list files.txt -o ./results/
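
One way to produce such a list is with `find`. This is a minimal sketch; `make_pdf_list` is a hypothetical helper, the match is case-sensitive (`*.PDF` files are skipped), and paths containing newlines would break the one-path-per-line format.

```shell
# make_pdf_list DIR LIST_FILE: write every PDF under DIR to LIST_FILE, one per line.
make_pdf_list() {
  find "$1" -type f -name '*.pdf' | sort > "$2"
}

# Usage:
#   make_pdf_list ./docs files.txt
#   mineru extract --list files.txt -o ./results/
```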

Extract to LaTeX

mineru extract paper.pdf -f latex -o ./out/
# Output: ./out/paper.tex

English document with specific language

mineru extract english-report.pdf --language en -o ./out/

Extract Word document to Markdown

mineru extract resume.docx -o ./out/
# Output: ./out/resume.md

Pipe workflow

# Download and extract in one pipeline
curl -sL https://example.com/doc.pdf | mineru extract --stdin --stdin-name doc.pdf

Web crawling

mineru crawl https://example.com/docs/guide -o ./docs/

Batch crawl with URL list

printf '%s\n' "https://example.com/page1" "https://example.com/page2" | mineru crawl --stdin-list -o ./pages/

Use with other tools

# Extract and pipe to another tool
mineru extract report.pdf | wc -w              # Word count
mineru extract report.pdf | grep "keyword"     # Search content
mineru extract report.pdf -f json | jq '.[]'   # Parse structured output

Agent guidelines

When using this skill on behalf of the user:

  • Always ask for the file path if the user didn't specify one. Never guess or fabricate a filename.
  • Ask for output directory when the user says "save" or "保存" but didn't specify where. Default suggestion: ./out/.
  • Don't run commands blindly on errors — if the user asks "提取失败了怎么办" ("what should I do if extraction failed?"), explain the exit code and troubleshooting steps instead of re-running the command.
  • Installation questions such as "mineru 怎么安装" ("how do I install mineru?") should be answered with the install instructions, not by running mineru extract.
  • DOCX as input is supported — if the user asks "这个 Word 文档能转 Markdown 吗" ("can this Word document be converted to Markdown?"), use mineru extract file.docx.
  • Table extraction — tables are extracted by default as part of the Markdown output. There is no "tables only" mode; the full document is always extracted.
  • For stdout mode (no -o), only one text format can be output at a time. If the user wants multiple formats, suggest adding -o.
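
The stdout restriction in the last guideline can be checked before a command is built. A minimal sketch, assuming docx is the only binary format (it is the only one the output rules name); `needs_output_dir` is a hypothetical helper.

```shell
# needs_output_dir FORMATS: succeed when the -f value cannot be written to stdout,
# i.e. more than one format was requested or the single format is docx.
needs_output_dir() {
  case "$1" in
    *,*|docx) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   if needs_output_dir "$formats"; then
#     mineru extract report.pdf -f "$formats" -o ./out/
#   else
#     mineru extract report.pdf -f "$formats"
#   fi
```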

Exit codes

Code | Meaning | Recovery
--- | --- | ---
0 | Success |
1 | General API or unknown error | Check network connectivity; retry; use --verbose for details
2 | Invalid parameters / usage error | Check command syntax and flag values
3 | Authentication error | Run mineru auth to reconfigure token, or check token expiration
4 | File too large or page limit exceeded | Split the file or use --pages to extract a subset
5 | Extraction failed | The document may be corrupted or unsupported; try a different --model
6 | Timeout | Increase with --timeout; large files may need 600+ seconds
7 | Quota exceeded | Check API quota at https://mineru.net; wait or upgrade plan
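
In a script, the exit codes above can be turned into a short hint for whoever reads the log. The mapping below restates the table; `explain_exit` is a hypothetical helper and the wording of each hint is abbreviated.

```shell
# explain_exit CODE: print a one-line recovery hint for a mineru exit code.
explain_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "general error: check network, retry, or rerun with --verbose" ;;
    2) echo "usage error: check command syntax and flag values" ;;
    3) echo "authentication error: run 'mineru auth' or check token expiration" ;;
    4) echo "file too large: split the file or use --pages" ;;
    5) echo "extraction failed: try a different --model" ;;
    6) echo "timeout: increase --timeout" ;;
    7) echo "quota exceeded: check quota at https://mineru.net" ;;
    *) echo "unrecognized exit code: $1" ;;
  esac
}

# Usage:
#   mineru extract report.pdf -o ./out/ || explain_exit "$?" >&2
```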

Troubleshooting

  • "no API token found": Run mineru auth or set MINERU_TOKEN env variable
  • Timeout on large files: Increase with --timeout 600 (seconds)
  • Batch fails partially: Check stderr for per-file status; succeeded files are still saved
  • Binary format to stdout: Use -o flag; docx cannot stream to stdout
  • Private deployment: Use --base-url https://your-server.com/api
  • Extraction quality is poor: Try --model vlm for complex layouts, or --ocr for scanned documents
  • Formula not recognized: Ensure --no-formula is NOT set; try --model vlm for better formula support
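
For the timeout case, a script can retry with a progressively longer limit and stop on any other failure. This is a sketch, not a recommendation from the publisher: `extract_with_timeout_retry` is hypothetical, the 300/600/1200 ladder is arbitrary, and it relies on exit code 6 meaning timeout as listed in the exit code table.

```shell
# extract_with_timeout_retry INPUT OUT_DIR: retry only when mineru reports a timeout.
extract_with_timeout_retry() {
  input=$1
  out_dir=$2
  for t in 300 600 1200; do
    mineru extract "$input" -o "$out_dir" --timeout "$t" && return 0
    status=$?
    # Any failure other than a timeout (6) is returned to the caller as-is.
    [ "$status" -eq 6 ] || return "$status"
    echo "timed out after ${t}s, retrying with a longer limit" >&2
  done
  return 6
}

# Usage:
#   extract_with_timeout_retry big-scan.pdf ./out/
```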

Notes

  • All status/progress messages go to stderr; only document content goes to stdout
  • Batch mode automatically polls the API with exponential backoff
  • Token is stored in ~/.mineru/config.yaml after mineru auth
  • The CLI wraps the MinerU Open SDK (github.com/OpenDataLab/mineru-open-sdk)
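
For readers unfamiliar with the term, exponential backoff means the wait between polls doubles each time, usually up to a cap. The sketch below only prints such a schedule; the base delay, cap, and doubling factor are illustrative assumptions, since the CLI's actual polling parameters are not documented here, and `backoff_delays` is a hypothetical helper.

```shell
# backoff_delays BASE MAX COUNT: print COUNT delays in seconds, doubling from BASE up to MAX.
backoff_delays() {
  delay=$1
  max=$2
  n=$3
  i=0
  while [ "$i" -lt "$n" ]; do
    echo "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$max" ]; then delay=$max; fi   # cap the wait
    i=$((i + 1))
  done
}

# Usage: poll a task manually with a capped schedule
#   for d in $(backoff_delays 1 8 5); do mineru status "$task_id"; sleep "$d"; done
```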
