Extract Tables From Pdf

v0.4.0

Extract tables from PDF documents using MinerU's table detection engine. Identifies and extracts structured table data from both native and scanned PDFs. Fea...


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for mzlzyca/extract-tables-from-pdf.

Install the skill "Extract Tables From Pdf" (mzlzyca/extract-tables-from-pdf) from ClawHub.
Skill page: https://clawhub.ai/mzlzyca/extract-tables-from-pdf
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required env vars: MINERU_TOKEN
Required binaries: mineru-open-api
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install extract-tables-from-pdf

ClawHub CLI

npx clawhub@latest install extract-tables-from-pdf

Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)

Purpose & Capability
Name/description match what is required: the skill requires the mineru-open-api binary and a MINERU_TOKEN, which are exactly what a MinerU-based PDF table extractor would need.
Instruction Scope
SKILL.md instructs the agent to run mineru-open-api commands, authenticate with MINERU_TOKEN, and operate on local files or URLs. This is within scope, but the instructions imply the CLI will use the token to contact MinerU services — meaning PDF contents may be transmitted to an external service; that is expected but important for privacy.
Install Mechanism
Installers are standard: npm package and go install from a GitHub repo. Both are reasonable for a CLI tool. Note: installing npm packages can run lifecycle scripts, so review the package or install in a controlled environment if you need extra safety.
Credentials
Only a single service credential (MINERU_TOKEN) is required and is declared as the primary credential. That is proportional to the described remote-API usage. The skill does not request unrelated credentials or system paths.
Persistence & Privilege
Skill does not request always:true or elevated platform persistence. It is user-invocable and can run autonomously (platform default), which is normal for skills of this type.
Assessment
This skill appears to be what it says: a wrapper around the mineru-open-api CLI that requires a MINERU_TOKEN. Before installing or using it:

  1) Confirm mineru.net and the GitHub repo look legitimate and review their privacy/security docs.
  2) Assume PDFs you process may be uploaded to MinerU servers; do not send sensitive or regulated data unless you trust the service or have an on-prem/self-hosted alternative.
  3) Prefer installing in an isolated environment (container or VM) and inspect the npm/go package source if you require higher assurance.
  4) Limit and rotate the MINERU_TOKEN and avoid storing it in shared shells.
  5) If you need purely local-only processing, verify the CLI actually supports local-only mode or find an offline tool.

If you want me to, I can fetch the mineru-open-api npm package or GitHub repo and highlight any concerning code or publish scripts before you install.
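
One way to follow the advice about not leaving MINERU_TOKEN in a shared shell is to scope it to a single command. The sketch below is illustrative only: run_with_token_file is a hypothetical helper, and the token file path is an assumption, not something the skill defines.

```shell
# Hypothetical helper: read the token from a file and expose it only to
# the one command being run, instead of exporting it in the shell session.
run_with_token_file() {
  token_file=$1
  shift
  # The assignment prefix applies to this single command invocation.
  MINERU_TOKEN=$(cat "$token_file") "$@"
}

# Example (assumed path; keep the file readable only by you, e.g. chmod 600):
# run_with_token_file "$HOME/.config/mineru/token" mineru-open-api extract report.pdf -o ./out/
```

Because the token is never exported, later commands in the same shell do not see it.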

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

📄 Clawdis
Bins: mineru-open-api
Env: MINERU_TOKEN
Primary env: MINERU_TOKEN

Install

Install via npm
Bins: mineru-open-api
npm i -g mineru-open-api
Install via go install
Bins: mineru-open-api

Latest: v0.4.0 (6 versions), updated 3w ago
Downloads: 259
Stars: 0
License: MIT-0

Extract Tables From Pdf

Convert and extract content from .pdf files using MinerU (mineru-open-api).

Install

npm install -g mineru-open-api
# or via Go (macOS/Linux):
go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

# Extract tables from PDF (requires token)
mineru-open-api extract report.pdf -o ./out/

# With explicit table flag and OCR for scanned docs
mineru-open-api extract scanned.pdf --ocr --table -o ./out/

Authentication

Token required for extract and crawl:

mineru-open-api auth            # Interactive token setup
export MINERU_TOKEN="your-token" # Or via environment variable

Create token at: https://mineru.net/apiManage/token
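
Before running extract, it can help to confirm that the binary and token are in place. The following is a minimal sketch, not part of the skill: check_mineru_prereqs is a hypothetical helper, and it only inspects the MINERU_TOKEN environment variable, so it will report a missing token even if you authenticated interactively with mineru-open-api auth.

```shell
# Hypothetical preflight check for the two declared requirements.
check_mineru_prereqs() {
  # Required binary from the skill metadata.
  if ! command -v mineru-open-api >/dev/null 2>&1; then
    echo "mineru-open-api not found on PATH" >&2
    return 1
  fi
  # Environment-variable path only; interactive auth is not detected here.
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; export it or run: mineru-open-api auth" >&2
    return 2
  fi
  return 0
}

# Example:
# check_mineru_prereqs && mineru-open-api extract report.pdf -o ./out/
```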

Capabilities

  • Supports local files and URLs
  • Requires token (mineru-open-api auth or MINERU_TOKEN env)
  • Supported input: .pdf
  • Language hint with --language (default: ch, use en for English)
  • Page range with --pages (where applicable)
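
The capabilities above can be combined to process a folder of English-language PDFs. This is a sketch under assumptions: extract_all_pdfs is a hypothetical helper, and the space-separated form --language en is assumed from the flag description rather than confirmed against the CLI help. The value syntax for --pages is not documented here, so it is left out.

```shell
# Hypothetical batch helper: run extract on every .pdf in a directory,
# writing each document's results to its own output subdirectory.
extract_all_pdfs() {
  in_dir=$1
  out_root=$2
  for pdf in "$in_dir"/*.pdf; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$pdf" ] || continue
    name=$(basename "$pdf" .pdf)
    mkdir -p "$out_root/$name"
    # --language en overrides the default (ch); flag syntax is an assumption.
    mineru-open-api extract "$pdf" --language en -o "$out_root/$name/" \
      || echo "failed: $pdf" >&2
  done
}

# Example:
# extract_all_pdfs ./reports ./out
```

A failure on one file is reported on stderr and the loop continues with the next file.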

Notes

  • Table recognition requires extract with a token; flash-extract does NOT support tables. The --table flag is enabled by default, so passing it explicitly is optional.
  • Output goes to stdout by default; use -o <dir> to save to file
  • Binary formats (docx) require -o flag (cannot stream to stdout)
  • All progress/status messages go to stderr
  • MinerU is an open-source project by OpenDataLab (Shanghai AI Lab): https://github.com/opendatalab/MinerU
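
Since results go to stdout and progress messages go to stderr, the two streams can be captured separately with ordinary shell redirection. A minimal sketch follows; extract_to_file is a hypothetical helper and the file names are placeholders.

```shell
# Hypothetical helper: keep extracted content and progress logs apart.
# $1 = input PDF, $2 = file for extracted content, $3 = file for progress log
extract_to_file() {
  # stdout carries the extracted content, stderr carries status messages.
  mineru-open-api extract "$1" > "$2" 2> "$3"
}

# Example:
# extract_to_file report.pdf report.out extract.log
```

This applies only when -o is omitted; with -o <dir> the results are written to the directory instead.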
