ClawHub Security flagged this skill as suspicious. Review the scan results before using.

OpenDataLoader PDF Parser (乌贼版)

v1.0.0

PDF parsing tool for AI/RAG. Convert PDF to Markdown, JSON, HTML with layout preservation, bounding boxes, and image extraction. Use when you need to extract...

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for wtjjacobj/opendataloader-pdf-wuxie.

Install the skill "OpenDataLoader PDF Parser (乌贼版)" (wtjjacobj/opendataloader-pdf-wuxie) from ClawHub.
Skill page: https://clawhub.ai/wtjjacobj/opendataloader-pdf-wuxie
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install opendataloader-pdf-wuxie

ClawHub CLI

npx clawhub@latest install opendataloader-pdf-wuxie

Security Scan

VirusTotal: Benign
OpenClaw: Suspicious (medium confidence)

Purpose & Capability

The name, description, CLI examples, and included test script all align with a PDF parsing CLI. The SKILL.md states that the package is installed via pipx and bundles a PDFBox JAR, which is coherent for a Java-based PDF tool. Minor inconsistency: the registry metadata lists no homepage or source, while SKILL.md includes a GitHub homepage URL; this should be verified.

Instruction Scope

Runtime instructions only run a local CLI (opendataloader-pdf) against local PDF files and write output to local directories. They do not instruct reading unrelated system files or environment variables. One ambiguity: the '--hybrid' option ('Hybrid AI mode: docling-fast') could imply contacting an external AI service or model, and SKILL.md gives no details about network calls, remote endpoints, or required API keys.

Install Mechanism

There is no install spec in the registry; SKILL.md recommends 'pipx install opendataloader-pdf'. Installing from PyPI with pipx pulls the package from wherever it is published and can run package install scripts, which may execute code at install time. The SKILL.md claims a GitHub homepage, but the registry shows the source as unknown, so confirm the exact pip package name and origin before installing.

Credentials

The skill declares no required environment variables, no secrets, and no config paths. That is proportionate for a local PDF parsing CLI. However, the hybrid/AI option is under-specified: if it uses a remote service it would typically require API credentials (none are declared), so verify whether additional credentials are needed at runtime.

Persistence & Privilege

The skill does not request persistent or autostart privileges (always:false). It is user-invocable and allows autonomous invocation (the default), which is normal. It does not declare modifications to other skills or system-wide settings.

What to consider before installing

This skill appears to do what it says (convert PDFs locally), but there are a few red flags to check before installing:

1) Verify the pip package source: confirm that the exact PyPI package and/or GitHub repo matches the SKILL.md homepage and comes from a trusted maintainer.
2) Inspect the package contents (and the bundled JAR) before installing, or install into an isolated environment or container to observe its behavior.
3) Ask the maintainer, or check the docs, what '--hybrid' (docling-fast) does and whether it calls remote services or requires API keys. If it does, confirm which endpoints and credentials are used.
4) Because a pipx install can run code at install time, avoid installing on sensitive or production hosts until you have validated the package.
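
The first check above (confirming that the published package points at the homepage named in SKILL.md) could be sketched as follows. This is only an illustration: it assumes the package is published on PyPI under the name opendataloader-pdf and reads PyPI's public JSON metadata endpoint. The helper names and the expected URL are hypothetical placeholders, and a matching URL shows only what the package declares about itself, not that it is trustworthy.

```python
import json
import urllib.request


def fetch_pypi_info(package):
    """Download the 'info' section of a package's metadata from PyPI's JSON endpoint."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)["info"]


def declared_urls(info):
    """Collect every URL the package declares, normalized for comparison."""
    urls = set()
    if info.get("home_page"):
        urls.add(info["home_page"])
    # project_urls may be missing or null in the metadata.
    urls.update((info.get("project_urls") or {}).values())
    return {u.rstrip("/").lower() for u in urls if u}


def matches_expected(info, expected_url):
    """True if the expected homepage is among the URLs the package declares."""
    return expected_url.rstrip("/").lower() in declared_urls(info)


if __name__ == "__main__":
    # Hypothetical placeholder: substitute the homepage URL given in SKILL.md.
    expected = "https://github.com/example-org/example-repo"
    try:
        info = fetch_pypi_info("opendataloader-pdf")
    except OSError as exc:
        print("Could not reach PyPI:", exc)
    else:
        print("Declared URLs:", sorted(declared_urls(info)))
        print("Matches expected homepage:", matches_expected(info, expected))
```

Keeping the comparison separate from the download makes it possible to run the check against metadata saved earlier, without network access.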

latestvk97b26en7qdzq9d9nxbzeckgpn83sj0g
96 downloads
0 stars
1 version
Updated 1mo ago
v1.0.0
MIT-0

opendataloader-pdf Skill

PDF parsing tool for AI/RAG scenarios. Converts PDF to Markdown, JSON, HTML with layout preservation.

Installation

pipx install opendataloader-pdf

Requires a Java runtime on the host. The PDFBox JAR itself is bundled with the package.
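
Since the tool needs both a Java runtime and the installed CLI, a quick pre-flight check can save a confusing failure later. The sketch below only looks for the two executables on PATH. The function names are illustrative and not part of the tool, and the check does not verify the Java version.

```python
import shutil


def check_prerequisites(commands=("java", "opendataloader-pdf")):
    """Map each required command to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in commands}


def missing_prerequisites(found):
    """Return the sorted names of commands that could not be located."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    missing = missing_prerequisites(check_prerequisites())
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("java and opendataloader-pdf were both found on PATH")
```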

Quick Usage

# PDF to Markdown (most common)
opendataloader-pdf input.pdf -o output_dir -f markdown

# PDF to JSON (with bounding boxes)
opendataloader-pdf input.pdf -o output_dir -f json

# Multiple formats at once
opendataloader-pdf input.pdf -o output_dir -f json,markdown,html

# Extract specific pages
opendataloader-pdf input.pdf -o output_dir -f markdown --pages "1,3,5-10"

# Extract images
opendataloader-pdf input.pdf -o output_dir -f markdown --image-dir images/

# Use PDF structure tree (for tagged PDFs)
opendataloader-pdf input.pdf -o output_dir -f markdown --use-struct-tree

# Output to stdout
opendataloader-pdf input.pdf -f markdown --to-stdout
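
To make the --pages syntax concrete, here is a small sketch that expands a specification such as "1,3,5-10" into individual page numbers. It assumes pages are numbered from 1 and that ranges are inclusive, which the examples suggest but the text does not state. It is not the tool's own parser, and expand_pages is a hypothetical helper.

```python
def expand_pages(spec):
    """Expand a spec such as "1,3,5-10" into a sorted list of unique page numbers.

    Assumes 1-based page numbers and inclusive ranges.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    if any(page < 1 for page in pages):
        raise ValueError("page numbers start at 1")
    return sorted(pages)


print(expand_pages("1,3,5-10"))  # [1, 3, 5, 6, 7, 8, 9, 10]
```

Validating a specification this way before invoking the CLI catches typos without paying for a JVM start.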

Output Formats

| Format | Description |
| --- | --- |
| json | Structured JSON with bounding boxes, fonts, reading order |
| markdown | Markdown text with images as references |
| html | HTML with styling |
| text | Plain text |
| pdf | Rebuilt PDF |
| markdown-with-html | Markdown with HTML for complex elements |
| markdown-with-images | Markdown with embedded base64 images |

Key Options

| Option | Description |
| --- | --- |
| --pages | Page range, e.g., "1,3,5-10" |
| --image-dir | Directory for extracted images |
| --use-struct-tree | Use PDF structure tree for reading order |
| --table-method | Table detection: default (border-based) or cluster |
| --reading-order | Algorithm: off or xycut (default) |
| --hybrid | Hybrid AI mode: docling-fast for complex tables |
| --sanitize | Remove sensitive data (emails, phones, etc.) |
| --include-header-footer | Include page headers/footers |
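
When the CLI is driven from a script, it can help to assemble the argument list in one place. The sketch below builds an argv list using only the flags listed on this page. The build_command function and its parameter names are hypothetical, and it only constructs the list rather than running anything.

```python
def build_command(inputs, output_dir=None, formats=("markdown",), pages=None,
                  image_dir=None, table_method=None, reading_order=None,
                  hybrid=None, use_struct_tree=False, sanitize=False,
                  include_header_footer=False, to_stdout=False):
    """Assemble an opendataloader-pdf argument list from the options on this page."""
    argv = ["opendataloader-pdf", *inputs]
    if output_dir is not None:
        argv += ["-o", output_dir]
    argv += ["-f", ",".join(formats)]

    # Options that take a value are added only when a value was supplied.
    valued = [
        ("--pages", pages),
        ("--image-dir", image_dir),
        ("--table-method", table_method),
        ("--reading-order", reading_order),
        ("--hybrid", hybrid),
    ]
    for flag, value in valued:
        if value is not None:
            argv += [flag, value]

    # Boolean switches are added only when enabled.
    switches = [
        ("--use-struct-tree", use_struct_tree),
        ("--sanitize", sanitize),
        ("--include-header-footer", include_header_footer),
        ("--to-stdout", to_stdout),
    ]
    argv += [flag for flag, enabled in switches if enabled]
    return argv


print(build_command(["input.pdf"], output_dir="output",
                    formats=("json", "markdown"), pages="1-5", sanitize=True))
```

Passing the resulting list to subprocess.run avoids shell quoting problems with file names that contain spaces.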

Examples

Basic Conversion

# Convert to markdown
opendataloader-pdf document.pdf -o ./output -f markdown

# Convert to JSON with structure
opendataloader-pdf document.pdf -o ./output -f json --use-struct-tree

Batch Processing

# Multiple files
opendataloader-pdf "file1.pdf" "file2.pdf" "folder/" -o output/

# All PDFs in directory
opendataloader-pdf ./pdfs/ -o ./output/ -f markdown

Advanced Options

# Use AI hybrid mode for complex tables
opendataloader-pdf input.pdf -o output/ -f markdown --hybrid docling-fast

# Extract only pages 1-5
opendataloader-pdf input.pdf -o output/ -f markdown --pages "1-5"

# Sanitize sensitive data
opendataloader-pdf input.pdf -o output/ -f json --sanitize

Performance Notes

  • Each convert() call spawns a JVM process
  • For batch processing, pass multiple files in one call
  • Roughly 6 seconds for a typical 300-page PDF
  • Images are extracted to the {output_name}_images/ directory
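
The batching advice above can be sketched as a small wrapper that gathers PDFs and hands them all to a single CLI invocation, so the JVM start-up cost is paid once. The helper names are hypothetical. The CLI also accepts a directory directly, so a wrapper like this mainly helps when the file list needs filtering first.

```python
import subprocess
from pathlib import Path


def collect_pdfs(root):
    """Return every .pdf file under root (recursively), sorted for stable ordering."""
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


def batch_command(pdfs, output_dir, fmt="markdown"):
    """Build one command line covering all files, instead of one command per file."""
    return ["opendataloader-pdf", *map(str, pdfs), "-o", str(output_dir), "-f", fmt]


def convert_batch(root, output_dir, fmt="markdown"):
    """Convert all PDFs under root in a single process; returns None if none are found."""
    pdfs = collect_pdfs(root)
    if not pdfs:
        return None
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    return subprocess.run(batch_command(pdfs, output_dir, fmt), check=True)
```

For very large collections the operating system's command-line length limit may apply, in which case splitting the list into a few chunks keeps most of the benefit.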

Troubleshooting

Java not found

Ensure Java runtime is installed. The tool bundles its own PDFBox JAR.

Font warnings

Warnings about missing fonts are normal and don't affect output quality.

Slow performance

Use batch mode (multiple files in one call) instead of calling repeatedly.
