Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using it.

Pdf Vision

v1.0.0

Extract text content from image-based/scanned PDFs using multiple vision APIs with automatic fallback. Supports Xflow (qwen3-vl-plus) and ZhipuAI (GLM-4.6V-Flash).


Install


Install with OpenClaw

Best for remote or guided setup. Copy the prompt below, then paste it into OpenClaw to install lpq6/pdf-vision.

Install the skill "Pdf Vision" (lpq6/pdf-vision) from ClawHub.
Skill page: https://clawhub.ai/lpq6/pdf-vision
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line


Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install pdf-vision

ClawHub CLI


npx clawhub@latest install pdf-vision
Security Scan
VirusTotal: Pending
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The core files and SKILL.md align with the stated purpose: converting PDF pages to images and calling vision-capable models (Xflow / ZhipuAI). However, the repository also includes scripts unrelated to PDF extraction (scripts/create_github_repo.py) which try to locate/use a GITHUB_TOKEN (including by parsing ~/.bashrc) and instruct the user about pushing code to GitHub. That GitHub-oriented functionality is not described in the skill metadata or SKILL.md and is unnecessary for PDF extraction.
Instruction Scope
SKILL.md and the main scripts only instruct reading your OpenClaw config (~/.openclaw/openclaw.json), converting PDFs to images (pypdfium2), and calling configured model endpoints; this is appropriate. But create_github_repo.py attempts to read a GITHUB_TOKEN from environment variables and, failing that, parses ~/.bashrc to find one, which is outside the documented scope. test_skill.sh also references a user-specific file path (/home/lpq/.openclaw/workspace/林佩权课表.pdf), indicating leftover developer-specific test artifacts. These extras expand the runtime surface beyond what the skill promises.
Install Mechanism
There is no install specification (instruction-only skill) and no remote download or archive extraction. The package contains local Python and shell scripts only. No network-installed code is fetched at install time by the skill itself.
Credentials
The skill legitimately reads API keys from ~/.openclaw/openclaw.json for the vision providers, which is proportional. However, create_github_repo.py looks for GITHUB_TOKEN (env or inside ~/.bashrc) without that credential being declared or documented in SKILL.md. That is unexpected for an OCR skill and could lead to accidental use of a shell-stored token if the helper script is executed.
Persistence & Privilege
The skill does not request always:true and does not modify other skills or system-wide settings. The scripts create temporary files under /tmp (documented) and otherwise run on demand. There is no autonomous persistence or privilege escalation requested.
What to consider before installing
This skill's main OCR functionality appears coherent and reasonable: it converts PDF pages to images, reads your OpenClaw config for provider baseUrls/apiKeys, and posts image data to those endpoints. However, two red flags should be addressed before installing or running it:

  1. Unrelated GitHub helper script: scripts/create_github_repo.py is unrelated to PDF extraction and will attempt to find and use a GITHUB_TOKEN (from the environment or by parsing ~/.bashrc) to call the GitHub API. Only run that script if you understand and trust it; otherwise remove or ignore it. Storing tokens in shell RC files is risky; prefer a dedicated credential store.
  2. Residual test artifacts: test_skill.sh contains a hardcoded, author-local PDF path; review and update or delete it to avoid accidental execution that references your filesystem.

Recommended actions before installation:

  • Inspect and (if not needed) delete or move scripts/create_github_repo.py from the skill directory.
  • Search the skill for any other helper scripts that access credentials or user files and remove or sandbox them.
  • Run the main extraction script in a sandboxed environment (isolated user account or container) first and verify it only reads ~/.openclaw/openclaw.json and /tmp files.
  • Ensure your OpenClaw config stores only the credentials you intend to use and is not world-readable.

If the repository owner clarifies that the GitHub script is intentionally included (e.g., as a packaging convenience) and documents it in SKILL.md, and if you plan to use it only in a controlled way, the concern is lower. If you cannot get that clarification, treat the extra scripts as suspicious and remove them before use.

Like a lobster shell, security has layers — review code before you run it.

latest: vk978thg4zhg88achbxdrxxedzn841w3c
72 downloads · 0 stars · 1 version · Updated 3w ago
v1.0.0
MIT-0

PDF Vision Extraction Skill (Enhanced)

Overview

This skill handles image-based or scanned PDFs that contain no selectable text. It supports multiple vision APIs with automatic fallback:

Primary Models

  • Xflow: qwen3-vl-plus (your primary vision model)
  • ZhipuAI: glm-4.6v-flash (free vision model with fallback support)
  • Fallback: glm-5 (text-only, but may work with some image prompts)
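
For reference, here is the fallback order as a minimal Python sketch (model IDs taken from this README; the actual script's internal list may differ):

FALLBACK_CHAIN = [
    "openai/qwen3-vl-plus",    # Xflow primary (Xflow is configured under the "openai" provider key)
    "zhipuai/glm-4.6v-flash",  # free vision fallback
    "zhipuai/glm-5",           # last resort: text-only, may reject image input
]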

Unlike traditional PDF text extraction tools (pdftotext, pdfplumber) which only work on text-based PDFs, this skill can process:

  • Scanned documents
  • Image-only PDFs
  • Photographed documents
  • Handwritten notes (with limitations)
  • Complex layouts with tables and formatting

Supported Models

Vision-Capable Models

Provider   Model            Type           Context   Free
Xflow      qwen3-vl-plus    Vision + Text  131K      ❌ (paid)
ZhipuAI    glm-4.6v-flash   Vision + Text  32K       ✅
ZhipuAI    glm-5            Text-only*     128K      —

Additional Text Models (for fallback)

Provider   Model                Context   Free
ZhipuAI    glm-4-flash-250414   128K      ✅
ZhipuAI    cogview-3-flash      32K       ✅

*Note: glm-5 is primarily text-only but may handle image prompts in some cases.

Prerequisites

1. API Configuration

Your OpenClaw must be configured with both providers:

Xflow Configuration (already set up):

  • models.providers.openai.baseUrl: https://apis.iflow.cn/v1
  • models.providers.openai.apiKey: Your Xflow API key

ZhipuAI Configuration (update token):

  • models.providers.zhipuai.baseUrl: https://open.bigmodel.cn/api/paas/v4
  • models.providers.zhipuai.apiKey: Your ZhipuAI API token
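
As a minimal sketch, assuming the config is plain JSON at the documented path, a script can read these keys like this (the skill's own scripts may differ in detail):

import json
from pathlib import Path

# Load the OpenClaw config documented above.
config = json.loads((Path.home() / ".openclaw" / "openclaw.json").read_text())

providers = config["models"]["providers"]
xflow_base = providers["openai"]["baseUrl"]    # https://apis.iflow.cn/v1
xflow_key = providers["openai"]["apiKey"]
zhipu_base = providers["zhipuai"]["baseUrl"]   # https://open.bigmodel.cn/api/paas/v4
zhipu_key = providers["zhipuai"]["apiKey"]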

2. Required System Tools

  • pypdfium2 Python library (for PDF to image conversion)
  • curl (for API calls)
  • base64 (for image encoding)

3. Python Libraries (already installed)

pypdfium2

Usage

Automatic Fallback Mode (Default)

Uses Xflow first, falls back to ZhipuAI if needed:

./scripts/pdf_vision.py --pdf-path /path/to/document.pdf

Specific Model Selection

Force a specific model for cost or performance reasons:

# Use free GLM-4.6V-Flash model
./scripts/pdf_vision.py --pdf-path document.pdf --model zhipuai/glm-4.6v-flash

# Use specific Xflow model  
./scripts/pdf_vision.py --pdf-path document.pdf --model openai/qwen3-vl-plus

# Short form (auto-detects provider)
./scripts/pdf_vision.py --pdf-path document.pdf --model glm-4.6v-flash

Structured Data Extraction

./scripts/pdf_vision.py --pdf-path invoice.pdf --prompt "Extract as JSON: vendor, date, total" --model glm-4.6v-flash
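
To consume that output programmatically, you can parse the script's stdout; a hedged sketch, assuming pdf_vision.py prints the model's reply verbatim (models sometimes wrap JSON in markdown fences, so strip those first):

import json
import re
import subprocess

result = subprocess.run(
    ["./scripts/pdf_vision.py", "--pdf-path", "invoice.pdf",
     "--prompt", "Extract as JSON: vendor, date, total",
     "--model", "glm-4.6v-flash"],
    capture_output=True, text=True, check=True,
)
# Strip optional ```json ... ``` fences the model may add around its answer.
raw = re.sub(r"^```(?:json)?\s*|\s*```$", "", result.stdout.strip())
invoice = json.loads(raw)
print(invoice["vendor"], invoice["date"], invoice["total"])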

Multi-page PDF Handling

# Process page 3 specifically
./scripts/pdf_vision.py --pdf-path book.pdf --page 3 --output page3.txt
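
Since the script handles one page per invocation (see Limitations), a whole document can be processed with a small driver loop; a sketch, assuming --page is 1-based as in the example above:

import subprocess
import pypdfium2 as pdfium

pdf_path = "book.pdf"
num_pages = len(pdfium.PdfDocument(pdf_path))  # pypdfium2 documents support len()

for page in range(1, num_pages + 1):
    subprocess.run(
        ["./scripts/pdf_vision.py", "--pdf-path", pdf_path,
         "--page", str(page), "--output", f"page{page}.txt"],
        check=True,
    )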

Configuration

Configuration File

The skill reads configuration from your OpenClaw config file (~/.openclaw/openclaw.json):

  • models.providers.openai.baseUrl & apiKey
  • models.providers.zhipuai.baseUrl & apiKey

Output Format

Returns extracted text content as a string. For structured data requests, the AI model will format output according to your prompt instructions.

Examples

Cost-Optimized Extraction (Free Model)

Command: --model glm-4.6v-flash
Use case: When you want to use free vision capabilities
Result: Good quality extraction at no cost

High-Quality Extraction (Premium Model)

Command: --model qwen3-vl-plus
Use case: When you need maximum accuracy and complex layout understanding
Result: Best possible extraction quality

Automatic Fallback (Recommended)

Command: No --model flag
Use case: Production environments where reliability is key
Result: Uses best available model, falls back gracefully

Model Comparison

GLM-4.6V-Flash (Free)

  • ✅ Completely free
  • ✅ Good Chinese text recognition
  • ✅ Decent table structure preservation
  • ⚠️ Lower context window (32K vs 131K)
  • ⚠️ May struggle with very complex layouts

Qwen3-VL-Plus (Premium)

  • ✅ Superior image understanding
  • ✅ Excellent table and structure recognition
  • ✅ Larger context window (131K)
  • ✅ Better handling of mixed languages
  • ❌ Requires paid API access

Limitations

  • Single page processing: Currently processes one page at a time
  • Image quality: Better results with higher resolution scans
  • Complex layouts: May struggle with very dense or overlapping text
  • Handwriting: Limited accuracy with handwritten content
  • File size: Large PDFs may exceed API token limits

Technical Implementation

The skill follows this workflow:

  1. PDF to Image: Converts specified PDF page to PNG using pypdfium2
  2. Model Selection: Chooses model based on user preference or fallback logic
  3. API Call: Sends image + prompt to selected vision API endpoint
  4. Response Parsing: Extracts and returns the AI-generated text content
  5. Fallback: If primary model fails, tries alternative models
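
A condensed Python sketch of this workflow, assuming the providers expose OpenAI-style /chat/completions endpoints that accept base64 data-URI images (the skill's actual scripts shell out to curl and may differ in detail):

import base64
import json
import urllib.request

import pypdfium2 as pdfium

def page_to_png(pdf_path, page_index=0, out="/tmp/pdf_vision_page.png"):
    # Step 1: render one PDF page to PNG via pypdfium2 (to_pil() needs Pillow).
    page = pdfium.PdfDocument(pdf_path)[page_index]
    page.render(scale=2.0).to_pil().save(out)
    return out

def call_vision(base_url, api_key, model, png_path, prompt):
    # Steps 3-4: send image + prompt, return the model's text reply.
    with open(png_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

def extract(pdf_path, prompt, providers):
    # Steps 2 and 5: walk the fallback chain until a model succeeds.
    png = page_to_png(pdf_path)
    for base_url, api_key, model in providers:
        try:
            return call_vision(base_url, api_key, model, png, prompt)
        except Exception as err:
            print(f"{model} failed ({err}); trying next model")
    raise RuntimeError("all models failed")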

For debugging, temporary files are created in /tmp/:

  • /tmp/pdf_vision_page.png - converted image
  • /tmp/pdf_vision_payload_*.json - API request payload
  • /tmp/pdf_vision_response_*.json - API response

Integration Notes

This skill complements the standard pdf skill:

  • Use pdf skill for text-based PDFs (faster, no API cost)
  • Use pdf-vision skill for image-based/scanned PDFs (requires vision API)

Both skills can be used together in a fallback pattern:

  1. Try pdf skill first
  2. If no text extracted, fall back to pdf-vision skill
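
A sketch of that pattern in Python, using pdfplumber for the text-based attempt and shelling out to this skill only when no selectable text comes back:

import subprocess

import pdfplumber

def extract_text(pdf_path):
    # 1. Try ordinary text extraction first (fast, no API cost).
    with pdfplumber.open(pdf_path) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
    if text.strip():
        return text
    # 2. No selectable text found: fall back to the vision-based skill.
    result = subprocess.run(
        ["./scripts/pdf_vision.py", "--pdf-path", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout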

Cost Optimization Tips

  1. Use GLM-4.6V-Flash for routine tasks - it's free and quite capable
  2. Reserve Qwen3-VL-Plus for complex documents - when you need maximum accuracy
  3. Test both models on your document types - choose based on your quality requirements
  4. Monitor API usage - track which models you're using most

Update Your GLM API Token

Replace the placeholder token in your config:

# Replace YOUR_ACTUAL_GLM_TOKEN with your real token
sed -i 's/YOUR_GLM_API_TOKEN_HERE/YOUR_ACTUAL_GLM_TOKEN/g' ~/.openclaw/openclaw.json
