Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

PDF Extractor

v1.0.0

PDF Field Extractor — AI-powered PDF structured data extraction. Extract key fields from PDF into Excel/JSON. Supports: invoice, contract, receipt, bank stat...

0· 59·0 current·0 all-time
byYK-Global@billjamno58

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for billjamno58/pdf-extractor.

Previewing Install & Setup.
Prompt PreviewInstall & Setup
Install the skill "PDF Extractor" (billjamno58/pdf-extractor) from ClawHub.
Skill page: https://clawhub.ai/billjamno58/pdf-extractor
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install pdf-extractor

ClawHub CLI

Package manager switcher

npx clawhub@latest install pdf-extractor
Security Scan
Capability signals
CryptoCan make purchasesRequires sensitive credentials
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotalVirusTotal
Benign
View report →
OpenClawOpenClaw
Suspicious
medium confidence
!
Purpose & Capability
The code implements PDF parsing, OCR, AI-based field extraction and output generation, which aligns with the advertised purpose. However the registry metadata claims no required environment variables or binaries while the SKILL.md and code require/expect billing credentials (FEISHU_USER_ID, SKILL_BILLING_API_KEY, SKILL_BILLING_SKILL_ID) and an AI API key. That mismatch between declared registry requirements and the actual code is inconsistent and unexplained.
Instruction Scope
SKILL.md instructs the agent to call the included scripts; that matches the bundle. The tool sends document text to an OpenAI-compatible API for extraction (expected). It also documents sending FEISHU_USER_ID to skillpay.me for billing (explicit). A notable scope gap: the extraction code will read OPENAI_API_KEY from the environment if no api_key is passed, but OPENAI_API_KEY is not listed in the skill's declared required env variables—so the runtime may access an API key you did not expect to be used. Otherwise the instructions do not tell the agent to scan unrelated system files.
!
Install Mechanism
There is no install spec (instruction-only), but the bundle contains nine Python scripts that require many third-party Python packages (PyMuPDF/fitz, pdfplumber, pytesseract/Pillow, openpyxl, requests, possibly EasyOCR) and an external Tesseract binary for OCR. The absence of an install mechanism or declared dependencies (and lack of a required Tesseract binary entry) is inconsistent and increases the chance the code will fail or behave unexpectedly on execution.
!
Credentials
Billing and runtime code expect several secrets/environment variables: SKILL_BILLING_API_KEY and SKILL_BILLING_SKILL_ID (skillpay.me calls) and FEISHU_USER_ID (user identifier transmitted for billing). The field extractor will also use OPENAI_API_KEY from the environment if api_key is not passed. Registry metadata declared no required env vars; SKILL.md declares billing env vars but does not declare OPENAI_API_KEY. Requesting/using these credentials, especially API keys and a user identifier transmitted to a third-party billing endpoint, should be justified and explicitly declared before installation.
Persistence & Privilege
The skill is not always: true, doesn't request elevated system-wide configuration changes, and does not modify other skills. It writes temporary processing files (SKILL.md says /tmp/pdf-extractor/) and can save outputs to user-specified paths—expected for this functionality. Autonomous invocation is allowed (platform default) but is not combined with other high privileges here.
What to consider before installing
Do not run this on sensitive documents until you accept the following and/or verify them yourself: (1) The bundle will call an OpenAI-compatible API with your API key — the code reads OPENAI_API_KEY from the environment if you don't pass an api_key parameter; consider using a scoped/limited key. (2) The skill transmits FEISHU_USER_ID (an identifier) to https://skillpay.me for billing and uses SKILL_BILLING_API_KEY; verify SkillPay's legitimacy before providing keys. (3) OCR requires the Tesseract binary (pytesseract calls it) plus Python packages (PyMuPDF, pdfplumber, Pillow, openpyxl, requests, etc.); the registry doesn't list these dependencies—install them in a sandbox first. (4) The package metadata claims no env vars but SKILL.md and code do require/expect several; treat that as a red flag. Recommended steps: review and run the code in an isolated environment, confirm skillpay.me billing behavior, provide only limited API keys (or pass api_key explicitly per call), install Tesseract and required Python libs in a test VM, and ensure you are comfortable with sending document text to an external AI service before using with private docs.

Like a lobster shell, security has layers — review code before you run it.

latestvk97eg30ypyn3f446x1qcddf7x985hwmh
59downloads
0stars
1versions
Updated 3d ago
v1.0.0
MIT-0

PDF Field Extractor

AI-powered PDF structured data extraction — convert PDF key fields into Excel/JSON.

End-to-End Flow

User uploads PDF → Document type identification → AI field extraction → Structured output (Excel/JSON)

from scripts.pdf_extractor import extract_pdf_text
from scripts.field_extractor import extract_fields
from scripts.output_generator import generate_excel, generate_json

# Step 1: Extract PDF text (PyMuPDF + pdfplumber)
text, tables, images = extract_pdf_text("invoice.pdf")

# Step 2: AI field extraction (user provides own API Key, OpenAI-compatible)
fields = extract_fields(
    text=text,
    doc_type="invoice",
    api_key="sk-xxx",
    api_base="https://api.openai.com/v1",
    model="gpt-4o",
)

Supported Document Types

TypeDescription
InvoiceVAT invoice, receipt invoice, electronic invoice
ContractContracts, agreements
ReceiptReceipts, tickets
Bank StatementBank reconciliation statements
LicenseBusiness license
ID CardID card, passport
ExpressWaybill, shipping label
GenericUser-defined custom extraction

Detection Modes

ModeDescription
AutoAI automatically identifies document type
ManualUser specifies document type

Tiered Features

FeatureFREEPRO
Monthly pages10Unlimited
Document typesInvoice onlyAll types
Output formatsTextExcel + JSON + Text
OCR languagesEnglishEnglish + Chinese + 9 more
Batch processing1 pageUnlimited
Custom fieldsYes
PriceFree$0.01/call

Technical Implementation

  • PDF parsing: PyMuPDF (fitz) + pdfplumber for text and table extraction
  • OCR: EasyOCR / Tesseract for scanned documents (multi-language support)
  • AI extraction: OpenAI-compatible API, model-agnostic (GPT-4o, DeepSeek, GLM, etc.)
  • Output: Excel (.xlsx) with formatted sheets, JSON with structured hierarchy

Output Format

Excel Output

  • Sheet per document type
  • Header row with field names
  • Data rows with extracted values
  • Color-coded by confidence

JSON Output

{
  "doc_type": "invoice",
  "fields": {
    "invoice_number": "...",
    "date": "...",
    "amount": "...",
    "buyer": "...",
    "seller": "..."
  },
  "confidence": 0.95
}

Security Notes

  • AI API calls: Uses requests.post to OpenAI-compatible endpoints with user-provided API key (not stored)
  • Data storage: Uses /tmp/pdf-extractor/ for temporary processing files (no home directory write)
  • OCR: Local processing via EasyOCR/Tesseract (no external data transmission)
  • Billing data: FEISHU_USER_ID transmitted to skillpay.me/api/v1/billing for per-call charging

Billing

  • Billing via skillpay.me/api/v1/billing/charge
  • User data transmitted to SkillPay for billing identification
  • $0.01 USD per extraction call (PRO tier)

Required Environment Variables

VariableDescription
FEISHU_USER_IDUser open_id for billing
SKILL_BILLING_API_KEYSkillPay Builder API Key
SKILL_BILLING_SKILL_IDSkillPay Skill ID (default: pdf-extractor)

Common Errors

ErrorCauseSolution
NO_TEXT_EXTRACTEDScanned PDF without OCREnable OCR or use digital PDF
UNSUPPORTED_DOC_TYPEDocument type not recognizedSpecify type manually
API_ERRORAI API key invalid or quota exceededCheck API key

Comments

Loading comments...