Skill v1.0.0

ClawScan security

Paper To Table · ClawHub's context-aware review of the artifact, metadata, and declared behavior.

Scanner verdict

Benign · Apr 30, 2026, 1:56 AM

Verdict: benign
Confidence: high
Model: gpt-5-mini
Summary: The skill's code and runtime instructions are internally consistent with its stated purpose (extracting papers into table rows). It requests no credentials and has no network endpoints, but it omits dependency declarations, and caution is warranted when sending paper text to external LLMs or relying on OCR system binaries.
Guidance: This skill appears to do what it says. It parses papers, prepares LLM extraction inputs, then validates and appends rows to XLSX/CSV. Before installing or running it:
1) Ensure the runtime has the Python packages the code uses (pdfplumber, PyMuPDF/fitz, pdf2image, pytesseract, python-docx, openpyxl). If you need OCR, also install the system OCR/poppler binaries. The skill does not declare these dependencies.
2) Be aware that the workflow sends paper text to an LLM for extraction (the 'LLM Extraction' step in SKILL.md). Do not run it on confidential or unpublished manuscripts unless you trust the model provider or run a local/private model.
3) Test on a small, non-sensitive dataset to confirm encoding/locale behavior (Chinese/English) and duplicate-detection thresholds.
4) If you need stricter privacy, modify the workflow to use a local LLM, or remove the LLM extraction step and review the papers manually.
Overall, the components and instructions are coherent, but verify dependencies and data-handling expectations before use.
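Step 3 recommends testing duplicate-detection thresholds on mixed Chinese/English data. The review does not say how the skill detects duplicates. The following is therefore only a minimal sketch, which assumes title similarity is measured with Python's standard `difflib.SequenceMatcher`. The names `normalize_title` and `is_probable_duplicate`, and the 0.9 threshold, are hypothetical and not taken from the skill. The sketch shows why the threshold deserves testing: a lightly revised Chinese title can fall just under a strict cutoff.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Hypothetical normalizer for comparing paper titles across languages."""
    # NFKC folds full-width forms (common in Chinese text) into ASCII equivalents;
    # casefold() makes the comparison case-insensitive.
    text = unicodedata.normalize("NFKC", title).casefold()
    # Keep letters and digits (CJK ideographs count as letters); drop punctuation and spaces.
    return "".join(ch for ch in text if ch.isalnum())


def is_probable_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Hypothetical check: treat two titles as duplicates at or above a similarity threshold."""
    ratio = SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()
    return ratio >= threshold


if __name__ == "__main__":
    original = "深度残差学习用于图像识别"
    revised = original + "（修订版）"  # same title with a "(revised edition)" suffix
    for cutoff in (0.9, 0.85):
        print(cutoff, is_probable_duplicate(original, revised, threshold=cutoff))
    print(is_probable_duplicate("Attention Is All You Need", "attention is all you need."))
```

With these inputs, the suffixed Chinese title scores about 0.89. It is flagged as a duplicate at a threshold of 0.85 but not at 0.9, which is the kind of behavior a small test run should surface.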

Review Dimensions

Purpose & Capability: ok. The name and description (extract papers → table) match the included scripts (read_table, extract_paper, write_table, batch_process) and the reference docs. The required functionality (PDF/DOCX/TXT parsing, header inference, LLM-based field extraction, XLSX/CSV writing) is implemented, as expected for this purpose.
Instruction Scope: note. SKILL.md explicitly instructs an LLM to extract fields from paper text and return JSON, and the code prepares and validates that input. This is within scope. However, SKILL.md implies sending paper contents (potentially the full text) to an LLM, which is a privacy/data-leakage consideration. The instructions also rely on OCR fallbacks and external Python packages. They do not list the system-level binaries (e.g., Tesseract, poppler) that OCR needs.
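The review says the LLM returns JSON and that rows are validated before they are appended, but it does not show that validation. The following is a minimal sketch that assumes the model is asked for one flat JSON object keyed by the table's column headers. `validate_llm_row` is a hypothetical name, not the skill's function. Rejecting unexpected keys stops a misbehaving model from silently adding columns.

```python
import json
from typing import Dict, List


def validate_llm_row(raw: str, headers: List[str]) -> Dict[str, str]:
    """Hypothetical helper: turn an LLM's JSON reply into one row keyed by the table headers.

    Raises ValueError on malformed JSON (json.JSONDecodeError is a ValueError subclass),
    on a non-object reply, or on keys that are not table headers.
    """
    text = raw.strip()
    # Some models wrap their answer in a ```json fence; strip it if present.
    if text.startswith("```"):
        text = text.strip("`")
        if text.lower().startswith("json"):
            text = text[4:]
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unknown = set(data) - set(headers)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    # Missing or null columns become empty strings; other values are stringified.
    return {h: "" if data.get(h) is None else str(data[h]) for h in headers}
```

For example, a fenced reply `{"Title": "深度残差学习", "Year": 2016}` checked against the headers `Title`, `Year`, `Method` yields a complete row in which `Method` is left empty.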
Install Mechanism: ok. There is no install spec (instruction-only), which lowers installation risk. However, the code imports many third-party Python packages (pdfplumber, PyMuPDF/fitz, pdf2image, pytesseract, python-docx, openpyxl) that are not declared in the metadata. If these libraries or binaries are missing, the skill will fail or fall back with error messages.
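The metadata declares none of these dependencies, so a pre-flight check can prevent a failed batch run. This sketch uses only the standard library (`importlib.util.find_spec`, `shutil.which`). `missing_dependencies` is a hypothetical helper. The binary names are assumptions: `tesseract` for Tesseract, and poppler's `pdftoppm`, which pdf2image calls by default.

```python
import importlib.util
import shutil
from typing import List, Tuple

# pip package name -> import name (PyMuPDF imports as "fitz", python-docx as "docx")
PYTHON_MODULES = {
    "pdfplumber": "pdfplumber",
    "PyMuPDF": "fitz",
    "pdf2image": "pdf2image",
    "pytesseract": "pytesseract",
    "python-docx": "docx",
    "openpyxl": "openpyxl",
}

# System binaries needed only for the OCR fallback (assumed names; see lead-in)
OCR_BINARIES = ["tesseract", "pdftoppm"]


def missing_dependencies() -> Tuple[List[str], List[str]]:
    """Return (missing pip packages, missing OCR binaries) without importing the packages."""
    missing_packages = [
        pkg for pkg, module in PYTHON_MODULES.items()
        if importlib.util.find_spec(module) is None
    ]
    missing_binaries = [name for name in OCR_BINARIES if shutil.which(name) is None]
    return missing_packages, missing_binaries


if __name__ == "__main__":
    packages, binaries = missing_dependencies()
    if packages:
        print("pip install " + " ".join(packages))
    if binaries:
        print("OCR fallback unavailable; missing binaries:", ", ".join(binaries))
```

The check uses `find_spec` rather than a real import. A missing package is reported instead of raising `ImportError`, so you see every gap in a single run.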
Credentials: ok. The skill requests no environment variables, credentials, or external config paths. All file I/O is limited to the input papers, the target table, and logs in an output folder. No access to unrelated services or secrets is requested.
Persistence & Privilege: ok. Flags show no special persistence or forced inclusion (always: false). The skill writes logs and output files under output directories it creates, as expected for batch processing. It does not modify other skills or system-wide agent settings.