Skill v1.0.0

ClawScan security

Paper To Table · ClawHub's context-aware review of the artifact, metadata, and declared behavior.

Scanner verdict

Benign · Apr 30, 2026, 1:56 AM
Verdict
benign
Confidence
high
Model
gpt-5-mini
Summary
The skill's code and runtime instructions are internally consistent with its stated purpose (extracting papers into table rows); it requests no credentials and has no network endpoints, but it omits dependency declarations and requires caution when sending paper text to external LLMs or using OCR system binaries.
Guidance
This skill appears to do what it says: parse papers and prepare LLM extraction inputs, then validate and append rows to XLSX/CSV. Before installing or running:
1) Ensure the runtime has the Python packages used (pdfplumber, PyMuPDF/fitz, pdf2image, pytesseract, python-docx, openpyxl) and system OCR/poppler binaries if you need OCR; the skill does not declare these dependencies.
2) Be aware that the workflow sends paper text to an LLM for extraction (SKILL.md's 'LLM Extraction' step). Do not run it on confidential or unpublished manuscripts unless you trust the model provider or run a local/private model.
3) Test on a small non-sensitive dataset to confirm encoding/locale behavior (Chinese/English) and duplicate-detection thresholds.
4) If you need stricter privacy, modify the workflow to use a local LLM, or remove the LLM extraction step and perform manual review.
Overall, the components and instructions are coherent, but verify dependencies and data-handling expectations before use.
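When testing the duplicate-detection thresholds mentioned in the guidance, it can help to see how a similarity threshold behaves on a few known titles. The sketch below is only an illustration using the standard library's `difflib`; the skill's actual matching method and default threshold are not stated in this review, and the names `normalize`, `is_duplicate`, and the `0.9` value are assumptions.

```python
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase and collapse whitespace so formatting differences do not count."""
    return " ".join(title.lower().split())


def is_duplicate(title, existing_titles, threshold=0.9):
    """Return True if `title` is at least `threshold` similar to any existing title.

    Hypothetical helper: the threshold is an assumed value to tune on test data.
    """
    candidate = normalize(title)
    return any(
        SequenceMatcher(None, candidate, normalize(other)).ratio() >= threshold
        for other in existing_titles
    )
```

Running this over a handful of English and Chinese titles from a non-sensitive test set, at several threshold values, gives a rough sense of where near-duplicates start to be merged or missed.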

Review Dimensions

Purpose & Capability
ok · Name/description (extract papers → table) match the included scripts (read_table, extract_paper, write_table, batch_process) and the reference docs. The required functionality (PDF/DOCX/TXT parsing, header inference, LLM-based field extraction, writing XLSX/CSV) is implemented and expected for this purpose.
Instruction Scope
note · SKILL.md explicitly instructs an LLM to extract fields from paper text and return JSON; the code prepares that input and validates the returned JSON. This is within scope, but SKILL.md implies sending paper contents (potentially the full text) to an LLM, which is a privacy/data-leakage consideration. The instructions also rely on OCR fallbacks and external Python packages but do not enumerate the system-level binaries (e.g., Tesseract, poppler) needed for OCR.
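To make the "return JSON, then validate" step concrete, here is a minimal sketch of checking an LLM reply against the table headers before a row is appended. It does not reproduce the skill's own validation code; `validate_row` and its return shape are hypothetical names chosen for illustration.

```python
import json


def validate_row(raw_reply, headers):
    """Parse an LLM reply and align it to the table headers.

    Returns (row, problems): row has exactly the header keys, and problems
    lists anything a reviewer should check before the row is appended.
    Hypothetical helper, not the skill's actual API.
    """
    try:
        data = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        return None, [f"reply is not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["reply must be a JSON object"]

    problems = []
    row = {}
    for header in headers:
        value = data.get(header)
        if value is None or str(value).strip() == "":
            # Keep the column but flag it, rather than guessing a value.
            problems.append(f"missing value for '{header}'")
            row[header] = ""
        else:
            row[header] = str(value).strip()

    # Fields the model invented are dropped and reported.
    extra = sorted(set(data) - set(headers))
    if extra:
        problems.append(f"unexpected fields ignored: {', '.join(extra)}")
    return row, problems
```

Rejecting non-object replies and reporting unexpected fields keeps model output from silently adding columns or malformed rows to the target table.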
Install Mechanism
ok · No install spec (instruction-only), which lowers installation risk. However, the code imports many third-party Python packages (pdfplumber, PyMuPDF/fitz, pdf2image, pytesseract, python-docx, openpyxl). Those dependencies are not declared in metadata; the skill will fail or fall back with error messages if libraries/binaries are missing.
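Because the dependencies are undeclared, a quick preflight check before the first run can show what is missing. The sketch below is an assumption-laden illustration, not part of the skill: `missing_dependencies` is a hypothetical helper, and it assumes the OCR path needs the `tesseract` executable and poppler's `pdftoppm` on `PATH` (the tools that pytesseract and pdf2image normally call).

```python
import importlib.util
import shutil

# Import name -> pip package name, for the packages listed in the review.
PYTHON_MODULES = {
    "pdfplumber": "pdfplumber",
    "fitz": "PyMuPDF",
    "pdf2image": "pdf2image",
    "pytesseract": "pytesseract",
    "docx": "python-docx",
    "openpyxl": "openpyxl",
}
# Executable -> label; assumed to be needed only when OCR is used.
OCR_BINARIES = {"tesseract": "Tesseract OCR", "pdftoppm": "poppler"}


def missing_dependencies(modules=PYTHON_MODULES, binaries=OCR_BINARIES):
    """Return (missing pip packages, missing system tools) without importing them."""
    missing_modules = [
        pip_name
        for import_name, pip_name in modules.items()
        if importlib.util.find_spec(import_name) is None
    ]
    missing_binaries = [
        label for exe, label in binaries.items() if shutil.which(exe) is None
    ]
    return missing_modules, missing_binaries


if __name__ == "__main__":
    mods, bins = missing_dependencies()
    print("Missing Python packages:", ", ".join(mods) or "none")
    print("Missing OCR binaries:", ", ".join(bins) or "none")
```

Using `find_spec` and `shutil.which` only looks the names up, so the check itself does not execute any of the third-party code.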
Credentials
ok · The skill requests no environment variables, credentials, or external config paths. All file I/O is limited to input papers, the target table, and logs in an output folder. No access to unrelated services or secrets is requested.
Persistence & Privilege
ok · Flags show no special persistence or forced inclusion (always: false). The skill writes logs and output files under output directories it creates, which is expected for batch processing. It does not modify other skills or system-wide agent settings.