OCR with python

Pass

Audited by ClawScan on May 1, 2026.

Overview

This skill appears to do what it claims: OCR on user-provided PDFs and images. It still calls for the usual caution around third-party Python installs and around temporary files created from sensitive documents.

Before installing, set up a virtual environment and use a trusted package source for PaddleOCR/PaddlePaddle. Be careful when running OCR on confidential PDFs, because temporary image files may be created locally during processing.

Findings (2)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Low

#ASI04: Agentic Supply Chain Vulnerabilities

What this means

Installing unpinned packages means future package changes could affect behavior or security.

Why it was flagged

The skill asks users to install third-party OCR packages without version pinning. This is expected for an OCR skill, but users should be aware of normal package supply-chain risk.

Skill content

pip3 install paddlepaddle paddleocr

Recommendation

Install from trusted package indexes, consider pinning known-good versions, and use a virtual environment.
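
One way to act on the pinning advice is to check installed versions against a list of known-good ones before running the skill. The sketch below is illustrative only: the `PINNED` mapping and the `find_pin_mismatches` helper are hypothetical names, and the `X.Y.Z` strings are placeholders to be replaced with versions you have tested yourself.

```python
from importlib import metadata

# Hypothetical pins: replace the placeholders with versions you have tested.
PINNED = {"paddlepaddle": "X.Y.Z", "paddleocr": "X.Y.Z"}


def find_pin_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Calling `find_pin_mismatches(PINNED)` inside the virtual environment returns an empty dict when every package matches its pin. Anything else lists the expected and installed versions side by side.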

Low

#ASI06: Memory and Context Poisoning

What this means

Temporary copies of pages from invoices, contracts, or other private documents may briefly exist on disk and could remain if processing fails.

Why it was flagged

For PDFs, the script copies extracted page images to predictable temporary files under /tmp before OCR and later attempts cleanup. This is purpose-aligned but can matter when processing sensitive documents.

Skill content

output_path = f"/tmp/pdf_page{page_num+1}_img{img_index}.{image_ext}" ... f.write(image_bytes)

Recommendation

Use this in a trusted local environment, avoid shared temporary directories for highly sensitive files, and delete leftover /tmp/pdf_page* files if an OCR run errors.
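
As a possible mitigation, page images could be written to a private, randomly named temporary directory that is removed even when OCR fails. This is a minimal sketch and not the skill's actual code: `write_page_images` and its `process` callback are hypothetical names standing in for the extraction loop and the OCR call.

```python
import os
import tempfile


def write_page_images(images, process):
    """Write each (name, data) pair to a private temp dir and call process(path).

    The directory and its files are removed afterwards, even if process raises.
    """
    results = []
    # TemporaryDirectory creates a randomly named directory readable only by
    # the current user and deletes it when the with-block exits.
    with tempfile.TemporaryDirectory(prefix="ocr_pages_") as tmp_dir:
        for name, data in images:
            # basename() prevents a crafted name from escaping tmp_dir.
            path = os.path.join(tmp_dir, os.path.basename(name))
            with open(path, "wb") as f:
                f.write(data)
            results.append(process(path))
    return results
```

Compared with fixed names under /tmp, this approach avoids predictable paths and leaves no pdf_page* files behind after a failed run.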