pdf-ocr-layout

PassAudited by ClawScan on May 1, 2026.

Overview

This appears to be a legitimate OCR/document-analysis skill, but it uploads selected documents/images to Zhipu’s cloud models and saves extracted content locally.

Before installing, confirm you are comfortable sending the chosen PDFs/images to Zhipu's services and storing extracted text/images in the output folder. Use a dedicated Zhipu API key if possible, verify the Python dependency name, and review generated analyses rather than treating them as authoritative.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

Note

ASI07: Insecure Inter-Agent Communication

What this means

Private or confidential documents processed by this skill may be transmitted to Zhipu's cloud service for OCR and analysis.

Why it was flagged

The selected input file is base64-encoded and sent to the Zhipu OCR API, which is central to the skill's purpose but means document contents leave the local environment.

Skill content

response = client.layout_parsing.create(
        model="glm-ocr",
        file=image_to_base64(file_path)
    )

Recommendation

Use only with documents you are allowed to send to Zhipu, and review the provider's data retention and privacy terms.

Note

ASI03: Identity and Privilege Abuse

What this means

The configured API key may incur usage charges and grants access to the Zhipu account associated with it.

Why it was flagged

The skill uses a Zhipu API key from the environment to call external models. This is expected for the stated integration, but it is a credential users should manage carefully.

Skill content

client = ZhipuAiClient(api_key=os.getenv("ZHIPU_API_KEY"))

Recommendation

Use a dedicated or least-privilege API key where possible, monitor usage, and consider updating the registry metadata to declare the required environment variable.

Note

ASI06: Memory and Context Poisoning

What this means

Extracted document text and cropped images can remain on disk after processing.

Why it was flagged

The script stores the source path, full OCR markdown context, and extracted elements in a JSON file under the output directory.

Skill content

payload = {
        "source_file": str(file_path),
        "full_markdown_context": full_context_md,
        "elements": extracted_elements
    }

Recommendation

Choose an appropriate output directory and delete generated JSON/images if the source document is sensitive.

Note

ASI01: Agent Goal Hijack

What this means

A maliciously crafted document could make the generated interpretation misleading or off-task.

Why it was flagged

OCR-extracted document text and table content are inserted directly into model prompts for analysis. This is purpose-aligned, but hostile text inside a document could try to steer the model's response.

Skill content

{full_context[:6000]}
...
{markdown_content}

Recommendation

Treat model-generated analysis as advisory, review results for unusual instructions or claims, and consider adding prompt guidance that document text is untrusted content.