Chinese Handwriting OCR

Pass. Audited by ClawScan on May 11, 2026.

Overview

This appears to be a local OCR toolkit with no evidence of data exfiltration, but users should review its external dependencies, documentation mismatches, local OCR outputs, and a broad manual cleanup command.

Use this in a controlled Python environment, verify the actual script options before running, and be careful with generated OCR PDFs/text files because they may contain sensitive extracted content. Do not run the broad Python process cleanup command unless you have confirmed which processes it will stop.

Findings (4)

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

What this means

Some documented commands may fail or not use the engine the user expects.

Why it was flagged

The main documentation advertises engine options that the argument parser in the provided ocr_date_extractor.py does not implement; that parser accepts only the positional input and --json.

Skill content
python scripts/ocr_date_extractor.py 文档.pdf --engine rapid ... --engine paddle ... --engine both
Recommendation

Verify each script's actual --help output before relying on the documented engine options.
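As a rough illustration, the Python sketch below assumes the parser is an argparse parser with a positional input and a --json switch. That is a hypothetical reconstruction based on the finding, not the skill's actual code. It shows that an --engine option the parser never defines is rejected. It also includes a hypothetical help_mentions helper that checks a real script's --help output for a given flag.

```python
import argparse
import subprocess
import sys


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the parser described in the finding:
    # a positional input plus a --json switch, with no --engine option.
    parser = argparse.ArgumentParser(prog="ocr_date_extractor.py")
    parser.add_argument("input")
    parser.add_argument("--json", action="store_true")
    return parser


def help_mentions(script_path: str, flag: str) -> bool:
    """Run a real script's --help and report whether `flag` appears in it."""
    result = subprocess.run(
        [sys.executable, script_path, "--help"],
        capture_output=True,
        text=True,
        check=False,
    )
    # False can also mean the script crashed before printing help
    # (for example, a missing OCR dependency); check result.stderr then.
    return flag in result.stdout


if __name__ == "__main__":
    parser = build_parser()
    print(parser.parse_args(["doc.pdf", "--json"]))
    try:
        parser.parse_args(["doc.pdf", "--engine", "rapid"])
    except SystemExit as exc:
        # argparse reports "unrecognized arguments" and exits with status 2.
        print("--engine rejected, exit status", exc.code)
```

If the finding is accurate, help_mentions("scripts/ocr_date_extractor.py", "--engine") run from the skill's directory is expected to return False.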

What this means

Installing the skill's dependencies may download third-party code and models into the local environment.

Why it was flagged

The skill depends on external, unpinned Python packages and OCR model downloads, which is expected for this OCR purpose but not captured by an install spec.

Skill content
pip install rapidocr-onnxruntime ... pip install paddleocr paddlepaddle ... On first launch, PaddleOCR must download its models (~18MB)
Recommendation

Install in a virtual environment, pin versions if reproducibility matters, and review package sources before use.
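One way to act on the pinning advice is sketched below in Python. It assumes the distribution names match the pip commands quoted in the skill content and that it runs inside the virtual environment (for example, one created with python -m venv) where the skill was installed. It records the versions actually installed as requirements-style lines. pin_lines and OCR_PACKAGES are illustrative names, not part of the skill.

```python
from importlib import metadata
from typing import Callable, Iterable

# Distribution names taken from the skill's pip commands.
OCR_PACKAGES = ["rapidocr-onnxruntime", "paddleocr", "paddlepaddle"]


def pin_lines(
    packages: Iterable[str],
    lookup: Callable[[str], str] = metadata.version,
) -> list[str]:
    """Return 'name==version' lines for the packages that are installed."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment, so there is nothing to pin.
            continue
    return lines


if __name__ == "__main__":
    # Save this output as requirements.txt to reproduce the environment later.
    print("\n".join(pin_lines(OCR_PACKAGES)))
```

This pins Python packages only; the model files PaddleOCR downloads on first launch are not covered by it.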

What this means

If run as-is, the cleanup command could terminate unrelated Python work on the machine.

Why it was flagged

The documented manual cleanup command force-stops matching Python processes and is broader than this skill's own scripts.

Skill content
Get-Process python* | Where-Object {$_.CPU -gt 10} | Stop-Process -Force
Recommendation

Before using the cleanup command, inspect the process list and stop only confirmed OCR-related processes.
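A dry-run sketch in Python of this narrower approach follows: it lists Python processes and keeps only those whose command line names the skill's own script. It assumes the optional third-party psutil package for the process snapshot and treats ocr_date_extractor.py as the only confirmed marker. select_ocr_processes, Proc and OCR_SCRIPT_MARKERS are hypothetical names. The sketch prints candidates and stops nothing.

```python
from typing import Iterable, NamedTuple


class Proc(NamedTuple):
    pid: int
    name: str
    cmdline: list[str]


# Assumption: the skill's processes can be recognised by script name.
# ocr_date_extractor.py comes from the documentation; add others you confirm.
OCR_SCRIPT_MARKERS = ("ocr_date_extractor.py",)


def select_ocr_processes(
    procs: Iterable[Proc], markers: Iterable[str] = OCR_SCRIPT_MARKERS
) -> list[Proc]:
    """Keep only Python processes whose command line names a known OCR script."""
    marker_tuple = tuple(markers)
    selected = []
    for proc in procs:
        if not proc.name.lower().startswith("python"):
            continue
        joined = " ".join(proc.cmdline)
        if any(marker in joined for marker in marker_tuple):
            selected.append(proc)
    return selected


def main() -> None:
    try:
        import psutil  # third-party: pip install psutil
    except ImportError:
        print("psutil is not installed; inspect the process list manually.")
        return
    snapshot = [
        Proc(p.info["pid"], p.info["name"] or "", p.info["cmdline"] or [])
        for p in psutil.process_iter(["pid", "name", "cmdline"])
    ]
    # Dry run: print candidates only. Stop a PID yourself once confirmed.
    for proc in select_ocr_processes(snapshot):
        print(proc.pid, " ".join(proc.cmdline))


if __name__ == "__main__":
    main()
```

Matching on the script name rather than on CPU time avoids the original command's problem of selecting any busy Python process.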

What this means

Generated PDFs or text outputs may contain extracted signatures, dates, IDs, or other sensitive document text that could be exposed if shared.

Why it was flagged

The script embeds extracted OCR text into output PDF annotations, which is purpose-aligned but persists recognized document contents.

Skill content
page.add_annot(... content=f"RapidOCR: {text}")
Recommendation

Treat OCR outputs like the original sensitive documents and inspect/redact annotations before sharing.
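A sketch in Python of the inspection step follows. It assumes PyMuPDF (imported as fitz) is installed and that the OCR annotations carry the "RapidOCR:" prefix shown in the skill content. Other engines may use other prefixes, so OCR_PREFIXES is a placeholder to extend after checking a real output file. The sketch prints each matching annotation and writes a copy without them. It does not touch text stored anywhere else in the PDF.

```python
import sys

# Prefix seen in the skill's annotation code ("RapidOCR: {text}").
# Assumption: other engines may use other prefixes; extend after checking.
OCR_PREFIXES: tuple[str, ...] = ("RapidOCR:",)


def is_ocr_annotation(content: str, prefixes: tuple[str, ...] = OCR_PREFIXES) -> bool:
    return content.startswith(prefixes)


def strip_ocr_annotations(src: str, dst: str) -> int:
    """Print and remove OCR annotations from src, saving the result to dst."""
    import fitz  # PyMuPDF, imported lazily so the helper above works without it

    removed = 0
    with fitz.open(src) as doc:
        for page in doc:
            annot = page.first_annot
            while annot:
                content = annot.info.get("content", "")
                if is_ocr_annotation(content):
                    print(f"page {page.number + 1}: removing {content[:60]!r}")
                    annot = page.delete_annot(annot)  # returns the next annot
                    removed += 1
                else:
                    annot = annot.next
        # garbage=3 drops unreferenced objects so deleted text is not retained.
        doc.save(dst, garbage=3, deflate=True)
    return removed


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: strip_ocr_annots.py INPUT.pdf OUTPUT.pdf")
    else:
        count = strip_ocr_annotations(sys.argv[1], sys.argv[2])
        print(f"removed {count} annotation(s)")
```

Reviewing the printed list before sharing the output copy keeps a human in the loop on what was recognized.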