Chinese Handwriting Ocr
Pass
Audited by ClawScan on May 11, 2026.
Overview
This appears to be a local OCR toolkit with no evidence of data exfiltration, but users should review its external dependencies, documentation mismatches, local OCR outputs, and a broad manual cleanup command.
Use this in a controlled Python environment, verify the actual script options before running, and be careful with generated OCR PDFs/text files because they may contain sensitive extracted content. Do not run the broad Python process cleanup command unless you have confirmed which processes it will stop.
Findings (4)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Some documented commands may fail or not use the engine the user expects.
The main documentation advertises engine options that are not implemented by the provided ocr_date_extractor.py parser, which only accepts input and --json.
python scripts/ocr_date_extractor.py 文档.pdf --engine rapid ... --engine paddle ... --engine both
Verify each script's actual --help output before relying on the documented engine options.
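One way to carry out that check is sketched below. It assumes the script prints argparse-style help text; the helper names `advertised_flags` and `script_flags` are illustrative and not part of the skill. Note that invoking `--help` still executes the script's module-level imports, so run it only in the controlled environment recommended above.

```python
import re
import subprocess
import sys


def advertised_flags(help_text: str) -> set[str]:
    """Collect long option names (such as '--json') found in help text."""
    # The lookbehind avoids matching dashes in the middle of a word.
    return set(re.findall(r"(?<![\w-])--[a-zA-Z][\w-]*", help_text))


def script_flags(script_path: str) -> set[str]:
    """Run `<script> --help` with the current interpreter and parse the output."""
    result = subprocess.run(
        [sys.executable, script_path, "--help"],
        capture_output=True,
        text=True,
        timeout=60,
        check=False,  # a non-zero exit is reported through the parsed text instead
    )
    return advertised_flags(result.stdout + result.stderr)
```

Calling `"--engine" in script_flags("scripts/ocr_date_extractor.py")` would then indicate whether the documented engine option is actually accepted by that script.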
Installing the skill's dependencies may download third-party code and models into the local environment.
The skill depends on external, unpinned Python packages and OCR model downloads, which is expected for this OCR purpose but not captured by an install spec.
pip install rapidocr-onnxruntime ... pip install paddleocr paddlepaddle ... The first launch of PaddleOCR requires downloading models (~18MB)
Install in a virtual environment, pin versions if reproducibility matters, and review package sources before use.
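If you keep the dependencies in a requirements file, a small check can flag entries that lack an exact version pin. This is a minimal sketch: the function name `unpinned` is hypothetical, and it only recognizes `==` pins, not hashes or other specifiers.

```python
def unpinned(requirements: list[str]) -> list[str]:
    """Return requirement entries that have no exact '==' version pin."""
    loose = []
    for line in requirements:
        # Drop trailing comments and surrounding whitespace.
        spec = line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as '-r other.txt'.
        if spec and not spec.startswith("-") and "==" not in spec:
            loose.append(spec)
    return loose
```

Any package names returned by `unpinned(...)` are candidates for pinning after you have reviewed the version you actually installed.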
If run as-is, it could terminate unrelated Python work on the machine.
The documented manual cleanup command force-stops matching Python processes and is broader than this skill's own scripts.
Get-Process python* | Where-Object {$_.CPU -gt 10} | Stop-Process -Force
Before using the cleanup command, inspect the process list and stop only confirmed OCR-related processes.
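A narrower selection can be sketched as a filter over process command lines rather than over every `python*` process. The `ProcessInfo` record and `ocr_related` function below are hypothetical; the sketch assumes you have already obtained each process's PID and command line from your own process-listing tool, and it only selects candidates without stopping anything.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath


@dataclass(frozen=True)
class ProcessInfo:
    """Hypothetical record: a process ID and its command-line arguments."""
    pid: int
    cmdline: tuple[str, ...]


def ocr_related(processes, script_names):
    """Return only the processes whose command line names one of the given scripts."""
    wanted = {name.lower() for name in script_names}
    matches = []
    for proc in processes:
        # Compare file names only; PureWindowsPath splits on both '/' and '\\'.
        names = {PureWindowsPath(arg).name.lower() for arg in proc.cmdline}
        if names & wanted:
            matches.append(proc)
    return matches
```

Reviewing the returned list by hand before terminating any PID keeps unrelated Python work out of scope.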
Generated PDFs or text outputs may contain extracted signatures, dates, IDs, or other sensitive document text that could be exposed if shared.
The script embeds extracted OCR text into output PDF annotations, which is purpose-aligned but persists recognized document contents.
page.add_annot(... content=f"RapidOCR: {text}")
Treat OCR outputs like the original sensitive documents and inspect/redact annotations before sharing.
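To inspect an output PDF before sharing, the embedded annotation text can be listed first. The sketch below assumes the output was written with PyMuPDF (imported as `fitz`), which the report does not confirm, and that the OCR annotations keep the `RapidOCR:` prefix shown above; both function names are illustrative. It only reports annotations and does not remove them.

```python
def is_ocr_annotation(content: str, prefixes=("RapidOCR:",)) -> bool:
    """True when annotation text starts with a known OCR prefix."""
    return content.strip().startswith(tuple(prefixes))


def list_ocr_annotations(pdf_path: str) -> list[tuple[int, str]]:
    """Return (page_number, text) for each OCR annotation found in the PDF."""
    import fitz  # PyMuPDF; assumed to be the library that produced the file

    found = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for annot in page.annots():
                content = annot.info.get("content", "")
                if is_ocr_annotation(content):
                    found.append((page.number, content))
    return found
```

An empty result suggests no prefixed OCR annotations remain, though extracted text saved in separate text files still needs its own review.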
