Finance OCR Pro
v1.0.4
Use this skill when the user asks to OCR, transcribe, extract, or convert the contents of a scanned PDF, image, or office document into Markdown, HTML, DOCX,...
by @rizmoon
MIT-0
License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
OpenClaw
Benign · high confidence

Purpose & Capability
Name/description, SKILL.md, README and the included scripts all describe an OCR pipeline that renders pages to images and sends them to a vision-capable model at BASE_URL; the requested env vars (API_KEY, BASE_URL, VLM_MODEL) are appropriate and necessary for that purpose. The bundled scripts (document→image, VLM client, markdown→outputs, job runtime) align with the stated functionality.
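The described pipeline shape (read the three env vars, render a page to an image, wrap it in a vision request) can be sketched as follows. This is a minimal illustration, not the skill's actual code: the helper names, the default prompt, and the OpenAI-style payload layout are all assumptions, and the real scripts (ocr_main.py, ocrctl.py) may differ.

```python
import base64
import os

def load_vlm_config(env=os.environ):
    """Read the three documented env vars; all are required."""
    required = ("API_KEY", "BASE_URL", "VLM_MODEL")
    missing = [k for k in required if k not in env]
    if missing:
        raise RuntimeError(f"missing required env vars: {missing}")
    return {k: env[k] for k in required}

def build_ocr_request(page_png: bytes, model: str,
                      prompt: str = "Transcribe this page to Markdown."):
    """Wrap one rendered page image in an OpenAI-style vision payload.

    The payload is what would be POSTed to BASE_URL; no network call here.
    """
    b64 = base64.b64encode(page_png).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

Note that the full page image travels inside the request body, which is why the trustworthiness of BASE_URL matters.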
Instruction Scope
SKILL.md explicitly instructs the agent to transmit rendered page images and OCR prompts to BASE_URL and to run local Python scripts (ocrctl.py, ocr_main.py, etc.). It requires a pre-run notice but also says to 'Proceed automatically unless the user asks to change those defaults' once OCR intent is given; this grants the agent limited autonomy to run the pipeline without additional confirmation. The instructions do not attempt to read or transmit unrelated system secrets or files, but they do direct network transmission of potentially sensitive page images to BASE_URL, a behaviour that is documented.
Install Mechanism
No install spec is provided (instruction-only install), and the skill expects the included Python scripts to be run in a local interpreter; requirements.txt lists standard libraries for OCR, image and document conversion. The md_to_html script may embed or download MathJax/Mermaid (and fallback to CDN if local caching fails), which is an expected behaviour for report generation but does involve additional network requests.
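The "local cache first, CDN fallback" behaviour attributed to the md_to_html script can be sketched like this. The function name, the cached-asset path, and the specific CDN URL are illustrative assumptions, not taken from the skill's code.

```python
from pathlib import Path

# Hypothetical CDN fallback for MathJax; the real script's URL may differ.
MATHJAX_CDN = "https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"

def resolve_asset(local_path: Path, cdn_url: str) -> str:
    """Return a file:// URI for a cached asset, else fall back to the CDN.

    Falling back means the generated HTML report will make a network
    request when opened, which matters for air-gapped use.
    """
    if local_path.is_file():
        return local_path.resolve().as_uri()
    return cdn_url
```

Pre-caching the asset at the expected local path is what keeps report generation fully offline.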
Credentials
The three required env vars (API_KEY, BASE_URL, VLM_MODEL) are directly relevant to sending images and prompts to a VLM and are proportionate. Minor manifest inconsistency: the registry metadata above lists 'Primary credential: none' while agents/openai.yaml sets primary_credential: API_KEY and README/skill.yaml mention API_KEY as the primary secret. This is likely a metadata mismatch but not a functional security issue.
Persistence & Privilege
The skill does not request 'always: true'. It runs local background jobs and persists per-job state under ~/.semantic-ocr/jobs/<job_id>/ — this is reasonable for long-running OCR tasks. It does not appear to modify other skills or system-wide agent configuration.
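The per-job state layout under ~/.semantic-ocr/jobs/<job_id>/ might look roughly like the sketch below. The file names (state.json, pages/) and the job-ID scheme are assumptions for illustration; only the root directory is documented.

```python
import json
import uuid
from pathlib import Path

def create_job(root=None) -> Path:
    """Create a fresh per-job state directory and seed its state file."""
    root = root or Path.home() / ".semantic-ocr" / "jobs"
    job_dir = root / uuid.uuid4().hex        # hypothetical job_id scheme
    (job_dir / "pages").mkdir(parents=True)  # rendered page images land here
    (job_dir / "state.json").write_text(
        json.dumps({"status": "pending", "pages_done": 0})
    )
    return job_dir
```

A layout like this is why disk location and quotas are worth considering: every rendered page persists until the job directory is cleaned up.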
Assessment
This skill appears to do what it says: render pages locally, send page images and prompts to a configured vision-model endpoint, and produce Markdown/HTML/DOCX/Excel outputs. Before installing or running it:
1. Verify BASE_URL points to a trusted endpoint (use a local VLM for sensitive documents); any page images will be transmitted there.
2. Keep API_KEY secret and do not commit a populated .env.
3. Be aware the job runtime writes state and outputs to ~/.semantic-ocr/jobs/, so consider disk location and quotas.
4. The HTML report generator may fetch MathJax/Mermaid from CDNs if local caching fails; if you require fully air-gapped behavior, host or pre-cache those assets locally.
5. There is a small metadata inconsistency about the declared primary credential (registry metadata vs agents/openai.yaml); inspect skill.yaml to confirm its permissions (network/filesystem/shell) before use.
If you need stronger guarantees for confidentiality, run the pipeline against a local VLM or on an isolated machine.
Like a lobster shell, security has layers: review code before you run it.
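Recommendation (1) above can be automated with a quick pre-flight check before exporting the env vars. The list of hosts treated as local is an assumption; extend it for your own trusted endpoints.

```python
from urllib.parse import urlparse

# Hosts assumed to keep page images on-machine; adjust to your environment.
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}

def base_url_is_local(base_url: str) -> bool:
    """True if BASE_URL would keep rendered page images on this machine."""
    host = urlparse(base_url).hostname or ""
    return host in LOCAL_HOSTS
```

A wrapper script could refuse to start the pipeline, or require explicit confirmation, when this check fails for a sensitive document.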
latest: vk97d2s69dz590vc1frkcxxq6gs83y057
Runtime requirements
Env: API_KEY, BASE_URL, VLM_MODEL
