Install
openclaw skills install finance-ocr-pro
Use this skill when the user asks to OCR, transcribe, extract, or convert the contents of a scanned PDF, image, or office document into Markdown, HTML, DOCX, or Excel. This workflow sends page images and OCR prompts to a configured OpenAI-compatible VLM endpoint, requires API_KEY, BASE_URL, and VLM_MODEL, bundles HTML report assets locally so no runtime CDN downloads are needed, and produces improved native Word equation output for recognized LaTeX formulas. It is especially valuable for financial documents and other visually complex materials with dense tables, charts, graphs, and multi-part layouts. Prefer durable background jobs for long-running OCR work.
Version 1.0.7 update: Word/DOCX output now includes optimized and expanded LaTeX formula restoration for recognized formulas.
Run this skill only after the user has clearly expressed OCR intent.
This skill is especially helpful for financial reports, annual reports, prospectuses, investor presentations, regulatory filings, research reports, and other documents with complicated structure, charts, graphs, tables, and mixed layout elements.
Before running OCR, make the operating model clear:
API_KEY (sensitive) -- the API key for authenticating with the VLM endpoint.
BASE_URL -- the base URL of the OpenAI-compatible VLM endpoint. All page images and OCR prompts are transmitted to this URL.
VLM_MODEL -- the vision-capable model identifier. Must support image inputs; text-only models will not work.
Page images and OCR prompts are sent to BASE_URL. This is the primary data-transmission path. Users must verify that the endpoint is trusted before processing sensitive documents.
To keep documents on the local machine, BASE_URL must point to a local VLM service. Do not run this skill against an external endpoint with sensitive documents unless the provider is trusted.
scripts/ocr_setup.py checks dependencies and creates .env templates, but it never installs Python packages automatically. Users must review and run dependency installation themselves.
HTML report assets are bundled locally under scripts/vendor/; the skill does not download frontend assets from a CDN at runtime.
osascript. Commands are executed with explicit argument lists rather than shell strings.
Credentials are configured in a .env file. Use .env.example as a template and keep real credentials local.
After the user asks for OCR or extraction, give a short notice that includes:
Whether BASE_URL is local or remote
Which VLM_MODEL will be used
Proceed automatically unless the user asks to change those defaults.
Model: VLM_MODEL
Threads: 1
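The pre-run notice described above could be composed along the following lines. This is only a sketch, not the skill's actual code: the function names `is_local_endpoint` and `build_notice` are hypothetical, the settings are assumed to have already been loaded into the environment, and treating loopback and private IP ranges as "local" is an assumption of this example.

```python
import ipaddress
import os
from urllib.parse import urlparse


def is_local_endpoint(base_url: str) -> bool:
    """Return True when the URL host is localhost or a loopback/private IP literal."""
    host = urlparse(base_url).hostname or ""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name cannot be classified without resolving it: treat it as remote.
        return False
    # Assumption of this sketch: private (LAN) addresses count as local.
    return addr.is_loopback or addr.is_private


def build_notice(env=os.environ) -> str:
    """Compose the short pre-run notice from the three required settings."""
    missing = [k for k in ("API_KEY", "BASE_URL", "VLM_MODEL") if not env.get(k)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    where = "local" if is_local_endpoint(env["BASE_URL"]) else "remote"
    # API_KEY is sensitive, so it is checked for presence but never echoed.
    return (
        f"Page images and OCR prompts will be sent to a {where} endpoint "
        f"({env['BASE_URL']}) using model {env['VLM_MODEL']}."
    )
```

Hostnames other than localhost are reported as remote on purpose, so the notice errs toward warning the user rather than reassuring them.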
If the user's API endpoint or plan supports safe parallel OCR requests, tell them they can choose a higher thread count.
Results directories:
~/.semantic-ocr/jobs/<job_id>/results/
ocr_output/OCR_<filename>/results/
Use the skill-local virtual environment if present.
.venv/bin/python on macOS/Linux
.venv/Scripts/python.exe on Windows
Otherwise, python3 on macOS/Linux, python on Windows
Before running any command, resolve the interpreter and reuse it for the rest of the session:
PYTHON="${PYTHON:-$( [ -x .venv/bin/python ] && printf .venv/bin/python || printf python3 )}"
On Windows, use .venv\Scripts\python.exe when present, otherwise python.
Run:
$PYTHON scripts/ocr_setup.py --check
If setup is incomplete, run:
$PYTHON scripts/ocr_setup.py
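For sessions where a shell one-liner is inconvenient (for example on Windows), the interpreter resolution described above can be sketched in Python. The function name `resolve_python` and its parameters are hypothetical; the lookup order (an existing PYTHON value, then the skill-local virtual environment, then the platform default) mirrors the shell expression rather than any code shipped with the skill.

```python
import os
import sys
from pathlib import Path


def resolve_python(skill_dir=".", platform=sys.platform, env=os.environ) -> str:
    """Pick the interpreter: $PYTHON, then the skill-local venv, then the system default."""
    if env.get("PYTHON"):
        return env["PYTHON"]
    windows = platform.startswith("win")
    venv = Path(skill_dir) / ".venv"
    # Standard venv layouts: Scripts\python.exe on Windows, bin/python elsewhere.
    candidate = venv / "Scripts" / "python.exe" if windows else venv / "bin" / "python"
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return str(candidate)
    return "python" if windows else "python3"
```

Resolving once and passing the result to every later command keeps the whole session on a single interpreter.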
By default, start a background worker:
$PYTHON scripts/ocrctl.py --json start /path/to/document.pdf
If the provider supports concurrency and the user wants faster OCR, offer a higher thread count such as:
$PYTHON scripts/ocrctl.py --json start -t 4 /path/to/document.pdf
Then inspect progress and outputs:
$PYTHON scripts/ocrctl.py --json status <job_id>
$PYTHON scripts/ocrctl.py --json artifacts <job_id>
$PYTHON scripts/ocrctl.py --json tail <job_id>
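When these commands are driven from a script, they can be issued as explicit argument lists and their output parsed. The sketch below makes two assumptions that the text does not confirm: that `--json` causes each command to print a single JSON document on stdout, and that a non-zero exit code signals failure. The helper names `ocrctl_args` and `run_ocrctl` are hypothetical.

```python
import json
import subprocess


def ocrctl_args(python, action, target, threads=None):
    """Build an explicit argument list for scripts/ocrctl.py (no shell string)."""
    args = [python, "scripts/ocrctl.py", "--json", action]
    if threads is not None:
        # -t is shown above only with the start action.
        args += ["-t", str(int(threads))]
    args.append(target)
    return args


def run_ocrctl(python, action, target, threads=None):
    """Run one ocrctl command and parse stdout as JSON (assumed output format)."""
    proc = subprocess.run(
        ocrctl_args(python, action, target, threads),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(proc.stdout)
```

For instance, `run_ocrctl(python, "start", "/path/to/document.pdf", threads=4)` corresponds to the threaded start command, and `run_ocrctl(python, "status", job_id)` to the status check; the keys of the returned object are not documented here, so inspect them before relying on any field.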
Use synchronous mode only when the user explicitly wants inline execution:
$PYTHON scripts/ocr_main.py /path/to/document.pdf
Requires API_KEY, BASE_URL, and VLM_MODEL to be configured before running.
Default resolution is 200 DPI.
Default thread count is 1 unless the user's API endpoint, rate limits, and plan support concurrent OCR requests.