Azure Document OCR
Pass. Audited by ClawScan on May 1, 2026.
Overview
This is a coherent Azure OCR helper, but it uploads selected documents to Azure and uses an Azure API key, so sensitive files and credentials need care.
Before installing or using it, confirm you are comfortable sending the selected documents to Azure Document Intelligence, keep the Azure key secret, verify the endpoint is your Azure resource, and avoid running batch mode on folders that contain unrelated sensitive files.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Documents submitted for OCR, including sensitive business, identity, tax, or health-related files, may be processed by Azure and the extracted content may be written locally.
The script reads the selected local document and sends its bytes to the configured Azure Document Intelligence endpoint. This is purpose-aligned for OCR, but it means document contents leave the local machine.
with open(file_path, "rb") as f: body = f.read(); response = requests.post(analyze_url, params=params, headers=headers, data=body)
Use this only with documents you are allowed to send to Azure, verify the Azure endpoint, and handle output files as sensitive data.
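One way to act on the "verify the Azure endpoint" advice is a small pre-flight check before any upload. The sketch below is not part of the reviewed script. The helper name `is_trusted_endpoint` and the suffix list are assumptions for illustration: public-cloud Azure AI resources commonly use hosts ending in `.cognitiveservices.azure.com`, but sovereign clouds and custom domains differ, so the list may need adjusting.

```python
from urllib.parse import urlparse

# Assumption: host suffix for standard public-cloud Azure AI resources.
# Sovereign clouds and custom domains use other hosts; extend as needed.
TRUSTED_HOST_SUFFIXES = (".cognitiveservices.azure.com",)


def is_trusted_endpoint(endpoint: str) -> bool:
    """Return True only for HTTPS URLs whose host ends with a trusted suffix."""
    parsed = urlparse(endpoint)
    host = (parsed.hostname or "").lower()
    # Compare the parsed hostname, not the raw string, so that a URL such as
    # https://cognitiveservices.azure.com.evil.example/ is rejected.
    return parsed.scheme == "https" and host.endswith(TRUSTED_HOST_SUFFIXES)
```

A caller could refuse to read or send any document when this check fails, so that a mistyped or tampered endpoint variable does not receive file contents.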
The Azure key can authorize use of the Document Intelligence resource and may incur charges or expose service access if mishandled.
The script requires an Azure Document Intelligence endpoint and subscription key from environment variables. This is expected for the service, but users should be aware that the skill needs access to a credential.
endpoint = os.environ.get("AZURE_DOC_INTEL_ENDPOINT"); key = os.environ.get("AZURE_DOC_INTEL_KEY")
Store the key securely, use a dedicated/least-privilege Azure resource when possible, do not commit it to files, and rotate it if it may have been exposed.
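As an illustration of careful credential handling, the following sketch reads the same two environment variables, fails fast when either is missing, and masks the key before it is logged. The function names `load_azure_settings` and `mask_secret` are hypothetical and do not come from the reviewed script.

```python
import os


def load_azure_settings(env=os.environ):
    """Read endpoint and key from the environment, failing fast if either is missing."""
    endpoint = env.get("AZURE_DOC_INTEL_ENDPOINT", "").strip()
    key = env.get("AZURE_DOC_INTEL_KEY", "").strip()
    missing = [
        name
        for name, value in (
            ("AZURE_DOC_INTEL_ENDPOINT", endpoint),
            ("AZURE_DOC_INTEL_KEY", key),
        )
        if not value
    ]
    if missing:
        # Report only variable names, never the values themselves.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return endpoint, key


def mask_secret(secret: str, visible: int = 4) -> str:
    """Return a log-safe form of a secret that shows only its last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```

Logging `mask_secret(key)` instead of the raw key keeps the full credential out of terminal history and log files while still letting an operator confirm which key is in use.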
A broad batch run could upload many local documents to Azure and create many extracted-output files.
Batch mode finds all matching files in the user-specified directory and processes them concurrently. This is disclosed and purpose-aligned, but the scope can be broad if the wrong folder is chosen.
documents = find_documents(input_path, extensions); ThreadPoolExecutor(max_workers=args.workers)
Point batch mode only at intended folders, narrow extensions when needed, and choose a worker count appropriate for rate limits and sensitivity.
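A dry-run listing can make the scope of a batch visible before anything is uploaded. The sketch below is an assumption-laden illustration, not the script's own `find_documents`: the name `preview_batch`, the recursive search, and the `max_files` limit are all hypothetical choices.

```python
from pathlib import Path


def preview_batch(input_dir, extensions, max_files=50):
    """List files a batch run would upload; refuse if the count exceeds max_files."""
    wanted = {ext.lower() for ext in extensions}
    # Assumption: the search is recursive. The reviewed script may differ.
    matches = sorted(
        p for p in Path(input_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )
    if len(matches) > max_files:
        raise ValueError(
            f"{len(matches)} files match, above the limit of {max_files}; "
            "narrow the folder or the extensions"
        )
    return matches
```

Printing the returned paths and confirming them by eye before starting the real run is a cheap guard against pointing batch mode at the wrong folder.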
