Azure Document OCR
v1.0.0
Extract text and structured data from documents using Azure Document Intelligence (formerly Form Recognizer). Supports OCR for PDFs, images, scanned document...
Security Scan
OpenClaw
Suspicious
high confidence
Purpose & Capability
The code and SKILL.md align with the described purpose (submitting documents to Azure Document Intelligence and returning results). However, the registry metadata declares no required environment variables or primary credential, while the scripts and SKILL.md require AZURE_DOC_INTEL_ENDPOINT and AZURE_DOC_INTEL_KEY. The package also assumes a Python runtime and the 'requests' library but lists no dependencies. These omissions make the metadata inconsistent with what the skill actually needs.
Instruction Scope
Runtime instructions tell the agent to read two Azure env vars and to call the Azure Document Intelligence REST API; the included scripts only read provided files/URLs and those env vars and poll the Azure operation endpoint. The instructions do not attempt to read unrelated system files or unknown environment variables and do not post data to third-party endpoints other than the configured Azure endpoint. Still, the instructions require credentials that the registry metadata did not declare.
Install Mechanism
There is no install specification despite shipping Python scripts. The package assumes invocation via 'python' and the presence of the 'requests' package; there are no pip/install instructions or declared dependencies. This mismatch may cause runtime failures and is an omission that reduces transparency about what's needed and what will be present on disk.
Credentials
The two environment variables used (AZURE_DOC_INTEL_ENDPOINT and AZURE_DOC_INTEL_KEY) are proportional to the skill's purpose (authenticating to Azure). However, the registry metadata fails to declare them (primaryEnv is unset). The API key is sensitive: the skill legitimately needs it, but users should be aware that the key is sent in requests to whichever endpoint they configure.
Persistence & Privilege
The skill does not request elevated persistence (always is false) and does not modify other skills or system-wide configuration. It runs as scripts invoked by the user/agent and performs network calls only to the configured Azure endpoint.
What to consider before installing
This skill's code implements Azure Document Intelligence OCR and is not doing obvious malicious actions, but the package metadata is inconsistent and incomplete. Before installing or using it:
- Expect to set two sensitive environment variables: AZURE_DOC_INTEL_ENDPOINT (your Azure resource URL) and AZURE_DOC_INTEL_KEY (your subscription key). The registry did not declare these — verify you set them yourself and never paste your key into untrusted UIs.
- Verify the 'requests' Python package and a compatible Python interpreter are available; the skill provides no install steps for dependencies.
- Confirm the endpoint you configure is your Azure resource (check domain and subscription) because documents and extracted data will be sent to that endpoint. Do not point the endpoint to unknown third-party domains.
- If you need stricter governance, ask the publisher to update the registry metadata to declare required env vars and dependencies, and to add an install spec (e.g., pip requirements). Also consider using a dedicated Azure key with minimal permissions and rotate it after testing.
- If you are processing sensitive documents, ensure sending them to Azure complies with your data handling and privacy policies.
Like a lobster shell, security has layers: review code before you run it.
Azure Document Intelligence OCR
Extract text and structured data from documents using Azure Document Intelligence REST API.
Quick Start
1. Environment Setup
Set your Azure Document Intelligence credentials:
export AZURE_DOC_INTEL_ENDPOINT="https://your-resource.cognitiveservices.azure.com"
export AZURE_DOC_INTEL_KEY="your-api-key"
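Before running either script, it can help to confirm that both variables are set and that the endpoint points at an Azure host. The following is a minimal sketch, not part of the skill: `load_azure_config` is a hypothetical helper, and the host check assumes the default `.cognitiveservices.azure.com` domain used in the example above (a custom domain would need a different check).

```python
import os
from urllib.parse import urlparse


def load_azure_config(env=None):
    """Return (endpoint, key) from the environment, or raise ValueError."""
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_DOC_INTEL_ENDPOINT", "").strip().rstrip("/")
    key = env.get("AZURE_DOC_INTEL_KEY", "").strip()
    if not endpoint or not key:
        raise ValueError("Set AZURE_DOC_INTEL_ENDPOINT and AZURE_DOC_INTEL_KEY")

    parsed = urlparse(endpoint)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError("Endpoint must be an https:// URL")
    # Assumption: the resource uses the default Azure domain.
    if not parsed.hostname.endswith(".cognitiveservices.azure.com"):
        raise ValueError(f"Unexpected endpoint host: {parsed.hostname}")
    return endpoint, key
```

Failing early here avoids sending the API key and documents to a mistyped or untrusted endpoint.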
2. Single File OCR
# Basic text extraction from PDF
python scripts/ocr_extract.py document.pdf
# Extract with layout (tables, structure)
python scripts/ocr_extract.py document.pdf --model prebuilt-layout --format markdown
# Process invoice
python scripts/ocr_extract.py invoice.pdf --model prebuilt-invoice --format json
# OCR from URL
python scripts/ocr_extract.py --url "https://example.com/document.pdf"
# Save output to file
python scripts/ocr_extract.py document.pdf --output result.txt
# Extract specific pages
python scripts/ocr_extract.py document.pdf --pages 1-3,5
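The `--pages` value combines single pages and ranges, such as `1-3,5`. How the script interprets it is not shown here; the sketch below only illustrates the notation by expanding a spec into explicit page numbers. `parse_pages` is a hypothetical helper, not a function from the skill.

```python
def parse_pages(spec):
    """Expand a page spec such as "1-3,5" into a sorted list of page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"Invalid page range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"Invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


print(parse_pages("1-3,5"))  # [1, 2, 3, 5]
```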
3. Batch Processing
# Process all documents in a folder
python scripts/batch_ocr.py ./documents/
# Custom output directory and format
python scripts/batch_ocr.py ./documents/ --output-dir ./extracted/ --format markdown
# Use layout model with 8 workers
python scripts/batch_ocr.py ./documents/ --model prebuilt-layout --workers 8
# Filter specific extensions
python scripts/batch_ocr.py ./documents/ --ext .pdf,.png
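The batch options above (extension filter, worker count) map onto a common pattern: collect matching files, then process them in a thread pool. The sketch below shows that pattern under stated assumptions; `find_documents`, `run_batch`, and the `process` callback are hypothetical names, and the actual `batch_ocr.py` may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def find_documents(folder, extensions=(".pdf", ".png")):
    """Return files in folder whose suffix matches one of extensions."""
    wanted = {ext.lower() for ext in extensions}
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in wanted
    )


def run_batch(paths, process, workers=4):
    """Apply process(path) concurrently; return (results, errors) dicts."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(process, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one failed document should not stop the batch
                errors[path] = exc
    return results, errors
```

Threads suit this workload because each document spends most of its time waiting on the network. A lower `workers` value reduces the chance of hitting service rate limits.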
Model Selection Guide
| Document Type | Recommended Model | Use Case |
|---|---|---|
| General text | prebuilt-read | Pure text extraction, any document |
| Structured docs | prebuilt-layout | Tables, forms, paragraphs, figures |
| Invoices | prebuilt-invoice | Vendor info, line items, totals |
| Receipts | prebuilt-receipt | Merchant, items, totals, dates |
| IDs/Passports | prebuilt-idDocument | Identity documents |
| Business cards | prebuilt-businessCard | Contact information |
| W-2 forms | prebuilt-tax.us.w2 | US tax documents |
| Insurance cards | prebuilt-healthInsuranceCard.us | Health insurance info |
See references/models.md for detailed model documentation.
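As an illustration of how a model ID from the table ends up in a request, the sketch below maps document types to model IDs and builds an analyze URL. The dictionary keys and `build_analyze_url` are hypothetical. The URL shape follows the `2023-07-31` REST API (`/formrecognizer/documentModels/{modelId}:analyze`); which API version the skill's scripts target is an assumption, so treat the default as a placeholder.

```python
from urllib.parse import quote, urlencode

# Model IDs taken from the table above; the keys are illustrative labels.
MODEL_BY_DOCUMENT_TYPE = {
    "general": "prebuilt-read",
    "structured": "prebuilt-layout",
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
    "business_card": "prebuilt-businessCard",
    "w2": "prebuilt-tax.us.w2",
    "health_insurance_card": "prebuilt-healthInsuranceCard.us",
}


def build_analyze_url(endpoint, document_type="general", pages=None,
                      api_version="2023-07-31"):
    """Build the analyze URL; unknown types fall back to prebuilt-read."""
    model = MODEL_BY_DOCUMENT_TYPE.get(document_type, "prebuilt-read")
    query = {"api-version": api_version}
    if pages:
        query["pages"] = pages  # e.g. "1-3,5"
    return (f"{endpoint.rstrip('/')}/formrecognizer/documentModels/"
            f"{quote(model)}:analyze?{urlencode(query)}")
```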
Supported Input Formats
- PDF: .pdf (including scanned PDFs)
- Images: .png, .jpg, .jpeg, .tiff, .bmp
- URLs: Direct links to documents
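A local check against this list can reject unsupported files before anything is uploaded. The sketch below is one way to do it: `content_type_for` is a hypothetical helper, and pairing each extension with a MIME type is an assumption about how an upload would be labeled, not a description of the skill's scripts.

```python
from pathlib import Path

# Extensions from the list above, paired with standard MIME types.
CONTENT_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".tiff": "image/tiff",
    ".bmp": "image/bmp",
}


def content_type_for(path):
    """Return the MIME type for a supported file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported format: {suffix or path!r}") from None
```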
Output Formats
- text: Plain text concatenation of all extracted content
- markdown: Structured output with headers and tables (best with layout model)
- json: Raw API response with full extraction details
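To make the three formats concrete, the sketch below renders a result dictionary in each of them. It assumes the response layout of the Azure analyze result (`analyzeResult.content`, and `analyzeResult.tables` with `rowCount`, `columnCount`, and `cells` carrying `rowIndex`, `columnIndex`, `content`). `render` and `table_to_markdown` are hypothetical helpers; the skill's own markdown output may differ, for example by adding headers.

```python
import json


def table_to_markdown(table):
    """Render one extracted table as a markdown pipe table."""
    rows = [[""] * table["columnCount"] for _ in range(table["rowCount"])]
    if not rows:
        return ""
    for cell in table["cells"]:
        text = cell.get("content", "").replace("\n", " ").replace("|", "\\|")
        rows[cell["rowIndex"]][cell["columnIndex"]] = text
    lines = ["| " + " | ".join(rows[0]) + " |",
             "|" + "---|" * table["columnCount"]]
    lines += ["| " + " | ".join(row) + " |" for row in rows[1:]]
    return "\n".join(lines)


def render(result, fmt="text"):
    """Render a raw API response as text, markdown, or json."""
    analyze = result.get("analyzeResult", {})
    if fmt == "json":
        return json.dumps(result, indent=2, ensure_ascii=False)
    if fmt == "markdown":
        parts = [analyze.get("content", "")]
        parts += [table_to_markdown(t) for t in analyze.get("tables", [])]
        return "\n\n".join(p for p in parts if p)
    return analyze.get("content", "")
```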
Features
- Handwriting Recognition: Extracts handwritten text alongside printed text
- CJK Support: Full support for Chinese, Japanese, Korean characters
- Table Extraction: Preserves table structure (use layout model)
- Multi-page Processing: Handles documents with multiple pages
- Concurrent Processing: Batch script supports parallel processing
- URL Input: Process documents directly from URLs
Environment Variables
| Variable | Required | Description |
|---|---|---|
| AZURE_DOC_INTEL_ENDPOINT | Yes | Azure Document Intelligence endpoint URL |
| AZURE_DOC_INTEL_KEY | Yes | API subscription key |
Error Handling
- Invalid credentials: Check endpoint URL and API key
- Unsupported format: Ensure file extension matches supported types
- Timeout: Large documents may need longer processing (max 300s)
- Rate limiting: Reduce concurrent workers for batch processing
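The timeout and rate-limiting cases can be handled with a bounded polling loop and a capped backoff. The sketch below assumes the analyze operation reports a `status` of `succeeded` or `failed` when it finishes, and treats 300 seconds as the polling budget. `poll_until_done`, `retry_delay`, and the `fetch_status` callback are hypothetical names, not functions from the skill.

```python
import time


def poll_until_done(fetch_status, timeout=300.0, interval=2.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the operation succeeds, fails, or times out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        state = status.get("status")
        if state == "succeeded":
            return status
        if state == "failed":
            raise RuntimeError(f"Analysis failed: {status.get('error')}")
        if clock() + interval > deadline:
            raise TimeoutError(f"No result after {timeout:.0f}s")
        sleep(interval)


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait after a rate-limited request (for example HTTP 429)."""
    if retry_after is not None:  # prefer the server's Retry-After value
        return min(float(retry_after), cap)
    return min(base * 2 ** attempt, cap)  # otherwise exponential backoff
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.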
Examples
Extract text from scanned PDF
python scripts/ocr_extract.py scanned_contract.pdf --model prebuilt-read
Process invoices with structured output
python scripts/ocr_extract.py invoice.pdf --model prebuilt-invoice --format json --output invoice_data.json
Batch process with layout analysis
python scripts/batch_ocr.py ./reports/ --model prebuilt-layout --format markdown --workers 4
Extract specific pages from large document
python scripts/ocr_extract.py large_doc.pdf --pages 1,3-5,10 --format text
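The same commands can be driven from Python when the skill is part of a larger workflow. The sketch below builds the argument list from the flags shown in the examples above. `build_ocr_command` and `run_ocr` are hypothetical wrappers, and `run_ocr` assumes the script writes its result to standard output when `--output` is not given.

```python
import subprocess


def build_ocr_command(source, model=None, fmt=None, output=None,
                      pages=None, is_url=False):
    """Build the argv list for scripts/ocr_extract.py."""
    cmd = ["python", "scripts/ocr_extract.py"]
    cmd += ["--url", source] if is_url else [source]
    for flag, value in (("--model", model), ("--format", fmt),
                        ("--output", output), ("--pages", pages)):
        if value:
            cmd += [flag, value]
    return cmd


def run_ocr(source, **options):
    """Run the script and return its stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(build_ocr_command(source, **options),
                               capture_output=True, text=True, check=True)
    return completed.stdout
```

Passing arguments as a list avoids shell quoting problems with file names that contain spaces.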
