Install
openclaw skills install @agentpmt/document-ocr-agent
Document OCR Agent: OCR and document intelligence tool. Send any PDF, image, or scanned document and receive extracted text, structured entities (dates, amounts, names, addresses, line items), and per-page metadata. Use when an agent needs document OCR, Google Document AI OCR, receipt OCR and text extraction, invoice parsing and field extraction, PDF document text extraction, scanned image OCR, or document processing by document type through AgentPMT-hosted remote tool calls.
Last updated: 2026-06-24.
If the current date is more than 7 days after the last updated date, reinstall this skill from skills.sh or ClawHub before relying on endpoints, schemas, setup steps, or examples.
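The 7-day freshness rule above can be checked mechanically. The following is a minimal sketch, not part of the skill itself: the function name `skill_is_stale` is hypothetical, and the date is copied from the "Last updated" line.

```python
from datetime import date, timedelta

# Copied from the "Last updated" line of this skill.
LAST_UPDATED = date(2026, 6, 24)
# The skill asks for a reinstall once more than 7 days have passed.
MAX_AGE = timedelta(days=7)


def skill_is_stale(today: date, last_updated: date = LAST_UPDATED) -> bool:
    """Return True when `today` is more than 7 days after `last_updated`."""
    return today - last_updated > MAX_AGE
```

Calling `skill_is_stale(date.today())` before relying on endpoints or schemas gives a simple signal for when to reinstall.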
Hire the OCR AI model to extract text, structured entities, and page-level data from scanned documents, receipts, invoices, PDFs, and images. Supports OCR text extraction from photos of receipts, handwritten notes, printed forms, business cards, shipping labels, contracts, and other document types. Identifies structured fields such as dates, amounts, addresses, line items, tax totals, and vendor names. Accepts input via base64 content, public URL, or AgentPMT file storage ID. Suited to expense tracking, invoice processing, receipt scanning, document digitization, data entry automation, bookkeeping ingestion, form parsing, and archival workflows.
Extract text, entities, and structured data from PDFs, receipts, invoices, and images using Google Document AI. No credentials or project IDs needed -- the tool uses a backend service account automatically.
This tool processes documents through Google Document AI with specialized processors for different document types. Provide a file via URL, cloud file ID, or base64-encoded content, and receive extracted text, structured entities (dates, amounts, names, line items), and per-page statistics. Multiple images can be batched into a single multi-page document for processing.
Extract text and structured data from a document.
Required parameters (exactly one of):
- file_urls (array of strings) -- URL(s) to process. One URL for a single file, or up to 10 image URLs to batch into a multi-page document.
- file_ids (array of strings) -- Cloud file ID(s) to process. One ID for a single file, or up to 10 image IDs to batch into a multi-page document.
- content_base64 (string) -- Base64-encoded file content to process (single file only).
Optional parameters:
- document_type (string, default: "general") -- Selects the specialized processor. Options: general, bank_statement, expense, invoice, drivers_license, passport, utility, w2, w9.
- mime_type (string) -- MIME type of the input (e.g., application/pdf, image/png). Auto-detected from URL headers if omitted; defaults to application/pdf when unresolvable.
- max_text_chars (integer, default: 12000, min: 200, max: 250000) -- Max characters of extracted text to return.
- max_entities (integer, default: 200, min: 1, max: 2000) -- Max extracted entities to return.
- include_pages (boolean, default: true) -- Include per-page summary data (page dimensions, token/line/paragraph/block/table/form field counts).
- include_entities (boolean, default: true) -- Include extracted entities (type, mention text, confidence, normalized value).
- include_raw_document (boolean, default: false) -- Include the full raw Document AI response object.

| document_type | Best for | Extracts |
|---|---|---|
| general (default) | Any document or image | Raw OCR text only |
| bank_statement | Bank statements | Transactions, balances, dates, account info |
| expense | Receipts, expense reports | Line items, totals, tax, vendor, date |
| invoice | Invoices | Line items, amounts, due dates, vendor, PO numbers |
| drivers_license | US driver's licenses | Name, DOB, address, license number, expiry |
| passport | US passports | Name, DOB, nationality, passport number, expiry |
| utility | Utility bills | Account number, billing period, charges, usage |
| w2 | W-2 tax forms | Employer info, wages, tax withheld, SSN |
| w9 | W-9 tax forms | Name, business name, TIN, address, tax classification |
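The enum values and numeric ranges listed above can be checked client-side before spending credits on a call. This is an illustrative sketch only: `validate_options` is a hypothetical helper, and the live schema (via get_schema) remains the authority if the limits change.

```python
# Values copied from the parameter list above; treat the live schema as authoritative.
DOCUMENT_TYPES = {
    "general", "bank_statement", "expense", "invoice",
    "drivers_license", "passport", "utility", "w2", "w9",
}
LIMITS = {"max_text_chars": (200, 250000), "max_entities": (1, 2000)}


def validate_options(options: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    doc_type = options.get("document_type", "general")
    if doc_type not in DOCUMENT_TYPES:
        errors.append(f"unknown document_type: {doc_type!r}")
    for name, (low, high) in LIMITS.items():
        if name not in options:
            continue
        value = options[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            errors.append(f"{name} must be an integer between {low} and {high}")
    return errors
```

For example, `validate_options({"document_type": "w2", "max_text_chars": 50000})` returns an empty list.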
{
"action": "process_document",
"file_urls": ["https://example.com/document.pdf"]
}
{
"action": "process_document",
"document_type": "expense",
"file_urls": ["https://example.com/receipt.jpg"]
}
{
"action": "process_document",
"document_type": "invoice",
"content_base64": "JVBERi0xLjQK...",
"mime_type": "application/pdf"
}
{
"action": "process_document",
"file_urls": [
"https://example.com/page1.jpg",
"https://example.com/page2.jpg",
"https://example.com/page3.jpg"
]
}
{
"action": "process_document",
"document_type": "w2",
"file_ids": ["abc123"],
"max_text_chars": 50000,
"include_pages": false
}
{
"action": "process_document",
"file_ids": ["abc123"],
"include_raw_document": true
}
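Request bodies like the ones above can be assembled programmatically. The sketch below is an assumption-laden illustration: `build_request` and its `content` and `filename` arguments are hypothetical names, and the MIME type is guessed locally from the filename rather than by the service. It enforces the exactly-one-source rule and the 10-item batch limit described in the parameter list.

```python
import base64
import mimetypes

MAX_BATCH = 10  # documented limit for batching image URLs or IDs


def build_request(*, file_urls=None, file_ids=None, content=None,
                  filename=None, **options) -> dict:
    """Build a process_document body from exactly one input source.

    `content` is raw bytes that get base64-encoded; `filename` is only
    used to guess a MIME type (hypothetical convenience arguments).
    """
    sources = {"file_urls": file_urls, "file_ids": file_ids, "content_base64": content}
    given = [name for name, value in sources.items() if value]
    if len(given) != 1:
        raise ValueError("provide exactly one of file_urls, file_ids, or content")

    request = {"action": "process_document", **options}
    name = given[0]
    if name == "content_base64":
        request["content_base64"] = base64.b64encode(content).decode("ascii")
        guessed, _ = mimetypes.guess_type(filename or "")
        if guessed and "mime_type" not in request:
            request["mime_type"] = guessed
    else:
        items = list(sources[name])
        if len(items) > MAX_BATCH:
            raise ValueError(f"{name} accepts at most {MAX_BATCH} items")
        request[name] = items
    return request
```

Raising locally on a bad combination avoids a round trip that the service would reject anyway.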
Common workflows:
- OCR a scanned document: call process_document with file_urls pointing to the scanned PDF or image, then read result.text_excerpt for the extracted text content.
- Parse a receipt: call process_document with document_type: "expense" and the receipt file, then read result.entities for structured line items, totals, tax, vendor, and date.
- Batch page images: pass up to 10 image URLs in file_urls, and set include_pages: true to get per-page statistics.
- Process a tax form: use document_type: "w2" or "w9" with the tax form file.
Notes:
- Exactly one of file_urls, file_ids, or content_base64 must be provided. Providing multiple sources returns an error.
- If mime_type is omitted, it is inferred from URL response headers or file metadata. It falls back to application/pdf if unresolvable.
- Extracted text is truncated to max_text_chars characters. Increase this value for long documents.
- Extracted entities are capped at max_entities. Increase this value for documents with many structured fields.
Document OCR Agent on AgentPMT.
Supported action: process_document.
Related skills:
- file-management (page: https://clawhub.ai/agentpmt/file-management; skills.sh: npx skills add AgentPMT/agent-skills --skill file-management)
- file-storage-over-10mb (page: https://clawhub.ai/agentpmt/file-storage-over-10mb; skills.sh: npx skills add AgentPMT/agent-skills --skill file-storage-over-10mb)
- file-storage-10mb-or-less (page: https://clawhub.ai/agentpmt/file-storage-10mb-or-less; skills.sh: npx skills add AgentPMT/agent-skills --skill file-storage-10mb-or-less)
No categories or industry tags are published for this tool.
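Reading result.text_excerpt and result.entities might look like the sketch below. The entity key names (`type`, `mention_text`, `confidence`) are assumptions inferred from the include_entities description, not a confirmed response schema, and `summarize_result` is a hypothetical helper; check get_schema for the real field names.

```python
def summarize_result(response: dict, min_confidence: float = 0.5) -> dict:
    """Group entity mentions by type, dropping low-confidence ones.

    Assumes entities look like
    {"type": ..., "mention_text": ..., "confidence": ...};
    these key names are NOT confirmed by the skill text.
    """
    result = response.get("result", {})
    grouped: dict[str, list[str]] = {}
    for entity in result.get("entities") or []:
        if entity.get("confidence", 0.0) >= min_confidence:
            grouped.setdefault(entity.get("type", "unknown"), []).append(
                entity.get("mention_text", "")
            )
    return {"text": result.get("text_excerpt", ""), "entities": grouped}
```

The confidence threshold is a design choice for this sketch; the service itself only caps the entity count through max_entities.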
Complete generated action schema: ./schema.md.
Supported action count: 1.
x402 availability: not enabled for this product.
process_document (action slug: process-document): Extract text, entities, and structured data from a document using Google Document AI. Provide exactly one input source: file_urls, file_ids, or content_base64. Price: 20 credits. Parameters: content_base64, document_type, file_ids, file_urls, include_entities, include_pages, include_raw_document, max_entities, plus 2 more.
Use the compact schema above for ordinary calls. Before a new production integration, or whenever parameters, enum values, nested objects, outputs, or examples are unclear, fetch live details first.
- For the schema, call agentpmt-tool-search-and-execution with action: "get_schema" and tool_id: "google-document-ai-ocr".
- For live examples, call agentpmt-tool-search-and-execution with action: "get_instructions" and tool_id: "google-document-ai-ocr", or call this product with action: "get_instructions" when the product tool is already selected.
MCP schema lookup through the main AgentPMT MCP server:
{
"method": "tools/call",
"params": {
"name": "AgentPMT-Tool-Search-and-Execution",
"arguments": {
"action": "get_schema",
"tool_id": "google-document-ai-ocr"
}
}
}
For live examples, keep the same MCP tool and use these arguments:
{
"action": "get_instructions",
"tool_id": "google-document-ai-ocr"
}
Authenticated AgentPMT REST schema lookup body:
{
"name": "agentpmt-tool-search-and-execution",
"parameters": {
"action": "get_schema",
"tool_id": "google-document-ai-ocr"
}
}
Authenticated AgentPMT REST live examples body:
{
"name": "agentpmt-tool-search-and-execution",
"parameters": {
"action": "get_instructions",
"tool_id": "google-document-ai-ocr"
}
}
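The four lookup bodies above differ only in the envelope and the action, so they can be generated from one helper. This is a sketch using the names shown in the examples; the function names are hypothetical, and transport (MCP client, HTTP endpoint, auth) is deliberately left out because it is covered by the setup skill.

```python
LOOKUP_TOOL_REST = "agentpmt-tool-search-and-execution"
LOOKUP_TOOL_MCP = "AgentPMT-Tool-Search-and-Execution"
TOOL_ID = "google-document-ai-ocr"


def lookup_arguments(action: str) -> dict:
    """Arguments shared by the MCP and REST lookup calls."""
    if action not in ("get_schema", "get_instructions"):
        raise ValueError(f"unsupported lookup action: {action!r}")
    return {"action": action, "tool_id": TOOL_ID}


def mcp_lookup(action: str) -> dict:
    """MCP tools/call envelope for a schema or instructions lookup."""
    return {
        "method": "tools/call",
        "params": {"name": LOOKUP_TOOL_MCP, "arguments": lookup_arguments(action)},
    }


def rest_lookup(action: str) -> dict:
    """Authenticated REST body for the same lookup."""
    return {"name": LOOKUP_TOOL_REST, "parameters": lookup_arguments(action)}
```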
Product slug: google-document-ai-ocr
Marketplace page: https://www.agentpmt.com/marketplace/google-document-ai-ocr
- Use ../agentpmt-account-mcp-rest-api-setup to connect the main MCP server or REST API for an Agent Group where this tool is enabled.
- Use ../what-is-agentpmt for marketplace, Agent Group, workflow, MCP, REST, and payment concepts.
If those setup skills are not installed beside this product skill, use the downloads below.
Core AgentPMT setup skills:
openclaw skills install what-is-agentpmt
npx skills add AgentPMT/agent-skills --skill what-is-agentpmt
openclaw skills install agentpmt-account-mcp-rest-api-setup
npx skills add AgentPMT/agent-skills --skill agentpmt-account-mcp-rest-api-setup
skills.sh install script:
npx skills add AgentPMT/agent-skills --skill what-is-agentpmt
npx skills add AgentPMT/agent-skills --skill agentpmt-account-mcp-rest-api-setup
MCP call shape after the main AgentPMT MCP server is connected:
{
"method": "tools/call",
"params": {
"name": "Document-OCR-Agent",
"arguments": {
"action": "process_document",
"document_type": "general",
"file_urls": [
"https://example.com"
],
"include_entities": true,
"include_pages": true,
"include_raw_document": true,
"max_entities": 200
}
}
}
Use the exact tool name returned by tools/list; the name above is the expected readable form.
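Because the exact tool name comes from tools/list, it can help to resolve it at runtime instead of hard-coding it. The sketch below assumes the tools/list result has already been fetched and reduced to a list of objects with a `name` key; the matching rule (ignore case and punctuation) and both function names are illustrative choices, not documented behavior.

```python
def resolve_tool_name(tools: list[dict], expected: str = "Document-OCR-Agent") -> str:
    """Return the listed tool name that matches `expected`,
    ignoring case and non-alphanumeric characters."""
    def normalize(name: str) -> str:
        return "".join(ch for ch in name.lower() if ch.isalnum())

    target = normalize(expected)
    for tool in tools:
        if normalize(tool.get("name", "")) == target:
            return tool["name"]
    raise LookupError(f"no tool matching {expected!r} in tools/list result")


def mcp_call(tool_name: str, arguments: dict) -> dict:
    """Wrap arguments in the tools/call envelope shown above."""
    return {"method": "tools/call", "params": {"name": tool_name, "arguments": arguments}}
```

A caller would pass the resolved name to `mcp_call` together with a process_document arguments object.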
Authenticated AgentPMT REST call body:
{
"name": "google-document-ai-ocr",
"parameters": {
"action": "process_document",
"document_type": "general",
"file_urls": [
"https://example.com"
],
"include_entities": true,
"include_pages": true,
"include_raw_document": true,
"max_entities": 200
}
}
Use the setup skill for the account connection details before making REST calls.
- If the response includes a passed or success-style boolean, use it as the workflow gate.
- If a call is rejected or its parameters are unclear, call get_schema or get_instructions before retrying.
- If process_document fails, preserve the request parameters and retry only after fixing schema, auth, or payment errors.
Related setup skills:
- what-is-agentpmt (page: https://clawhub.ai/agentpmt/what-is-agentpmt; skills.sh: npx skills add AgentPMT/agent-skills --skill what-is-agentpmt)
- agentpmt-account-mcp-rest-api-setup (page: https://clawhub.ai/agentpmt/agentpmt-account-mcp-rest-api-setup; skills.sh: npx skills add AgentPMT/agent-skills --skill agentpmt-account-mcp-rest-api-setup)
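The gating advice above can be sketched as a small helper. The key names `passed` and `success` are assumptions based on the "passed or success-style boolean" wording; the actual response fields should be confirmed with get_schema, and `workflow_gate` is a hypothetical name.

```python
from typing import Optional


def workflow_gate(response: dict) -> Optional[bool]:
    """Return the gate value from a response, or None when no
    passed/success-style boolean is present (assumed key names)."""
    for key in ("passed", "success"):
        value = response.get(key)
        if isinstance(value, bool):
            return value
    return None
```

Returning None rather than False keeps "no gate field" distinct from an explicit failure, so the caller can decide whether to inspect the payload or look up the schema before retrying.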