Install
openclaw skills install @sdk-team/alibabacloud-docmind-parse
Alibaba Cloud DocMind intelligent document parsing tool. Supports PDF, Word, PPT, Excel, images, and more, outputting structured Markdown/JSON/HTML. Offers two invocation modes (V2 API direct access and Alibaba Cloud POP) with automatic routing based on credential availability. Use when the user needs to parse documents, extract content (text/tables/images), convert documents to Markdown, or mentions "docmind", "document parsing", "parse file", etc.
V2 Free Mode: DOCMIND_V2_ENDPOINT environment variable; limited daily free quota.
When DOCMIND_V2_ENDPOINT is configured, use V2 Free Mode.
| Variable | Description | Required |
|---|---|---|
| DOCMIND_V2_ENDPOINT | V2 API service endpoint (domain or IP). Defaults to docmind.aliyuncs.com | Optional |
POP Mode automatically obtains credentials through the Alibaba Cloud default credential chain (environment variables, config files, ECS RAM roles, etc.), so no manual credential management is needed.
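The routing rule can be pictured with a small sketch. The function name `select_mode` is hypothetical, and two behaviours are assumptions rather than documented facts: that "configured" means the variable is set to a non-empty value, and that POP is the fallback when it is not.

```python
import os


def select_mode(requested="auto", env=None):
    """Return "v2" or "pop" for a requested mode of "auto", "v2", or "pop"."""
    env = os.environ if env is None else env
    if requested in ("v2", "pop"):
        return requested  # an explicit choice always wins
    if requested != "auto":
        raise ValueError(f"unknown mode: {requested!r}")
    # Assumption: "configured" means set to a non-empty value.
    if env.get("DOCMIND_V2_ENDPOINT", "").strip():
        return "v2"
    # Assumption: otherwise fall back to POP and its default credential chain.
    return "pop"
```

Passing `env` explicitly keeps the function easy to exercise without touching the real process environment.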
python scripts/docmind_parse.py <file_path_or_url> [options]
| Parameter | Description | Default |
|---|---|---|
| <file_path_or_url> | Local file path or file URL | Required |
| --mode | Invocation mode: auto, v2, pop | auto |
| --enhancement | Enhancement mode: VLM, LLM, DIGITAL, OCR, AUTO | None |
| --output | Output format: markdown, json, html | markdown |
| --pages | Page range to parse, e.g. 1-5 | All |
| --output-file | Output file path | Stdout |
| --head-foot | Parse headers and footers | Off |
| --user-prompt | Custom user prompt | None |
| --option | Document parsing options | None |
| --markdown-table | Table output format: html, markdown | None |
| --markdown-image | Image output format: html, markdown | None |
| --file-ext | File extension (alternative to fileName) | Auto-detected |
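For readers who want to see how the table maps onto a command-line parser, here is a minimal `argparse` sketch. It is an illustration only: `build_parser` and the `source` destination name are hypothetical, and the real `scripts/docmind_parse.py` may define its options differently.

```python
import argparse


def build_parser():
    """Build a parser mirroring the parameter table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="docmind_parse.py")
    parser.add_argument("source", metavar="file_path_or_url",
                        help="Local file path or file URL")
    parser.add_argument("--mode", choices=["auto", "v2", "pop"], default="auto")
    parser.add_argument("--enhancement",
                        choices=["VLM", "LLM", "DIGITAL", "OCR", "AUTO"], default=None)
    parser.add_argument("--output", choices=["markdown", "json", "html"],
                        default="markdown")
    parser.add_argument("--pages", default=None, help="Page range, e.g. 1-5")
    parser.add_argument("--output-file", default=None, help="Defaults to stdout")
    parser.add_argument("--head-foot", action="store_true",
                        help="Parse headers and footers")
    parser.add_argument("--user-prompt", default=None)
    parser.add_argument("--option", default=None)
    parser.add_argument("--markdown-table", choices=["html", "markdown"], default=None)
    parser.add_argument("--markdown-image", choices=["html", "markdown"], default=None)
    parser.add_argument("--file-ext", default=None)
    return parser


args = build_parser().parse_args(
    ["./report.pdf", "--pages", "1-5", "--output-file", "result.md"]
)
```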
# Parse a URL (auto-select mode)
python scripts/docmind_parse.py https://example.com/doc.pdf
# Parse with VLM enhancement
python scripts/docmind_parse.py https://example.com/doc.pdf --enhancement VLM
# Parse the first 5 pages, output to a Markdown file
python scripts/docmind_parse.py ./report.pdf --pages 1-5 --output-file result.md
# Parse a local file via V2 mode (auto base64 encoding)
python scripts/docmind_parse.py ./contract.pdf --mode v2
# Parse with custom table/image output formats
python scripts/docmind_parse.py https://example.com/doc.pdf --markdown-table markdown --markdown-image html
# Parse headers and footers with a custom prompt
python scripts/docmind_parse.py https://example.com/doc.pdf --head-foot --user-prompt "Extract all footnotes"
# Force Alibaba Cloud POP mode
python scripts/docmind_parse.py ./contract.pdf --mode pop
The V2 API supports both URL and local file (base64) upload. The request body is organized into four blocks: document, processing, output, and notification.
POST {DOCMIND_V2_ENDPOINT}/skill/submit
Full request schema:
{
  "document": {
    "fileUrl": "https://example.com/doc.pdf",
    "fileBase64": "<base64-encoded file content, alternative to fileUrl>",
    "fileName": "doc.pdf",
    "fileNameExtension": "pdf"
  },
  "processing": {
    "enhancementMode": "VLM",
    "pageIndex": "1-5",
    "headFoot": false,
    "userPrompt": "Custom prompt",
    "option": "parsing-option"
  },
  "output": {
    "outputFormat": ["markdown"],
    "markdownTable": ["html"],
    "markdownImage": ["html"],
    "docExtraParameters": {"key": "value"},
    "extraParameters": "audio-video-extra-params",
    "ossConfig": {
      "bucket": "my-bucket",
      "endpoint": "oss-cn-hangzhou.aliyuncs.com",
      "accessKeyId": "...",
      "accessKeySecret": "...",
      "securityToken": "..."
    }
  },
  "notification": {
    "enableEventCallback": false
  }
}
document.fileUrl and document.fileBase64 are mutually exclusive. When parsing a local file via V2 mode, the script automatically reads and base64-encodes the file. fileNameExtension is auto-detected from the file extension when not explicitly provided.
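The rules above for the document block can be sketched as follows. `build_document_block` is a hypothetical helper, not the script's actual code; it assumes a source is a URL when it starts with `http://` or `https://`, and that the extension is sent in lowercase.

```python
import base64
import os


def build_document_block(source, file_ext=None):
    """Build the "document" block for a URL or a local file path."""
    is_url = source.startswith(("http://", "https://"))
    # Derive the file name from the last path segment, ignoring any query string.
    path = source.split("?", 1)[0] if is_url else source
    name = os.path.basename(path)
    block = {"fileName": name}
    if is_url:
        block["fileUrl"] = source  # fileUrl and fileBase64 are never both set
    else:
        with open(source, "rb") as fh:
            block["fileBase64"] = base64.b64encode(fh.read()).decode("ascii")
    # Use the explicit extension if given, otherwise detect it from the name.
    ext = file_ext or os.path.splitext(name)[1].lstrip(".").lower()
    if ext:
        block["fileNameExtension"] = ext
    return block
```

Because the two keys are set in separate branches, the mutual-exclusivity rule holds by construction.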
Response:
{
  "success": true,
  "data": { "bizId": "docmind-20260519-xxxx" }
}
POST {DOCMIND_V2_ENDPOINT}/skill/query
{
  "bizId": "docmind-20260519-xxxx",
  "layoutStepSize": 100,
  "layoutNum": 0
}
Response (on success):
{
  "success": true,
  "data": {
    "status": "success",
    "processing": 100.0,
    "layouts": [ ... ]
  }
}
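The submit and query calls above combine into a submit-then-poll loop, sketched below. Several details are assumptions: the `https://` scheme in front of the endpoint, a failure status named `fail`, and the polling interval and timeout. `submit_and_wait` and the injected `post` callable are hypothetical names; `post(url, payload)` is expected to send the JSON body and return the decoded JSON response.

```python
import time


def submit_and_wait(post, endpoint, body, poll_interval=2.0, timeout=300.0,
                    sleep=time.sleep):
    """Submit a parsing task, then poll /skill/query until it finishes."""
    base = f"https://{endpoint}"  # assumption: the endpoint is served over HTTPS
    submitted = post(f"{base}/skill/submit", body)
    if not submitted.get("success"):
        raise RuntimeError(f"submit failed: {submitted}")
    biz_id = submitted["data"]["bizId"]

    deadline = time.monotonic() + timeout
    query = {"bizId": biz_id, "layoutStepSize": 100, "layoutNum": 0}
    while True:
        result = post(f"{base}/skill/query", query)
        status = (result.get("data") or {}).get("status")
        if status == "success":
            return result["data"]
        if status == "fail":  # assumption: name of the failure status
            raise RuntimeError(f"task {biz_id} failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {biz_id} did not finish in {timeout}s")
        sleep(poll_interval)
```

With the `requests` package from the dependency list, `post` could be `lambda url, payload: requests.post(url, json=payload, timeout=30).json()`. Injecting it keeps the loop testable without network access.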
Rate limiting: max 5 tasks per second per IP, global limit 20.
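A caller that submits many tasks can pace itself to stay under the per-IP limit. The class below is a minimal sliding-window sketch, not part of the skill; `SubmitThrottle` is a hypothetical name. It only handles the per-IP limit of 5 tasks per second, since the global limit cannot be enforced from a single client.

```python
import time
from collections import deque


class SubmitThrottle:
    """Allow at most max_per_second calls to wait() in any one-second window."""

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.max_per_second = max_per_second
        self.clock = clock
        self.sleep = sleep
        self._stamps = deque()  # submission times still inside the window

    def wait(self):
        now = self.clock()
        # Forget submissions that are at least one second old.
        while self._stamps and now - self._stamps[0] >= 1.0:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_per_second:
            # Sleep until the oldest submission leaves the window, then drop it.
            self.sleep(max(0.0, 1.0 - (now - self._stamps[0])))
            self._stamps.popleft()
            now = self.clock()
        self._stamps.append(now)
```

Calling `throttle.wait()` immediately before each submit request is enough to apply it.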
POP mode follows a three-step asynchronous workflow, with the client initialized through the default credential chain:
1. SubmitDocParserJob / SubmitDocParserJobAdvance
2. QueryDocParserStatus (poll until success/fail)
3. GetDocParserResult (incremental retrieval via LayoutNum + LayoutStepSize pagination)

POP endpoint: docmind-api.cn-hangzhou.aliyuncs.com, API version: 2022-07-11
import os

from alibabacloud_credentials.client import Client as CredClient
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_docmind_api20220711.client import Client as DocMindClient

# Resolve credentials through the default credential chain.
cred = CredClient()
config = open_api_models.Config(
    credential=cred,
    endpoint="docmind-api.cn-hangzhou.aliyuncs.com",
    # The session ID falls back to "unknown" when SKILL_SESSION_ID is not set.
    user_agent="AlibabaCloud-Agent-Skills/alibabacloud-docmind-parse/" + os.environ.get("SKILL_SESSION_ID", "unknown"),
)
client = DocMindClient(config)
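The incremental retrieval in the third step can be sketched independently of the SDK. The sketch assumes that LayoutNum is a zero-based starting offset and LayoutStepSize is the page size, and that a page shorter than the step size marks the end. `fetch_all_layouts` and `fetch_page` are hypothetical names; `fetch_page` stands for whatever wrapper calls GetDocParserResult and returns the list of layouts for one page.

```python
def fetch_all_layouts(fetch_page, step_size=100):
    """Collect every layout by paging with (layout_num, layout_step_size)."""
    layouts = []
    layout_num = 0  # assumption: zero-based offset of the first layout to fetch
    while True:
        page = fetch_page(layout_num, step_size)
        layouts.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < step_size:
            return layouts
        layout_num += len(page)
```

Keeping the SDK call behind `fetch_page` means the paging logic can be checked against a plain list before it is wired to the real client.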
| Mode | Free Quota | After Exhaustion |
|---|---|---|
| V2 API Direct | Limited daily quota | Prompt to activate Alibaba Cloud service |
| Alibaba Cloud POP | 3,000 pages/month | Automatic pay-as-you-go |
When quota is exhausted, prompt the user to visit https://docmind.console.aliyun.com/doc-overview to activate the service.
| Error Code | Meaning | Resolution |
|---|---|---|
| QuotaExhausted / Throttling | Quota exhausted or rate-limited | Prompt to activate Alibaba Cloud service |
| FileUrlLegal | Invalid file URL | Verify the URL is publicly accessible |
| InvalidFileFormat | Unsupported file format | Show the list of supported formats |
| FileSizeExceeded | File too large | V2 limit 5 MB, POP limit 150 MB |
| OssAccessDeniedError / HTTP 403 | URL points to a private or restricted OSS resource | See pre-validation rules below |
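The error table translates naturally into a lookup, together with a size check that can run before any upload. The names `RESOLUTIONS`, `resolution_for`, and `exceeds_size_limit` are hypothetical, and reading "MB" as 1024 × 1024 bytes is an assumption; the service may count sizes differently.

```python
RESOLUTIONS = {
    "QuotaExhausted": "Prompt to activate Alibaba Cloud service",
    "Throttling": "Prompt to activate Alibaba Cloud service",
    "FileUrlLegal": "Verify the URL is publicly accessible",
    "InvalidFileFormat": "Show the list of supported formats",
    "FileSizeExceeded": "V2 limit 5 MB, POP limit 150 MB",
    "OssAccessDeniedError": "Apply the pre-validation rules for private URLs",
}

SIZE_LIMITS_MB = {"v2": 5, "pop": 150}


def resolution_for(error_code):
    """Return the suggested resolution, or a generic hint for unknown codes."""
    return RESOLUTIONS.get(error_code, "Unrecognized error code; show the raw error")


def exceeds_size_limit(size_bytes, mode):
    """True if a file of size_bytes is over the limit for "v2" or "pop"."""
    # Assumption: 1 MB is counted as 1024 * 1024 bytes.
    return size_bytes > SIZE_LIMITS_MB[mode] * 1024 * 1024
```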
Before invoking the script, the Agent MUST perform the following checks:
Local file path validation: If the input is a local file path, verify that the file exists and is readable before calling the script. If the file does not exist, stop immediately and ask the user to verify the path. The Agent must NEVER create, fabricate, or substitute a file to bypass a missing-file error.
Private URL detection: If the input is a URL and V2 mode returns OssAccessDeniedError or HTTP 403, the URL is likely private or requires authentication. In this case:
--mode v2 handles base64 encoding transparently).
Network unreachable: If the script fails with a connection error (e.g. ConnectionError, ConnectionRefused), check that:
DOCMIND_V2_ENDPOINT (default docmind.aliyuncs.com) is reachable from the current environment.
Markdown output: each layout block's markdownContent field is concatenated; tables are embedded as HTML tables.
JSON output: the raw layouts structured data, including type/subType, text, markdownContent, pageNum, index, pos, and other fields.
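As an illustration of how Markdown output relates to the layouts data, the sketch below joins the markdownContent of each layout block. It assumes the blocks arrive in reading order and are joined without a separator; blocks with no markdownContent are skipped. `layouts_to_markdown` is a hypothetical name.

```python
def layouts_to_markdown(layouts):
    """Concatenate the markdownContent of each layout block, in order."""
    # Assumption: blocks are already in reading order and need no separator.
    return "".join(block.get("markdownContent") or "" for block in layouts)


sample_layouts = [
    {"type": "title", "markdownContent": "# Report\n\n", "pageNum": 0, "index": 0},
    {"type": "table", "markdownContent": "<table><tr><td>1</td></tr></table>\n",
     "pageNum": 0, "index": 1},
    {"type": "figure", "pageNum": 0, "index": 2},  # no markdownContent
]
markdown = layouts_to_markdown(sample_layouts)
```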
All outbound HTTP requests (V2 API and Alibaba Cloud POP SDK) set the following User-Agent header:
AlibabaCloud-Agent-Skills/alibabacloud-docmind-parse/{session-id}
The {session-id} segment is read from the SKILL_SESSION_ID environment variable at runtime. If the variable is not set, the value falls back to unknown.
| Component | Value |
|---|---|
| UA template | AlibabaCloud-Agent-Skills/alibabacloud-docmind-parse/{session-id} |
| session-id source | SKILL_SESSION_ID env var |
| Fallback | unknown |
| Env var example | export SKILL_SESSION_ID=agent-abc123-session-xyz |
All skill invocations sharing the same SKILL_SESSION_ID value are correlated as a single session in server-side logs.
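The User-Agent rule amounts to one string concatenation, shown here as a small helper. `build_user_agent` is a hypothetical name; the template and the `unknown` fallback come from the table above.

```python
import os

UA_PREFIX = "AlibabaCloud-Agent-Skills/alibabacloud-docmind-parse/"


def build_user_agent(env=None):
    """Return the User-Agent, using "unknown" when SKILL_SESSION_ID is unset."""
    env = os.environ if env is None else env
    return UA_PREFIX + env.get("SKILL_SESSION_ID", "unknown")
```

For V2 requests the result would go into the `User-Agent` HTTP header; for POP it is the `user_agent` value passed to the client configuration.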
# POP mode
pip install "alibabacloud-docmind-api20220711>=1.0.0" \
"alibabacloud-credentials>=1.0.0" \
"alibabacloud-tea-openapi>=0.3.8" \
"alibabacloud-tea-util>=0.3.0" \
"alibabacloud-gateway-pop>=0.1.0"
# V2 mode
pip install "requests>=2.20.0"