Install
openclaw skills install convert-document-to-markdown

Convert supported local files into Markdown by running this repository's Dockerized file-only CLI. This skill must run through Docker with a prebuilt Aliyun CR image selected by host architecture and a fixed version, not through a local Python runtime.

Use this skill when a user wants a supported local file converted into Markdown for later processing.

Supported input extensions: .pdf, .docx, .pptx, .xlsx, .jpg, .jpeg, .png, .gif, .bmp, .txt, .json, .xml, .md

Image processing modes:
- ocr / vl / none for .docx, .pptx, .xlsx, and image files
- ocr / vl / vl-page / none for .pdf

Prebuilt images are pinned to version 0.0.1:
- convert-document-to-markdown-arm64:0.0.1 on ARM64 hosts
- convert-document-to-markdown-x64:0.0.1 on x64 hosts

- Returns structured JSON with markdown, logs, and meta.
- Loads the repository .env file, then forwards it into the container automatically.
- Supports only the file command. URL, health, and version commands are intentionally removed to keep startup lean.
- Do not use latest, do not build a fallback image at runtime, and do not treat .doc, .ppt, .xls, audio files, or unlisted image formats as supported inputs.

- The default registry is crpi-4auaoyyj6r36p6lb.cn-hangzhou.personal.cr.aliyuncs.com/huozige_lab.
- The image is convert-document-to-markdown-arm64:0.0.1 or convert-document-to-markdown-x64:0.0.1.
- To use a different registry or image, set IMAGE_REGISTRY or IMAGE_NAME.
- Run: scripts/run_docker_cli.sh file <absolute-or-relative-path> --format json
- If success is false, surface error.message and relevant logs.
- If success is true, use markdown as the canonical output for downstream work.

This skill is designed so the user does not need to re-enter Vision API settings on each run.
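
The image naming rules above (architecture-specific name, fixed version 0.0.1, default registry, optional override) can be sketched as a small shell helper. This is only an illustration, not the code in scripts/run_docker_cli.sh: the function names resolve_image and image_suffix_for_arch are hypothetical, and the accepted architecture strings, the use of IMAGE_ARCH as an architecture override, and the precedence of IMAGE_NAME over IMAGE_REGISTRY are assumptions.

```shell
# Hypothetical sketch; the real wrapper may resolve the image differently.
IMAGE_VERSION="0.0.1"
DEFAULT_REGISTRY="crpi-4auaoyyj6r36p6lb.cn-hangzhou.personal.cr.aliyuncs.com/huozige_lab"

# Map an architecture string (as printed by `uname -m`, or a short alias)
# to the image suffix. The accepted spellings are assumptions.
image_suffix_for_arch() {
  case "$1" in
    arm64|aarch64) echo "arm64" ;;
    x64|x86_64|amd64) echo "x64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Print the image reference to use.
resolve_image() {
  local arch suffix registry
  # Assumption: IMAGE_NAME, when set, is the full image reference and wins.
  if [ -n "${IMAGE_NAME:-}" ]; then
    echo "$IMAGE_NAME"
    return 0
  fi
  # Assumption: IMAGE_ARCH overrides the detected host architecture.
  arch="${IMAGE_ARCH:-$(uname -m)}"
  suffix="$(image_suffix_for_arch "$arch")" || return 1
  registry="${IMAGE_REGISTRY:-$DEFAULT_REGISTRY}"
  echo "${registry}/convert-document-to-markdown-${suffix}:${IMAGE_VERSION}"
}
```

Under these assumptions, the version tag stays fixed at 0.0.1 and an unknown architecture fails instead of falling back to latest or to a locally built image, which matches the constraints listed above.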
Preferred OpenClaw configuration in ~/.openclaw/openclaw.json:
{
  "skills": {
    "entries": {
      "convert_document_to_markdown": {
        "enabled": true,
        "apiKey": "sk-xxx",
        "env": {
          "VL_BASE_URL": "https://api.openai.com/v1",
          "VL_MODEL": "gpt-4.1-mini"
        }
      }
    }
  }
}
This works because:
- skillKey is convert_document_to_markdown
- primaryEnv is VL_API_KEY, so apiKey maps to VL_API_KEY
- env can hold VL_BASE_URL and VL_MODEL

Repository-local runtime configuration:
- Copy .env.example to .env
- Set VL_BASE_URL, VL_API_KEY, and VL_MODEL
- The default registry is crpi-4auaoyyj6r36p6lb.cn-hangzhou.personal.cr.aliyuncs.com/huozige_lab
- To use a different registry or image, set IMAGE_REGISTRY or IMAGE_NAME
- Run scripts/run_docker_cli.sh, which loads .env, forwards any host VL_* variables into docker run, and pulls the correct fixed-version image if missing

Local file:
scripts/run_docker_cli.sh file ./notes.pdf --image-process-model ocr --format json
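
The wrapper is described as forwarding host VL_* variables into docker run. A minimal sketch of how such forwarding could work is shown here; it is not the actual contents of scripts/run_docker_cli.sh, and the function name vl_env_flags is hypothetical. It relies on the documented docker run behavior that `-e NAME` without a value passes the host's value of NAME into the container.

```shell
# Hypothetical sketch: print one "-e NAME" flag per line for every exported
# variable whose name starts with VL_. Values are never printed, so the API
# key does not end up on the docker command line.
# Limitation: a multi-line value that contains a line starting with "VL_...="
# would be misread as a separate variable.
vl_env_flags() {
  env | sed -n 's/^\(VL_[A-Za-z0-9_]*\)=.*/-e \1/p' | sort
}
```

A caller could splice the result into a command with unquoted expansion, for example `docker run --rm $(vl_env_flags) <image> ...`, where the image and remaining arguments are whatever the wrapper would normally pass.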

--image-process-model ocr
  Default mode. Use Tesseract OCR for images.
--image-process-model vl
  Use a Vision API. Only choose this when the environment provides VL_API_KEY and related variables.
--image-process-model none
  Skip image recognition for speed.
--image-process-model vl-page
  PDF only. Do not use this mode for Office documents or image files.
--format json|markdown
  Use json unless the user explicitly wants raw Markdown on stdout.
--output <path>
  Save the Markdown to a file. Prefer this only when you invoke docker run directly with a writable host mount.
--log-file <path>
  Save detailed logs to a file. Prefer this only when you invoke docker run directly with a writable host mount.

- Do not use uv, python, or any other local runtime path for production use.
- Set IMAGE_ARCH only when you have a concrete reason.
- Prefer IMAGE_REGISTRY plus the fixed version 0.0.1; only use IMAGE_NAME when you need to pass the full image reference explicitly.
- Choose a Vision API mode only when VL_BASE_URL, VL_API_KEY, and VL_MODEL are already configured via OpenClaw skill config or .env.
- For downstream work, use the markdown field.
- If the input is .doc, .ppt, .xls, .wav, .mp3, .m4a, or .mp4, say the current skill does not reliably support it.
- Report a conversion as successful only when the result contains success: true.
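
The input and mode rules above can be checked before the container is ever started. The following is a rough sketch of such a pre-flight check, with hypothetical function names (file_extension, is_supported_input, is_valid_mode). It assumes that extensions are matched case-insensitively and that ocr, vl, and none are accepted for every supported input; the source only states the mode sets for PDF, Office, and image files.

```shell
# Hypothetical pre-flight checks; not part of the skill's CLI.

# Print the lowercased extension of a path, or nothing if it has none.
file_extension() {
  local base
  base="${1##*/}"                      # strip any directory part
  case "$base" in
    *.*) printf '%s' "${base##*.}" | tr '[:upper:]' '[:lower:]' ;;
    *)   printf '' ;;
  esac
}

# Succeed only for the extensions listed as supported inputs.
is_supported_input() {
  case "$(file_extension "$1")" in
    pdf|docx|pptx|xlsx|jpg|jpeg|png|gif|bmp|txt|json|xml|md) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed if mode $2 is allowed for file $1. vl-page is PDF only.
is_valid_mode() {
  local ext
  ext="$(file_extension "$1")"
  case "$2" in
    ocr|vl|none) return 0 ;;
    vl-page) [ "$ext" = "pdf" ] ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_input legacy.doc` fails, so a caller can tell the user the skill does not reliably support that format instead of attempting a conversion, and `is_valid_mode slides.pptx vl-page` fails because vl-page applies to PDF only.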