Install
openclaw skills install universal-document-ingestion-router
Document parsing and knowledge-base import router.
Use this skill whenever a task involves document parsing, document ingestion, knowledge-base import preparation, or routing files to suitable parsers.
Short mental name: doc-router.
Chinese trigger phrases:
This skill only does:
It does not implement vector indexing, database sync, retrieval orchestration, corpus governance, or domain-specific decision logic.
Agents should consider this skill automatically when building or modifying systems that need to ingest files into a knowledge base, including:
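If the user says anything like "把这些文件集成到知识库" (integrate these files into the knowledge base), "解析这些文件" (parse these files), "做文档入库" (ingest documents), "研报内容检索" (search research-report content), or "系统需要读取上传的文档" (the system needs to read uploaded documents), use this skill as the front-end classifier/router before downstream indexing.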
Run from this skill directory or use the script path directly:
python scripts/document_classifier_router.py capabilities
python scripts/document_classifier_router.py classify --input path/to/file.pdf
python scripts/document_classifier_router.py parse --input path/to/file.pdf --output out/parsed
python scripts/document_classifier_router.py batch --input-dir path/to/files --output out/batch --copy-sources
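The commands above can also be driven from another process. The following is a minimal sketch, not part of the skill itself: `build_command` and `run_router` are hypothetical helper names, and the sketch assumes the script prints a single JSON document to stdout, which should be checked against the real output before relying on it.

```python
import json
import subprocess
import sys
from pathlib import Path

# Script path as given in the commands above; adjust when not running
# from the skill directory.
SCRIPT = Path("scripts/document_classifier_router.py")


def build_command(subcommand, **options):
    """Build the argv list for the router CLI (hypothetical helper).

    Keyword names map to flags: input_dir -> --input-dir. A value of True
    emits a bare flag; None or False omits the flag entirely.
    """
    argv = [sys.executable, str(SCRIPT), subcommand]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is not None and value is not False:
            argv.extend([flag, str(value)])
    return argv


def run_router(subcommand, **options):
    """Run the CLI and try to parse stdout as JSON.

    Assumption: stdout holds one JSON document. If it does not parse,
    the payload is returned as None so the caller can inspect stderr.
    """
    result = subprocess.run(
        build_command(subcommand, **options),
        capture_output=True,
        text=True,
        check=False,
    )
    try:
        payload = json.loads(result.stdout)
    except json.JSONDecodeError:
        payload = None
    return result.returncode, payload, result.stderr
```

For example, `run_router("classify", input="path/to/file.pdf")` mirrors the classify command, and `run_router("batch", input_dir="path/to/files", output="out/batch", copy_sources=True)` mirrors the batch command.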
Outputs:
- document.json: canonical parsed manifest, always emitted for parse attempts.
- document.md: readable normalized content when extraction succeeds.
- chunks.jsonl: retrieval-ready chunks when chunking is enabled.
- tables/: only when reliable tables are extracted.
- batch_summary.json: emitted by batch mode.

Parser routing:
- PDF: markitdown, fallback pymupdf, fallback pypdf.
- OCR: PaddleOCR, else dependency recommendation.
- DOCX: markitdown, fallback python-docx.
- PPTX: markitdown, fallback python-pptx.
- XLSX/CSV: openpyxl or built-in CSV extraction.
- .doc/.ppt/.xls: recommend LibreOffice when unavailable.

Use --copy-sources to parse copied samples. Failures are reported as blocked_or_failed or warnings rather than pretending success.

This skill is intentionally a plain CLI script with JSON output so OpenClaw, Hermes, Codex, Claude Code, or any other agent can call it through a shell/process runner without OpenClaw-specific APIs.
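A downstream indexer could read the parse outputs back with something like the sketch below. It is illustrative only: `load_parse_output` is a hypothetical helper, the sketch assumes the files sit directly inside the directory passed to `--output`, and it makes no assumption about the fields inside document.json or each chunk.

```python
import json
from pathlib import Path


def load_parse_output(out_dir):
    """Load the files a parse run may have written (hypothetical helper).

    document.json is required, since it is emitted for every parse attempt.
    document.md and chunks.jsonl are optional and may be absent.
    """
    out = Path(out_dir)

    # Canonical manifest; raises FileNotFoundError if the run never happened.
    manifest = json.loads((out / "document.json").read_text(encoding="utf-8"))

    # Normalized content, present only when extraction succeeded.
    md_path = out / "document.md"
    markdown = md_path.read_text(encoding="utf-8") if md_path.exists() else None

    # One JSON object per non-empty line, present only when chunking is enabled.
    chunks = []
    chunks_path = out / "chunks.jsonl"
    if chunks_path.exists():
        with chunks_path.open(encoding="utf-8") as fh:
            chunks = [json.loads(line) for line in fh if line.strip()]

    return {"manifest": manifest, "markdown": markdown, "chunks": chunks}
```

Treating a missing document.md as `None` rather than an error keeps the caller in line with the skill's own rule of surfacing failures instead of pretending success: the manifest is still available to inspect for warnings.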
For agents that do not load skills by name, use the short alias doc-router and point them to:
skills/universal-document-ingestion-router/scripts/document_classifier_router.py
Read references/development-report.md for implementation/test results and references/architecture.md for the boundary and adapter model.