Junyi Doc Reader
Review
Audited by ClawScan on May 10, 2026.
Overview
This is mostly a coherent document-archiving skill, but its optional LLM mode has a privacy-contract mismatch that can send document chunks to a default external endpoint, and Feishu mode uses local app credentials.
Install only if you are comfortable with its document outputs being stored in your chosen vault and with Feishu mode using local Feishu app credentials. Keep DOC_READER_ALLOW_EXTERNAL=false for offline use; if you enable LLM insights, explicitly set DOC_READER_API_URL and DOC_READER_MODEL so you know exactly where document chunks are sent.
Findings (5)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
A user may believe they must explicitly choose the LLM endpoint, while the skill can fall back to a default external provider when enrichment is enabled.
The privacy claim quoted below is contradicted by the included code: `scripts/enricher.py` defines a default OpenAI URL, and `scripts/pipeline.py` can enable insights with only an API key and `DOC_READER_ALLOW_EXTERNAL=true`. That mismatch can mislead users about where document content may go.
Enabling LLM enrichment requires the user to actively set all four environment variables ... `DOC_READER_API_URL` (user-specified endpoint, no hard-coded production URL)
Require DOC_READER_API_URL and DOC_READER_MODEL explicitly before enrichment, or change the privacy text to clearly state the default endpoint and model.
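The first option could be implemented as a fail-closed resolver, sketched below. This is not the skill's actual code: the function `resolve_enrichment_config`, the `EnrichmentConfig` type, and the variable name `DOC_READER_API_KEY` are hypothetical (the review names only `DOC_READER_ALLOW_EXTERNAL`, `DOC_READER_API_URL`, and `DOC_READER_MODEL`).

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: DOC_READER_API_KEY is a hypothetical name; the review does not
# state what the API key variable is called.
REQUIRED_VARS = ("DOC_READER_API_URL", "DOC_READER_MODEL", "DOC_READER_API_KEY")


@dataclass(frozen=True)
class EnrichmentConfig:
    api_url: str
    model: str
    api_key: str


def resolve_enrichment_config(env: Mapping[str, str]) -> Optional[EnrichmentConfig]:
    """Return a config only when the user opted in AND named every setting.

    Returns None when external calls are not allowed (offline mode).
    Raises ValueError rather than falling back to any default endpoint.
    """
    allow = env.get("DOC_READER_ALLOW_EXTERNAL", "false").strip().lower()
    if allow != "true":
        return None  # offline: no document chunk leaves the machine

    # Collect every required variable that is unset or blank.
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise ValueError("LLM enrichment refused; unset: " + ", ".join(missing))

    return EnrichmentConfig(
        api_url=env["DOC_READER_API_URL"].strip(),
        model=env["DOC_READER_MODEL"].strip(),
        api_key=env["DOC_READER_API_KEY"].strip(),
    )
```

Called as `resolve_enrichment_config(os.environ)`, a design like this would make the documented "user must set everything" contract true by construction, because no code path supplies an endpoint or model the user did not type.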
Private document chunks can leave the local machine in enrichment mode, potentially to a provider the user did not explicitly configure.
When LLM enrichment is enabled, chunk text is sent to an LLM API endpoint, and the endpoint defaults to OpenAI if DOC_READER_API_URL is not set.
DEFAULT_API_URL = "https://api.openai.com/v1/chat/completions" ... "content": USER_PROMPT_TEMPLATE.format(text=text) ... urllib.request.urlopen(req, timeout=60)
Leave DOC_READER_ALLOW_EXTERNAL=false unless external analysis is intended, and set DOC_READER_API_URL explicitly to the desired provider before using insights mode.
Using Feishu mode gives the skill access to Feishu app credentials for the selected account.
Feishu mode reads local app credentials to obtain an access token. This is purpose-aligned for fetching Feishu documents, but the credentials carry sensitive account authority.
Only the two fields `channels.feishu.accounts[<account>].appId` / `.appSecret` in `~/.openclaw/openclaw.json` ... used to obtain a Feishu tenant_access_token
Use a least-privilege Feishu app/account and verify the requested `--account` before running Feishu imports.
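To check which account a given `--account` value would select, the configured accounts can be listed without printing any secret. The sketch below is illustrative only: `list_feishu_accounts` is a hypothetical helper, and it assumes `accounts` is a JSON object keyed by account name, which the `accounts[<account>]` notation suggests but the review does not confirm.

```python
from typing import Any, Dict, List


def list_feishu_accounts(config: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Summarize configured Feishu accounts without exposing appSecret.

    Assumption: config["channels"]["feishu"]["accounts"] maps an account
    name to an object holding "appId" and "appSecret".
    """
    accounts = config.get("channels", {}).get("feishu", {}).get("accounts", {})
    if not isinstance(accounts, dict):
        return []  # unexpected layout: report nothing rather than guess

    summary = []
    for name in sorted(accounts):
        fields = accounts[name]
        if not isinstance(fields, dict):
            continue  # skip malformed entries
        summary.append({
            "account": name,
            "appId": fields.get("appId"),
            # Record only whether a secret exists, never its value.
            "has_secret": bool(fields.get("appSecret")),
        })
    return summary
```

Feeding it the parsed contents of `~/.openclaw/openclaw.json` (for example via `json.load`) gives a quick view of which app IDs are reachable before an import runs.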
Archived content may be reused by agents later if the output directory is part of an Obsidian vault or other searchable knowledge base.
The skill intentionally persists source text, chunks, and indexes for later agent use. This is expected for document archiving, but it means private or untrusted document content may remain available in future retrieval contexts.
`chunks.jsonl`: precise retrieval and location ... `source.md`: use when full-text search is needed ... `ROOT_INDEX.md`: read this first to understand the document structure
Store outputs only in a vault/location appropriate for the document sensitivity, and review generated indexes before relying on them.
The skill depends on the behavior and trustworthiness of local pandoc/pdftotext installations when processing documents.
The converter runs local document-conversion binaries. This is central to the stated purpose and uses argument lists rather than shell execution, but users should know local tools are invoked.
subprocess.run(["pandoc", "-f", "docx", "-t", "markdown", ...]) ... subprocess.run(["pdftotext", "-layout", input_path, "-"], ...)
Use trusted, up-to-date converter binaries and avoid processing untrusted documents in a sensitive environment.
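The argument-list pattern noted in this finding can be sketched as follows. This is a minimal illustration, not the skill's converter: the helper names `build_pdftotext_cmd` and `run_converter` and the 60-second timeout are assumptions, while the `pdftotext` flags mirror the quoted evidence.

```python
import shutil
import subprocess
from typing import List


def build_pdftotext_cmd(input_path: str) -> List[str]:
    """Build an argument list for pdftotext.

    The path stays a single argv element, so spaces or shell
    metacharacters in a file name are never parsed by a shell.
    """
    # Flags as in the quoted evidence: -layout, input file, "-" for stdout.
    return ["pdftotext", "-layout", input_path, "-"]


def run_converter(cmd: List[str], timeout_s: float = 60.0) -> str:
    """Run a converter found on PATH and return its stdout as text."""
    resolved = shutil.which(cmd[0])
    if resolved is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(
        [resolved, *cmd[1:]],  # list form: no shell is involved
        capture_output=True,
        text=True,
        timeout=timeout_s,     # bound the time a malformed document can consume
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return result.stdout
```

Resolving the binary with `shutil.which` first also makes it easy to log the exact path that will run, which helps confirm that a trusted installation is the one being used.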
