Document QA Assistant (文档问答助手)

v1.0.0

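Document QA Assistant. Answers questions from local documents (PDF/Word/Markdown/TXT), with knowledge-base retrieval and cross-document verification. Use this skill when the user needs to find answers in documents, answer questions based on documents, query across multiple documents, verify that information is consistent, or generate document summaries.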

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for xuyongliang-eccom/document-qa-assistant.

Install the skill "文档问答助手" (xuyongliang-eccom/document-qa-assistant) from ClawHub.
Skill page: https://clawhub.ai/xuyongliang-eccom/document-qa-assistant
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install document-qa-assistant

ClawHub CLI

npx clawhub@latest install document-qa-assistant

Security Scan

VirusTotal

Suspicious

OpenClaw

Suspicious

medium confidence

Purpose & Capability

The README (SKILL.md) promises features (indexing script, Word/CHM/.docx support, multi-format parsing) that the included code does not implement: index_docs.py is referenced but not present, and scripts/doc_qa.py only handles .txt/.md and a fragile .pdf path; Word and CHM handling are not implemented. This mismatch suggests sloppy packaging or missing files.
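One way to make the gap between promised and implemented formats visible is to dispatch on the file extension and fail loudly on anything unhandled. The following is a minimal sketch, not the skill's actual code: `read_document` and `TEXT_SUFFIXES` are hypothetical names, and only the plain-text formats the scan reports as working are handled.

```python
from pathlib import Path

# Hypothetical: the only formats this sketch reads directly.
TEXT_SUFFIXES = {".txt", ".md"}


def read_document(path):
    """Return the text of a plain-text document, or raise for other formats."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix in TEXT_SUFFIXES:
        # errors="replace" keeps going on stray bytes instead of crashing.
        return p.read_text(encoding="utf-8", errors="replace")
    # Fail explicitly rather than returning placeholder text as if it were content.
    raise ValueError(f"unsupported format: {suffix or '(no extension)'}")
```

Raising an error instead of returning a placeholder string keeps unsupported files from being treated as searchable content.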

Instruction Scope

Instructions tell the agent/user to run local scripts on arbitrary document directories (expected), but the PDF parsing uses a subprocess that injects the file path into a python -c string (f"...'{path}'...") which can break/lead to code injection if a path contains quotes or unexpected characters. The search implementation is also flawed: it treats the question as characters rather than words, so semantic retrieval does not work for typical inputs (especially Chinese).
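The path-injection risk described above can be avoided by passing the path as a separate argument rather than formatting it into the `python -c` source. The sketch below assumes that approach; `run_child`, `ECHO_SOURCE`, and `PDF_SOURCE` are hypothetical names, and `PDF_SOURCE` only runs if pdfplumber is installed.

```python
import subprocess
import sys

# Child program that reads the path from argv. The path never becomes part of
# the Python source text, so quotes in a filename cannot change the code.
ECHO_SOURCE = "import sys; print(sys.argv[1])"

# Hypothetical PDF variant of the same idea (requires pdfplumber).
PDF_SOURCE = """
import sys
import pdfplumber
with pdfplumber.open(sys.argv[1]) as pdf:
    print("\\n".join((page.extract_text() or "") for page in pdf.pages))
"""


def run_child(source, path):
    """Run `source` in a child interpreter with `path` as its only argument."""
    result = subprocess.run(
        [sys.executable, "-c", source, path],  # list form: no shell, no quoting
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")
```

For example, `run_child(PDF_SOURCE, "./docs/product-guide.pdf")` would extract text in a child process; importing pdfplumber directly in the main script would remove the subprocess altogether.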

ℹ

Install Mechanism

No install spec (instruction-only) — lowest risk. However, the code calls pdfplumber via a subprocess without declaring pdfplumber as a dependency; if pdfplumber is absent the script returns '[PDF解析不可用]'. Missing dependency documentation is an operational issue (not an install safety issue) but reduces reliability.
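Because the dependency is undeclared, a caller may want to check for it before relying on PDF parsing. This is a small sketch using the standard library; `missing_modules` is a hypothetical helper, and it expects top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Usage: report the gap up front instead of discovering it as placeholder output.
if __name__ == "__main__":
    absent = missing_modules(["pdfplumber"])
    if absent:
        print("Missing optional dependencies:", ", ".join(absent))
```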

✓

Credentials

The skill requests no environment variables, no credentials, and no config paths — consistent with a local document QA tool that operates on user-supplied files.

✓

Persistence & Privilege

always is false and the skill does not request persistent or privileged system configuration. It does not modify other skills. No persistence concerns detected.

What to consider before installing

This skill is not clearly malicious, but it has multiple coherence and safety issues you should understand before running it.

- Missing file: SKILL.md references scripts/index_docs.py (indexing) but that file is not included; the tool may be incomplete.
- PDF parsing: the script tries to call pdfplumber via a python -c subprocess with the document path interpolated directly into the command string; that can break on paths with quotes and could be abused if an attacker controls file paths. Consider avoiding running on untrusted directories until the code is fixed.
- Unsupported formats: SKILL.md claims Word/CHM support, but the script does not implement .docx/.doc/.chm parsing; running it on those files will likely fail or return '[无法读取: ...]' / placeholder text.
- Search bug: the search logic treats the question as characters, not tokens/words, so it will often return no matches (especially for single-character languages like Chinese). Expect incorrect or empty results.
- Dependencies: pdfplumber and any other parsing libs are not declared; install and audit them if you intend to use PDF functionality.

Recommendations:

1. If you want to use this, review and fix scripts/doc_qa.py (escape/avoid constructing python -c with untrusted paths; implement proper PDF/docx parsing using libraries and declare dependencies; fix tokenization/search logic).
2. Do not run the script over directories containing files with untrusted filenames or content until the subprocess call and path handling are hardened.
3. Prefer a packaged/repo source or contact the author for the missing index script and dependency list.

If you are unsure, treat the package as untrusted and inspect/fix the code locally before use.
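For the tokenization problem, one possible fix is to match on words for Latin text and on overlapping character bigrams for Chinese, which needs no external segmenter. This is a sketch under that assumption and not the script's current logic; `tokenize` and `overlap_score` are hypothetical names, and this is still keyword overlap, not semantic retrieval.

```python
import re
from typing import Set

_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")   # runs of common CJK ideographs
_WORD = re.compile(r"[A-Za-z0-9_]+")         # Latin words and numbers


def tokenize(text: str) -> Set[str]:
    """Lowercased Latin words plus overlapping bigrams of each CJK run."""
    tokens = {w.lower() for w in _WORD.findall(text)}
    for run in _CJK_RUN.findall(text):
        if len(run) == 1:
            tokens.add(run)  # a lone character is kept as its own token
        tokens.update(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


def overlap_score(question: str, passage: str) -> float:
    """Fraction of the question's tokens that also occur in the passage."""
    q = tokenize(question)
    if not q:
        return 0.0
    return len(q & tokenize(passage)) / len(q)
```

With this scheme, a question such as "产品价格" shares the bigrams 产品 and 价格 with a passage that mentions both, so the passage scores above zero even though the exact four-character string never appears in it.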

Like a lobster shell, security has layers — review code before you run it.

Tags: ai, document, knowledge, latest, qa

151 downloads

0 stars

1 version

Updated 1mo ago

v1.0.0

MIT-0

Document QA Assistant

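Document QA Assistant. Answers questions from local documents, with multi-document retrieval and cross-verification.

Core capabilities

Document parsing: supports PDF, Word, Markdown, TXT, and CHM
Semantic retrieval: answers questions based on an understanding of the content rather than keyword matching
Cross-document answers: synthesizes an answer across multiple documents
Answer provenance: identifies which document, and which part of it, the answer comes from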

Quick start

Question answering

python3 scripts/doc_qa.py --docs "./docs/" --question "What is the price of this product?"

Batch-index documents

python3 scripts/index_docs.py --dir "./knowledge-base/" --output ./index.json

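Script reference

scripts/doc_qa.py

Answers a question based on the documents.

python3 scripts/doc_qa.py --docs <document path> --question "<question>" [--context-len <lines>]

Arguments:

--docs: path to a document file or directory
--question: the question to answer
--context-len: number of context lines to reference (default 20)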
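The three options above map naturally onto `argparse`. The sketch below shows one way such a command line could be declared; it is not taken from scripts/doc_qa.py, and `build_parser` and the help strings are hypothetical.

```python
import argparse


def build_parser():
    """Declare the three documented options; only --context-len has a default."""
    parser = argparse.ArgumentParser(description="Answer a question from local documents.")
    parser.add_argument("--docs", required=True, help="document file or directory path")
    parser.add_argument("--question", required=True, help="question to answer")
    parser.add_argument("--context-len", type=int, default=20,
                        help="number of context lines to reference")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes --context-len as args.context_len.
    print(args.docs, args.question, args.context_len)
```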

scripts/index_docs.py

Batch-indexes documents for later retrieval.

python3 scripts/index_docs.py --dir <directory> --output <output index file>

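Typical scenarios

Find an answer in a knowledge base

python3 scripts/doc_qa.py \
  --docs "./knowledge-base/metaworks/" \
  --question "What are the prerequisites for micro-app development?"

Generate a document summary

python3 scripts/doc_qa.py \
  --docs "./docs/product-guide.pdf" \
  --question "Summarize the core content of this document in 100 words"

Verify information across documents

python3 scripts/doc_qa.py \
  --docs "./docs/v1/,./docs/v2/" \
  --question "How do the two versions differ in configuration?"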
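The last scenario passes two directories as one comma-separated value. The source does not show how the script handles that, so the following is only a sketch of one plausible interpretation; `expand_docs` and its default suffix list are hypothetical.

```python
from pathlib import Path


def expand_docs(spec, suffixes=(".txt", ".md", ".pdf")):
    """Expand a comma-separated list of files and directories into file paths."""
    files = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate trailing or doubled commas
        p = Path(part)
        if p.is_dir():
            # Walk the directory recursively, keeping only known suffixes.
            files.extend(sorted(
                f for f in p.rglob("*")
                if f.is_file() and f.suffix.lower() in suffixes
            ))
        elif p.is_file():
            files.append(p)  # explicitly named files are kept as given
    return files
```

Splitting on commas means a path that itself contains a comma cannot be passed this way; repeating the `--docs` option would avoid that limitation.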

Output format

{
  "answer": "Answer text...",
  "sources": [
    {
      "file": "document path",
      "relevance": 0.95,
      "excerpt": "Relevant passage..."
    }
  ],
  "confidence": "high|medium|low"
}
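A caller that consumes this output may want to validate it before trusting the fields. The sketch below checks the shape shown above; `parse_result` and `best_source` are hypothetical helpers, and the assumption that `relevance` lies between 0 and 1 is inferred from the sample value, not stated by the source.

```python
import json

ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def parse_result(raw):
    """Parse the JSON text and check the documented fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("result must be a JSON object")
    if not isinstance(data.get("answer"), str):
        raise ValueError("'answer' must be a string")
    if data.get("confidence") not in ALLOWED_CONFIDENCE:
        raise ValueError("'confidence' must be one of high, medium, low")
    for src in data.get("sources", []):
        if not isinstance(src.get("file"), str):
            raise ValueError("each source needs a 'file' string")
        rel = src.get("relevance")
        # Assumption: relevance is a score in [0, 1].
        if not isinstance(rel, (int, float)) or not 0.0 <= rel <= 1.0:
            raise ValueError("'relevance' must be a number in [0, 1]")
    return data


def best_source(data):
    """Return the source with the highest relevance, or None if there are none."""
    sources = data.get("sources", [])
    return max(sources, key=lambda s: s["relevance"]) if sources else None
```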
