Knowledge Retrieval Publish
Pass. Audited by VirusTotal on May 12, 2026.
Overview
Type: OpenClaw Skill
Name: knowledge-retrieval-publish
Version: 3.1.6
The knowledge-retrieval skill is a local-first document search tool that indexes and retrieves content from various file formats (PDF, PPTX, DOCX, etc.) using a combination of BM25 and AI semantic matching. The provided Python scripts (build_kb_index.py and search_kb.py) handle text extraction and indexing locally, storing data within a designated workspace directory. Analysis of the code and the extensive markdown instructions (SKILL.md and references/) reveals no evidence of data exfiltration, malicious execution, or harmful prompt injection; the instructions focus strictly on task performance, anti-hallucination, and maintaining local data privacy.
Findings (0)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Private document pages or images could be processed by a different AI provider than the user expected.
Image/document analysis may be routed automatically to other configured models or spawned multimodal subagents, but the artifacts do not clearly define provider identity, approval, or data boundaries.
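This SKILL may automatically fall back to other models depending on the agent's model configuration... perform visual analysis (either the current model's built-in capability or a spawned multimodal subagent is acceptable; example alternative models: Kimi K2.6, Doubao, Gemini)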
Require explicit user approval before using alternate models or subagents, show which provider will receive content, and provide an opt-out for external vision analysis.
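One way to picture this recommendation is a small approval gate placed in front of any fallback. The sketch below is illustrative only and is not taken from the skill: `VisionRoute`, `select_vision_route`, the `approve` callback, and the `allow_external` flag are all hypothetical names, and it assumes the agent can tell whether a route sends content off the machine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class VisionRoute:
    provider: str   # hypothetical label shown to the user before content is sent
    external: bool  # True if content would leave the local machine


def select_vision_route(
    candidates: list[VisionRoute],
    approve: Callable[[VisionRoute], bool],
    allow_external: bool = True,
) -> Optional[VisionRoute]:
    """Return the first usable route, or None if vision analysis is skipped."""
    for route in candidates:
        if not route.external:
            return route  # local processing needs no extra approval
        if not allow_external:
            continue  # opt-out: external providers are never offered
        if approve(route):  # the user sees the provider name and decides
            return route
    return None
```

With `allow_external=False` the function can only return a local route, which is the opt-out behavior the recommendation asks for.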
Users handling confidential client materials may rely on a stronger local-only guarantee than the skill actually documents.
This privacy wording can lead users to believe document content never leaves the machine, while other artifacts describe AI semantic/vision analysis and automatic fallback to external model options.
Your original files, knowledge base index, and working caches stay on your local machine — no need to upload or store them on any external platform or cloud... client materials cannot be uploaded to third-party platforms.
Revise the privacy language to distinguish local storage/indexing from possible model-provider processing, and state when document content may be sent to an AI service.
Sensitive content from deleted or moved documents may remain searchable in the skill workspace until manually removed.
The skill stores searchable extracted text and image-analysis cache data persistently, and the documentation says residual data may remain after original files are deleted.
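The BM25 index and image-analysis cache are stored in the skill working directory and are not automatically cleared when the original files are deleted... To completely remove all residual knowledge-base data, also delete the directories listed above.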
Make retention behavior prominent, provide an automatic purge for deleted source files, and offer a clear cleanup command for cache and index data.
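A purge step along these lines could look like the following sketch. It assumes a hypothetical JSON manifest that maps each source path to its cached extraction file; the reviewed artifacts do not describe the skill's actual on-disk layout, so `purge_stale_entries`, `wipe_workspace`, and the manifest format are illustrative assumptions only.

```python
import json
import shutil
from pathlib import Path


def purge_stale_entries(manifest_path: Path) -> list[str]:
    """Drop manifest entries whose source file no longer exists.

    Assumes a hypothetical manifest of the form {source_path: cache_file_path}.
    Returns the source paths that were purged.
    """
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    purged = []
    for source, cache_file in list(manifest.items()):
        if not Path(source).exists():
            Path(cache_file).unlink(missing_ok=True)  # remove cached text/analysis
            del manifest[source]
            purged.append(source)
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return purged


def wipe_workspace(index_dir: Path, cache_dir: Path) -> None:
    """Full cleanup: delete the index and cache directories if present."""
    for directory in (index_dir, cache_dir):
        shutil.rmtree(directory, ignore_errors=True)
```

After a purge, any BM25 index built from the removed text would still need to be rebuilt, since this sketch only removes cached files and manifest entries.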
A normal search may trigger local Python execution and indexing work without a separate prompt.
Searching can automatically execute the included index-building script when the index is missing. This is purpose-aligned but still executes local code and writes index/cache files.
if not index_dir.exists(): ... subprocess.run([sys.executable, build_script, "--project", project_name], capture_output=True, text=True, cwd=str(WORKSPACE))
Keep auto-rebuild behavior documented, and consider asking before the first rebuild on large or sensitive folders.
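As a rough sketch of asking before the first rebuild, the check quoted above could be wrapped in a confirmation step. `ensure_index`, the `confirm` callback, and the injectable `runner` parameter are hypothetical and not part of search_kb.py; the build command is passed in by the caller rather than assumed.

```python
import subprocess
from pathlib import Path
from typing import Callable


def ensure_index(
    index_dir: Path,
    build_cmd: list[str],
    confirm: Callable[[str], bool],
    runner: Callable[..., object] = subprocess.run,
) -> bool:
    """Return True if an index exists or was built after user confirmation."""
    if index_dir.exists():
        return True
    if not confirm(f"No index at {index_dir}. Build it now?"):
        return False  # user declined; nothing is executed or written
    # check=True raises CalledProcessError if the build script fails
    runner(build_cmd, capture_output=True, text=True, check=True)
    return index_dir.exists()
```

A caller would pass something like `[sys.executable, build_script, "--project", project_name]` as `build_cmd`, mirroring the excerpt above, and a `confirm` function that prompts the user.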
Users may need to install changing third-party packages or look for a missing setup helper outside the reviewed package.
The documentation recommends dependency installation and references a setup script that is not included in the provided file manifest; manual packages are mostly unpinned.
Option 1 (recommended): Run `scripts/setup.bat` from the skill directory — it auto-detects Python and installs all dependencies. Option 2: Manual install: `pip install bm25s pdfminer.six python-pptx`
Include the referenced setup script or remove the instruction, add an install spec, and pin dependency versions where practical.
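To make the pinning recommendation checkable, a maintainer could run a small lint over the requirement lines. The sketch below is an assumption-laden illustration: `unpinned_requirements` is a hypothetical helper, and it only recognizes the simple `name==version` form (optionally with extras), not the full requirement-specifier grammar.

```python
import re

# Matches "name==version", optionally with extras, e.g. "pkg[extra]==1.2.3".
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]+\])?==[^=\s]+$")


def unpinned_requirements(lines: list[str]) -> list[str]:
    """Return requirement lines that lack an exact '==' version pin."""
    result = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not _PINNED.match(line):
            result.append(line)
    return result
```

Applied to the three packages named in the manual install command (`bm25s`, `pdfminer.six`, `python-pptx`), the helper would report all three, since none of them carries a version pin.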
