Knowledge Importer

v1.3.0

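Converts Word, Excel, PPT, PDF, Markdown, and other documents to Markdown and saves them to an Obsidian knowledge base. Images can be uploaded to an image host, producing external URL links. Use this skill when the user needs to: 1) import documents into the knowledge base, 2) convert files to Markdown, or 3) extract document content while preserving images.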


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for sunshinegw/knowledge-importer.

Install the skill "Knowledge Importer" (sunshinegw/knowledge-importer) from ClawHub.
Skill page: https://clawhub.ai/sunshinegw/knowledge-importer
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install knowledge-importer

ClawHub CLI


npx clawhub@latest install knowledge-importer

Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (document -> Markdown, save to Obsidian, upload images) align with the included script: import_doc.py parses docx/pptx/pdf/xlsx/md, extracts images and uploads them to a configurable image host, and writes Markdown files into a user-specified knowledge base path.
Instruction Scope
SKILL.md instructs running the local import script, configuring DUFS_SERVER_URL and KNOWLEDGE_BASE_PATH, and using the Obsidian CLI (via xvfb-run). The instructions cause network PUT uploads to the configured image host and write files into the provided vault path — this is expected but means the image host you specify will receive uploaded images, so only set it to a trusted endpoint and avoid importing sensitive images.
Install Mechanism
No install spec (instruction-only with a bundled script). Dependencies are standard Python packages listed in README/SKILL.md (python-docx, python-pptx, openpyxl, pdfplumber). No third-party downloads or obscure URLs in the bundle.
Credentials
The skill requires configuring DUFS_SERVER_URL and KNOWLEDGE_BASE_PATH (documented in SKILL.md/README) but the registry metadata listed no required env vars — minor metadata omission. No unexpected credentials are requested.
Persistence & Privilege
always:false and user-invocable:true. The skill does not request permanent platform presence or modify other skills; it writes files only to the user-specified knowledge base path and temporary asset dirs.
Assessment
This skill appears to do what it says: convert docs to Markdown and upload any extracted images to the image host you configure. Before using it: 1) Only set DUFS_SERVER_URL to a server you trust (images will be PUT there); do not point it at unknown or public endpoints if images may contain sensitive info. 2) Set KNOWLEDGE_BASE_PATH carefully to avoid overwriting important files. 3) Ensure dependencies (python-docx, python-pptx, openpyxl, pdfplumber) and the Obsidian CLI/xvfb-run are installed if you intend to use CLI features. 4) Review scripts/config.py (or config.py.example) to confirm upload path, retry behavior, and timeout meet your safety needs. If you want greater assurance, inspect the full import_doc.py (already bundled) and do a test run on non-sensitive documents first.

Like a lobster shell, security has layers — review code before you run it.

Latest version: v1.3.0 · 112 downloads · 1 star · 3 versions · Updated 3w ago · License: MIT-0

Knowledge Importer

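Converts documents in various formats to Markdown and saves them to the knowledge base.

Environment Configuration

Before first use, set the following environment variables or edit scripts/config.py:

# Image host server URL
export DUFS_SERVER_URL="http://<your-server-ip>:<port>"

# Knowledge base storage path
export KNOWLEDGE_BASE_PATH="/path/to/your/Obsidian/vault"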
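
As an illustration of how a script might consume these two variables, here is a minimal sketch. The `ImporterSettings` class and `load_settings` function are hypothetical names, not taken from the bundled scripts/config.py, which may be organised differently. The sketch assumes that a missing DUFS_SERVER_URL simply means "no image host configured".

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class ImporterSettings:
    """Hypothetical settings container (not the skill's actual config object)."""
    server_url: Optional[str]   # None means no image host is configured
    knowledge_base: Path


def load_settings(env: Optional[Mapping[str, str]] = None) -> ImporterSettings:
    """Read DUFS_SERVER_URL and KNOWLEDGE_BASE_PATH from the environment."""
    env = os.environ if env is None else env
    kb = env.get("KNOWLEDGE_BASE_PATH", "").strip()
    if not kb:
        # Assumption: refuse to guess a vault location rather than write somewhere unexpected.
        raise ValueError("KNOWLEDGE_BASE_PATH is not set")
    # Drop a trailing slash so later URL joins do not produce "//".
    url = env.get("DUFS_SERVER_URL", "").strip().rstrip("/")
    return ImporterSettings(server_url=url or None,
                            knowledge_base=Path(kb).expanduser())
```

Failing early on a missing vault path is one possible design choice; it avoids silently writing notes into the current directory.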

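Supported Formats

| Format | Extension | Library | Image handling |
| --- | --- | --- | --- |
| Word | .docx | python-docx | ✅ Image host / Base64 |
| Excel | .xlsx / .xls | openpyxl | - |
| PPT | .pptx | python-pptx | ✅ Image host / Base64 |
| PDF | .pdf | pdfplumber | ✅ Image host / Base64 |
| Markdown | .md | Native support | - |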

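Image Handling

Option 1: Upload to an image host (recommended)

Once an image host server is configured, images are uploaded to it and referenced by external URL:

DUFs_CONFIG = {
    "server_url": "http://<your-server-ip>:<port>",
    "timeout": 30,
    "retry_times": 3,
}

Upload path: http://<your-server-ip>:<port>/Picture/<uuid>.png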
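
The upload step could look roughly like the following sketch. It assumes an HTTP PUT to the `/Picture/<uuid>.png` path shown above and a config dict with the same `server_url`, `timeout`, and `retry_times` keys. The function names, the `send` hook, and the backoff policy are hypothetical; the bundled import_doc.py may behave differently.

```python
import time
import urllib.request
import uuid


def build_upload_url(server_url: str, suffix: str = ".png") -> str:
    """Build <server>/Picture/<uuid><suffix>; the exact UUID format is an assumption."""
    return f"{server_url.rstrip('/')}/Picture/{uuid.uuid4()}{suffix}"


def _http_put(url: str, data: bytes, timeout: float) -> None:
    """Send the bytes with HTTP PUT; urlopen raises on 4xx/5xx responses."""
    request = urllib.request.Request(url, data=data, method="PUT")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()


def upload_image(data: bytes, config: dict, send=_http_put, backoff: float = 0.5) -> str:
    """Try the upload up to config['retry_times'] times and return the final URL."""
    url = build_upload_url(config["server_url"])
    last_error = None
    for attempt in range(config["retry_times"]):
        try:
            send(url, data, config["timeout"])
            return url
        except OSError as exc:  # URLError, HTTPError and socket timeouts are OSError subclasses
            last_error = exc
            time.sleep(backoff * (attempt + 1))  # hypothetical linear backoff
    raise RuntimeError(f"upload failed after {config['retry_times']} attempts") from last_error
```

The `send` parameter exists only so the retry loop can be exercised without a live server.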

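Option 2: Inline Base64 (fallback)

If the image host is unavailable, the importer automatically falls back to embedding images inline as Base64:

![image](data:image/png;base64,iVBORw0KGgo...)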
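
The fallback described above can be sketched as follows. The function names and the choice to treat any upload exception as "image host unavailable" are assumptions made for illustration, not the skill's confirmed behaviour.

```python
import base64
from typing import Callable, Optional


def to_data_uri_markdown(data: bytes, mime: str = "image/png", alt: str = "image") -> str:
    """Embed raw image bytes as a Markdown image with a Base64 data URI."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"![{alt}](data:{mime};base64,{encoded})"


def image_markdown(data: bytes,
                   upload: Optional[Callable[[bytes], str]] = None,
                   mime: str = "image/png") -> str:
    """Prefer an external URL from `upload`; fall back to inline Base64 on failure."""
    if upload is not None:
        try:
            return f"![image]({upload(data)})"
        except Exception:
            # Hypothetical policy: any upload error triggers the inline fallback.
            pass
    return to_data_uri_markdown(data, mime)
```

Inline Base64 keeps a note self-contained but increases its size by roughly a third, which is one reason the image host is the recommended option.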

Obsidian CLI Prerequisites

  1. Enable the CLI in Obsidian

    • Settings → General → Command line interface → Enable
    • Follow the prompts to complete registration
  2. CLI command format

    xvfb-run obsidian create name="<file name>" content="<content>"
    xvfb-run obsidian append file="<file>" content="<content>"
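
To call these two commands from Python without shell-quoting problems, one option is to pass the arguments as a list. The sketch below uses only the `create` and `append` forms shown above; the helper names are hypothetical, and it assumes `xvfb-run` and `obsidian` are on the PATH.

```python
import subprocess
from typing import List


def obsidian_create_argv(name: str, content: str) -> List[str]:
    """Argument list equivalent to: xvfb-run obsidian create name="..." content="..." """
    return ["xvfb-run", "obsidian", "create", f"name={name}", f"content={content}"]


def obsidian_append_argv(file: str, content: str) -> List[str]:
    """Argument list equivalent to: xvfb-run obsidian append file="..." content="..." """
    return ["xvfb-run", "obsidian", "append", f"file={file}", f"content={content}"]


def run_obsidian(argv: List[str]) -> None:
    """Run the command without a shell; raises CalledProcessError on a non-zero exit."""
    subprocess.run(argv, check=True)
```

Because no shell is involved, quotes, spaces, and newlines inside the note content reach the CLI as a single argument.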
    

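Knowledge Base Directory Structure

Default path: $KNOWLEDGE_BASE_PATH (see Environment Configuration)

Directory layout (two-level classification)

$KNOWLEDGE_BASE_PATH/
├── 申报方案/
│   └── <industry/product>/
├── 解决方案/
│   └── <industry/product>/
├── 技术文档/
│   └── <industry/product>/
└── <other categories>/

Classification rules

  • 申报方案/ (application proposals): application documents, bid documents, construction-plan applications, etc.
  • 解决方案/ (solutions): customer-facing solution documents
  • 技术文档/ (technical documents): product usage notes and technical deployment documents
  • <other categories>/: define your own as needed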
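
The two-level layout above maps naturally onto a small path helper. This is a sketch under the assumption that a note's destination is `<knowledge base>/<category>/<industry or product>/`; the `target_dir` function and its validation rules are hypothetical.

```python
from pathlib import Path
from typing import Optional, Union


def target_dir(knowledge_base: Union[str, Path],
               category: str,
               subcategory: Optional[str] = None) -> Path:
    """Return <knowledge_base>/<category>[/<subcategory>] without creating it."""
    parts = [category] + ([subcategory] if subcategory else [])
    for part in parts:
        # Reject names that would escape the vault or add extra directory levels.
        if part in ("", ".", "..") or "/" in part or "\\" in part:
            raise ValueError(f"invalid directory name: {part!r}")
    return Path(knowledge_base).joinpath(*parts)
```

For example, `target_dir("/vault", "技术文档", "Dufs")` yields `/vault/技术文档/Dufs`; the caller would then create it with `mkdir(parents=True, exist_ok=True)`.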

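Usage

1. Convert a single file

Import /path/to/document.docx into the knowledge base

2. Specify an output directory

Import the file into [directory name]

Conversion Rules

  • File name: the original file name is kept (without the extension)
  • Images: images in Word/PPT/PDF files are extracted and uploaded to the image host
  • Tables: tables in Excel/PDF files are kept as Markdown tables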
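
Two of the rules above, the file-name rule and the table rule, can be sketched as follows. The function names are hypothetical. Treating the first row as the header, escaping `|`, and padding short rows are assumptions about reasonable behaviour rather than a description of import_doc.py.

```python
from pathlib import Path
from typing import Any, List, Sequence


def output_name(source: str) -> str:
    """Keep the original file name, replacing the extension with .md."""
    return Path(source).stem + ".md"


def _cell(value: Any) -> str:
    """Render one cell: empty for None, with pipes escaped and newlines flattened."""
    if value is None:
        return ""
    return str(value).replace("|", "\\|").replace("\n", " ")


def rows_to_markdown(rows: Sequence[Sequence[Any]]) -> str:
    """Render rows as a Markdown table, treating the first row as the header."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    lines: List[str] = []
    for index, row in enumerate(rows):
        cells = [_cell(v) for v in row] + [""] * (width - len(row))  # pad ragged rows
        lines.append("| " + " | ".join(cells) + " |")
        if index == 0:
            lines.append("| " + " | ".join(["---"] * width) + " |")
    return "\n".join(lines)
```

Rows in this shape are what `openpyxl`'s `ws.iter_rows(values_only=True)` and `pdfplumber`'s `page.extract_table()` return, so the same renderer could serve both sources.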

Running the Script

# Change to the script directory
cd skills/knowledge-importer/scripts

# Set environment variables (or edit config.py)
export DUFS_SERVER_URL="http://192.168.1.100:5000"
export KNOWLEDGE_BASE_PATH="/path/to/Obsidian"

# Convert a single file
python3 import_doc.py /path/to/document.docx

# Batch conversion
python3 import_doc.py --batch /path/to/folder

# Show help
python3 import_doc.py --help
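
To preview which files a batch run might pick up, a folder can be filtered by the extensions listed under Supported Formats. This sketch is illustrative only: `find_importable` is a hypothetical helper, and it assumes a non-recursive scan with case-insensitive extension matching, which may not match what `--batch` actually does.

```python
from pathlib import Path
from typing import List, Union

# Extensions taken from the Supported Formats table.
SUPPORTED_EXTENSIONS = {".docx", ".xlsx", ".xls", ".pptx", ".pdf", ".md"}


def find_importable(folder: Union[str, Path]) -> List[Path]:
    """List supported files directly inside `folder`, sorted by path."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(str(root))
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Running this first on a folder of non-sensitive documents is a cheap way to check what would be uploaded before any image leaves the machine.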

Installing Dependencies

pip3 install python-docx python-pptx openpyxl pdfplumber

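Recommended Image Hosts

  • Dufs: a lightweight file server with an upload API
    • Docker deployment (uploads enabled): docker run -v /path:/data -p 5000:5000 sigoden/dufs /data --allow-upload
  • PicList: supports multiple image hosts
  • 兰空图床 (Lsky Pro): a self-hosted image hosting solution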
