Install
openclaw skills install @congshengwu/personal-kb-lite
Local file knowledge base with LLM-powered indexing and Q&A. Scans a directory for documents (.txt, .md, .pdf, .docx, .xlsx, .csv), extracts metadata (summary, tags, table of contents) using the LLM, and answers questions by retrieving relevant files.
Arguments: $ARGUMENTS
If $ARGUMENTS is --alias <name>: set a custom command alias. Save {"command_alias": "<name>"} into .kb-config.json (merging with existing fields), then reply: Command alias set to /<name>. Terminate.
If $ARGUMENTS is --alias reset: remove the command_alias field from .kb-config.json, then reply: Command alias reset to default /kb. Terminate.
If $ARGUMENTS is empty (the user only typed /kb), execute the Indexing Workflow below.
If $ARGUMENTS is not empty, treat it as a user question and execute the Q&A Workflow below.
Indexing Workflow
Read .kb-config.json from the current working directory using the Read tool.
If the file exists and watch_dir is a non-empty string, use it as WATCH_DIR
If the file does not exist or watch_dir is empty, ask the user:
Please enter the absolute path of the directory to watch:
Save the user's input as WATCH_DIR, and write the following to .kb-config.json using the Write tool:
{"watch_dir": "<WATCH_DIR>"}
Verify the directory exists using Bash:
ls "<WATCH_DIR>" > /dev/null 2>&1 && echo "ok" || echo "not_found"
If it returns not_found, inform the user and terminate.
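The configuration handling described so far (merging command_alias into .kb-config.json, removing it on reset, saving watch_dir, and checking that the directory exists) can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: the function names load_config, update_config, and resolve_watch_dir are hypothetical, and the sketch assumes the config file is a flat JSON object.

```python
import json
import os


def load_config(path=".kb-config.json"):
    """Return the parsed config, or an empty dict if the file is missing."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def update_config(path, **changes):
    """Merge changes into the existing config; a value of None removes the key."""
    config = load_config(path)
    for key, value in changes.items():
        if value is None:
            config.pop(key, None)
        else:
            config[key] = value
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, ensure_ascii=False, indent=2)
    return config


def resolve_watch_dir(path=".kb-config.json"):
    """Return watch_dir if it is a non-empty string naming an existing directory."""
    watch_dir = load_config(path).get("watch_dir")
    if isinstance(watch_dir, str) and watch_dir and os.path.isdir(watch_dir):
        return watch_dir
    return None
```

Merging rather than overwriting means that saving watch_dir would not discard a previously stored command_alias, and vice versa.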
Read <WATCH_DIR>/.kb-meta.json using the Read tool.
If it exists, load its files object (keyed by filename).
If it does not exist, initialize files = {}.
List files recursively in the directory using Bash:
find "<WATCH_DIR>" -type f \( -name "*.txt" -o -name "*.md" -o -name "*.docx" -o -name "*.pdf" -o -name "*.xlsx" -o -name "*.csv" \) -not -name ".kb-meta.json" -print
For each file found, get its modification time using Bash:
stat -f "%m" "<filepath>" # macOS
stat -c "%Y" "<filepath>" # Linux
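For readers who prefer a single portable step over find plus a platform-specific stat, the scan above can be approximated in Python. This is only a sketch under stated assumptions: scan_files is a hypothetical helper, extension matching is case-sensitive like the find patterns, and keys use forward slashes on every platform.

```python
import os

# Extensions taken from the find command above.
EXTENSIONS = {".txt", ".md", ".docx", ".pdf", ".xlsx", ".csv"}


def scan_files(watch_dir):
    """Map each supported file's path (relative to watch_dir) to its mtime in epoch seconds."""
    found = {}
    for root, _dirs, names in os.walk(watch_dir):
        for name in names:
            if os.path.splitext(name)[1] not in EXTENSIONS:
                continue  # also skips .kb-meta.json, since .json is not listed
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, watch_dir).replace(os.sep, "/")
            found[rel_path] = os.path.getmtime(full_path)
    return found
```

os.path.getmtime returns the same modification time that stat reports, as a float.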
A file needs (re-)indexing if:
It does not exist in the files object, OR
Its modification time is later than the indexed_at timestamp in the existing entry
Also, remove any entries from files whose files no longer exist on disk (stale cleanup).
Use the relative path from WATCH_DIR as the key (e.g., subdir/report.pdf).
If there are no files to process and no stale entries to clean, output:
No changes detected (X files already indexed)
Then terminate.
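The change-detection rules above compare an epoch modification time with an ISO 8601 indexed_at string, so the timestamp has to be parsed first. A minimal sketch follows; plan_changes and parse_indexed_at are hypothetical names, and treating an entry without indexed_at (for example, one that only recorded an error) as needing re-indexing is an assumption rather than something the workflow states.

```python
from datetime import datetime


def parse_indexed_at(value):
    """Convert an ISO 8601 string to epoch seconds (a trailing Z is read as UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")).timestamp()


def plan_changes(files, mtimes):
    """Return (paths to index, stale keys to remove).

    files  : the existing metadata entries, keyed by relative path
    mtimes : relative path -> modification time in epoch seconds
    """
    to_index = []
    for path, mtime in sorted(mtimes.items()):
        entry = files.get(path)
        if entry is None or "indexed_at" not in entry:
            to_index.append(path)  # new file, or no usable timestamp (assumption)
        elif mtime > parse_indexed_at(entry["indexed_at"]):
            to_index.append(path)  # modified since it was last indexed
    stale = sorted(path for path in files if path not in mtimes)
    return to_index, stale
```

If both returned lists are empty, the workflow reports that no changes were detected and stops.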
For each file, follow these steps:
Read the file using the method described in Appendix: File Reading Methods below, with MAX_CHARS = 8000.
If reading fails, record "error": "Read failed: <error message>" in the file's metadata entry and skip to the next file.
After reading the file content, generate the following fields:
summary: A 100-200 word summary describing the file's core content and purpose
tags: 3-8 classification tags as a JSON array
toc: A list of the file's major section/chapter titles as a JSON array (max 20 items); return an empty array if no clear structure exists
Add the following entry to the files object:
"<filename>": {
"filename": "<filename>",
"summary": "<summary>",
"tags": ["tag1", "tag2"],
"toc": ["section1", "section2"],
"indexed_at": "<current time, ISO 8601 format>",
"file_size": <bytes, obtained via Bash wc -c>
}
Write the complete .kb-meta.json back to disk immediately after processing each file, to avoid losing progress if interrupted.
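One way to make the per-file write resilient to interruption is to write to a temporary file and then rename it over the original. The sketch below illustrates that idea and is not the skill's prescribed method: make_entry and save_meta are hypothetical helpers, and the top-level layout {"files": {...}} is assumed from the references to the files object.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def make_entry(rel_path, summary, tags, toc, file_size):
    """Build one metadata entry in the shape shown above."""
    return {
        "filename": rel_path,
        "summary": summary,
        "tags": list(tags),
        "toc": list(toc)[:20],  # the toc is capped at 20 items
        "indexed_at": datetime.now(timezone.utc).isoformat(),
        "file_size": file_size,
    }


def save_meta(meta_path, files):
    """Write the whole metadata file via a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(meta_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump({"files": files}, handle, ensure_ascii=False, indent=2)
    os.replace(tmp_path, meta_path)  # replaces the old file in a single step
```

Calling save_meta after each processed file keeps the on-disk index complete up to the last file that finished.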
Show progress for each file, e.g.:
[1/3] Processing: contract_template.txt ... done
[2/3] Processing: product_manual.pdf ... done
[3/3] Processing: quotation.xlsx ... failed (read error)
Indexing complete
Total files scanned: X
Newly indexed: Y
Failed/skipped: Z
Metadata file: <WATCH_DIR>/.kb-meta.json
Q&A Workflow
User question: $ARGUMENTS
Read .kb-config.json from the current working directory using the Read tool, and get the watch_dir field (referred to as WATCH_DIR)
If the file does not exist or watch_dir is empty, reply:
Knowledge base directory not configured. Please run /kb first to set up the index.
Then terminate.
Read <WATCH_DIR>/.kb-meta.json using the Read tool
If the file does not exist or files is an empty object, reply:
Knowledge base index is empty. Please run /kb first to build the index.
Then terminate.
Organize each entry in files into a single line of plain text:
File: contract_template.txt, Summary: Standard procurement contract template with rights and obligations..., Tags: contract, procurement, template, TOC: Article 1 Definitions, Article 2 Rights, Article 3 Breach
File: product_manual.pdf, Summary: Technical specifications and installation guide for Product A..., Tags: product, specs, TOC: Overview, Installation, FAQ
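The one-line format in these examples could be produced by a small helper such as the following. It is a sketch only: format_index_line is a hypothetical name, and it joins tags and TOC items with commas and keeps at most the first 15 TOC items, as this step requires.

```python
def format_index_line(name, entry, max_toc=15):
    """Render one metadata entry as a single line of plain text."""
    tags = ", ".join(entry.get("tags", []))
    toc = ", ".join(entry.get("toc", [])[:max_toc])
    summary = entry.get("summary", "")
    return f"File: {name}, Summary: {summary}, Tags: {tags}, TOC: {toc}"


def build_index_summary(files):
    """Join the lines for every entry in the files object, one per line."""
    return "\n".join(format_index_line(name, entry) for name, entry in files.items())
```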
toc: show at most the first 15 items, comma-separated
tags: comma-separated
Based on the index summary and the user question, determine which files may contain the answer.
Criteria:
If no files are possibly relevant, reply directly:
Sorry, no files related to your question were found in the knowledge base.
Current indexed file count: X
Suggestion: Verify the question falls within the knowledge base scope, or run /kb to update the index.
Then terminate.
Record the matched filenames as an internal variable MATCHED_FILES (do not show this list to the user).
For each filename in MATCHED_FILES, read the file using the method described in Appendix: File Reading Methods below (no truncation, omit MAX_CHARS).
If a file fails to read, skip it and continue with the remaining files.
Based on the content of the retrieved files, answer the user's question.
Requirements:
---
Sources:
[1] contract_template.txt
[2] product_manual.pdf
Only list files actually used in the answer; do not include matched but unused files.
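As an illustration of the source footer, the hypothetical helper below numbers the files that were actually used, in first-use order and without repeats. The deduplication is an assumption, since the format above only shows distinct filenames.

```python
def format_sources(used_files):
    """Return the Sources footer for the files actually used in the answer."""
    unique = []
    for name in used_files:
        if name not in unique:  # keep first-use order, drop repeats
            unique.append(name)
    lines = ["---", "Sources:"]
    lines += [f"[{number}] {name}" for number, name in enumerate(unique, start=1)]
    return "\n".join(lines)
```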
Appendix: File Reading Methods
Use the following methods based on file extension. All Bash commands pass the file path via the KB_FILE environment variable to avoid filename injection issues.
Optional parameter MAX_CHARS: if provided, truncate output to that many characters; if omitted, output the full content.
.txt / .md / .csv: Read using the Read tool at <WATCH_DIR>/<filename>. If MAX_CHARS is set, only read the first MAX_CHARS characters (approximately MAX_CHARS / 40 lines).
.docx: Run via Bash:
KB_FILE="<WATCH_DIR>/<filename>" python3 -c "
import docx, os
doc = docx.Document(os.environ['KB_FILE'])
text = '\n'.join(p.text for p in doc.paragraphs if p.text.strip())
limit = int(os.environ.get('KB_MAX_CHARS', '0'))
print(text[:limit] if limit else text)
"
.pdf: Run via Bash:
KB_FILE="<WATCH_DIR>/<filename>" python3 -c "
import pdfplumber, os
with pdfplumber.open(os.environ['KB_FILE']) as pdf:
    text = '\n'.join(p.extract_text() or '' for p in pdf.pages)
limit = int(os.environ.get('KB_MAX_CHARS', '0'))
print(text[:limit] if limit else text)
"
.xlsx: Run via Bash:
KB_FILE="<WATCH_DIR>/<filename>" python3 -c "
import openpyxl, os
wb = openpyxl.load_workbook(os.environ['KB_FILE'], data_only=True)
rows = []
for ws in wb.worksheets:
    for row in ws.iter_rows(values_only=True):
        line = '\t'.join(str(v) for v in row if v is not None)
        if line.strip():
            rows.append(line)
text = '\n'.join(rows)
limit = int(os.environ.get('KB_MAX_CHARS', '0'))
print(text[:limit] if limit else text)
"
When MAX_CHARS is needed, prepend KB_MAX_CHARS="<value>" to the environment variables, e.g.:
KB_FILE="..." KB_MAX_CHARS="8000" python3 -c "..."
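To show why the path travels through an environment variable, here is a sketch that launches a reader script from Python without any shell. The filename reaches the child process only as the value of KB_FILE, so quotes or $(...) in a name are never parsed as code. run_reader and TEXT_READER are hypothetical names. The plain-text reader stands in for the .docx, .pdf and .xlsx scripts above, which follow the same KB_FILE / KB_MAX_CHARS pattern.

```python
import os
import subprocess
import sys

# Stand-in reader for plain-text files, same shape as the scripts above.
TEXT_READER = """
import os
with open(os.environ['KB_FILE'], encoding='utf-8') as handle:
    text = handle.read()
limit = int(os.environ.get('KB_MAX_CHARS', '0'))
print(text[:limit] if limit else text)
"""


def run_reader(script, path, max_chars=None):
    """Run a reader script with the path and optional limit passed via the environment."""
    env = dict(os.environ, KB_FILE=path)
    if max_chars:
        env["KB_MAX_CHARS"] = str(max_chars)
    else:
        env.pop("KB_MAX_CHARS", None)  # omit the limit to read the full content
    result = subprocess.run(
        [sys.executable, "-c", script],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With check=True, a failing reader raises subprocess.CalledProcessError, which a caller could catch to record a read failure for that file or to skip it.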