Llmwiki

Other

LLM-powered personal knowledge base. Raw documents in, an LLM compiles them into a structured interlinked wiki with trilingual articles, emergent taxonomy, and self-healing. One operations contract serves CLI, HTTP, and MCP.

Install

openclaw skills install llmwiki

llmwiki

A personal knowledge base that an LLM compiles, not just stores. Raw documents go in, an LLM writes trilingual (EN / 中文 / 日本語) wiki articles with [[wiki-links]], backlinks, and an emergent taxonomy. The MCP server dispatches every tool through llmwiki/operations.py; the CLI exposes the same registry via llmbase ops call; individual HTTP/CLI wrappers are being migrated onto the registry over time.

PyPI: pip install llmwiki
CLI command: llmbase (the package name and the command differ)
GitHub: https://github.com/Hosuke/llmbase
Demo: https://huazangge-production.up.railway.app

Setup

pip install llmwiki

mkdir my-kb && cd my-kb

cat > .env << 'EOF'
LLMBASE_API_KEY=sk-your-key
LLMBASE_BASE_URL=https://your-endpoint/v1
LLMBASE_MODEL=your-model
# Optional: LLMBASE_FALLBACK_MODELS=backup-1,backup-2
EOF

cat > config.yaml << 'EOF'
llm:
  max_tokens: 16384
paths:
  raw: "./raw"
  wiki: "./wiki"
EOF

Commands

Command	Description
`llmbase ingest url <url>`	Ingest a web article
`llmbase ingest pdf <file>`	Ingest a PDF (auto-chunks)
`llmbase ingest file <file>`	Ingest any local file
`llmbase ingest dir <dir>`	Ingest all files from a directory
`llmbase ingest cbeta-learn --batch 10`	Corpus plugin: Buddhist canon
`llmbase ingest ctext-book 论语 /analects/zh`	Corpus plugin: Chinese classics
`llmbase compile new`	Compile new raw docs incrementally (3-layer dedup)
`llmbase compile all`	Full rebuild
`llmbase compile index`	Rebuild index + aliases
`llmbase query "<q>"`	Ask a question (single-pass; add `--deep` for multi-step research)
`llmbase query "<q>" --tone wenyan`	📜 classical Chinese voice
`llmbase query "<q>" --tone scholar`	🎓 academic voice
`llmbase query "<q>" --tone eli5`	👶 simple voice
`llmbase query "<q>" --tone caveman`	🦴 primitive voice
`llmbase query "<q>" --file-back`	File answer back into the wiki
`llmbase lint check`	8-category structural health check
`llmbase lint heal`	Check → fix → re-check → report
`llmbase lint deep`	LLM deep quality analysis
`llmbase web`	Web UI at :5555
`llmbase serve`	Agent HTTP API at :5556
`llmbase mcp`	Start MCP server (stdio)
`llmbase stats`	KB statistics

MCP Integration (for AI clients)

{
  "mcpServers": {
    "llmwiki": {
      "command": "python",
      "args": ["-m", "llmwiki", "--base-dir", "/path/to/my-kb"]
    }
  }
}

Tools exposed by the MCP server:

Tool	Purpose
`kb_search`	Full-text search over compiled concepts
`kb_search_raw`	Verbatim full-text fallback over raw/ sources (v0.6.2+)
`kb_ask`	Deep-research Q&A with tone modes
`kb_get`	Get article by slug or alias (`空`, `kong`, `emptiness` all work)
`kb_list`	List articles, filter by tag
`kb_backlinks`	Find articles citing a given article
`kb_taxonomy`	Multilingual category tree
`kb_stats`	Article count, word count
`kb_xici`	Guided reading (导读)
`kb_ingest`	Ingest a URL
`kb_compile`	Compile raw → wiki
`kb_lint`	Health check / auto-fix
`kb_export` / `kb_export_article` / `kb_export_tag` / `kb_export_graph`	Structured export for downstream projects

All tools are declared in llmwiki/operations.py — downstream projects register custom ops via operations.register(...) and they become available on CLI + MCP automatically.

Agents mounted on this server can answer from compiled concepts, fall back to raw sources with kb_search_raw when compile glossed a detail, ingest new material mid-session, and trigger healing.

Workflows

Build a KB from scratch

llmbase ingest url https://example.com/topic
llmbase ingest pdf ./paper.pdf
llmbase compile new
llmbase query "What are the key concepts?"
llmbase lint heal

Autonomous mode (deploy once, server keeps learning)

# config.yaml
worker:
  enabled: true
  learn_source: cbeta         # built-in: cbeta | wikisource | both; custom via register_learn_source()
  learn_interval_hours: 6
  compile_interval_hours: 1
  health_check_interval_hours: 24

health:
  auto_fix_broken_links: true
  max_stubs_per_run: 10

The worker starts under the production WSGI entrypoint (wsgi.py → start_worker_thread). Deploy with gunicorn wsgi:app; llmbase web alone does not self-start the worker.

Daily use as agent memory

Agent receives a task → calls kb_search for relevant concepts
If the compiled answer is too abstract → calls kb_search_raw for verbatim detail
Learns something new → calls kb_ingest with the URL
Optionally kb_compile to fold it into concepts for next session
Periodically kb_lint heals the graph

Key Concepts

Synthesis, not archiving — LLM reads raw material and writes composed articles; storage is the cheap part
Two-layer recall — kb_search (concepts) + kb_search_raw (verbatim raw sources)
Trilingual default — every article has EN / 中文 / 日本語 sections
叠加进化 — new data merges into existing concepts, never overwrites
Domain-agnostic — taxonomy emerges per-domain, nothing hardcoded
Self-healing — 7-step auto-fix pipeline repairs drift
Alias resolution — [[参禅]] → can-chan.md across scripts and simplified/traditional
Registry-backed ops — MCP dispatches every tool through operations.py; CLI exposes the same registry via llmbase ops list / llmbase ops call; direct HTTP/CLI wrappers are being migrated onto the registry

Tips

--file-back saves Q&A answers into the wiki so future queries benefit
--tone wenyan for Chinese users (classical Chinese responses)
Run llmbase lint heal after large ingestion batches
Web UI /health has buttons for every repair op
Knowledge graph at /graph — density slider for large KBs
Timeline at /explore — requires entities: { enabled: true } in config

Security & Privacy

All data stays local — wiki files are plain markdown on your filesystem
LLM API key — user-supplied, loaded from .env
Network access — user-initiated (URL ingest, SSRF-protected) plus corpus plugins (cbeta-learn, wikisource-learn, ctext-book) and the autonomous worker when enabled
Web server — optional; binds 0.0.0.0 so LAN-accessible by default — front with a reverse proxy or bind override for public exposure
API secret — cloud deployments (with PORT env) gate most mutating endpoints behind LLMBASE_API_SECRET (auto-generated if unset). Note: /api/ask is open by default and writes Q&A back via file_back; only promotion to concepts requires the secret
Autonomous worker — opt-in via config, disabled by default
No telemetry — nothing is sent anywhere except the configured LLM API