Llmwiki

LLM-powered personal knowledge base. Raw documents in, an LLM compiles them into a structured interlinked wiki with trilingual articles, emergent taxonomy, and self-healing. One operations contract serves CLI, HTTP, and MCP.

Install

openclaw skills install llmwiki

llmwiki

A personal knowledge base that an LLM compiles, not just stores. Raw documents go in; an LLM writes trilingual (EN / 中文 / 日本語) wiki articles with [[wiki-links]], backlinks, and an emergent taxonomy. The MCP server dispatches every tool through llmwiki/operations.py, and the CLI exposes the same registry via llmbase ops call. Individual HTTP/CLI wrappers are being migrated onto the registry over time.

Setup

pip install llmwiki

mkdir my-kb && cd my-kb

cat > .env << 'EOF'
LLMBASE_API_KEY=sk-your-key
LLMBASE_BASE_URL=https://your-endpoint/v1
LLMBASE_MODEL=your-model
# Optional: LLMBASE_FALLBACK_MODELS=backup-1,backup-2
EOF

cat > config.yaml << 'EOF'
llm:
  max_tokens: 16384
paths:
  raw: "./raw"
  wiki: "./wiki"
EOF
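The settings above can be sanity-checked before the first run. The sketch below is not part of llmwiki: parse_env and check_settings are hypothetical helpers that parse .env-style text and confirm the three variables shown above are present, treating LLMBASE_FALLBACK_MODELS as an optional comma-separated list.

```python
from typing import Dict

REQUIRED = ("LLMBASE_API_KEY", "LLMBASE_BASE_URL", "LLMBASE_MODEL")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def check_settings(values: Dict[str, str]) -> dict:
    """Return the required settings plus parsed fallbacks, or raise."""
    missing = [k for k in REQUIRED if not values.get(k)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    raw_fallbacks = values.get("LLMBASE_FALLBACK_MODELS", "")
    fallbacks = [m.strip() for m in raw_fallbacks.split(",") if m.strip()]
    settings = {k: values[k] for k in REQUIRED}
    settings["fallback_models"] = fallbacks
    return settings


SAMPLE = """\
LLMBASE_API_KEY=sk-your-key
LLMBASE_BASE_URL=https://your-endpoint/v1
LLMBASE_MODEL=your-model
# Optional: LLMBASE_FALLBACK_MODELS=backup-1,backup-2
"""

settings = check_settings(parse_env(SAMPLE))
print(settings["LLMBASE_MODEL"], settings["fallback_models"])
```

The commented-out fallback line is ignored, so the fallback list stays empty until that line is uncommented.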

Commands

| Command | Description |
| --- | --- |
| llmbase ingest url <url> | Ingest a web article |
| llmbase ingest pdf <file> | Ingest a PDF (auto-chunks) |
| llmbase ingest file <file> | Ingest any local file |
| llmbase ingest dir <dir> | Ingest all files from a directory |
| llmbase ingest cbeta-learn --batch 10 | Corpus plugin: Buddhist canon |
| llmbase ingest ctext-book 论语 /analects/zh | Corpus plugin: Chinese classics |
| llmbase compile new | Compile new raw docs incrementally (3-layer dedup) |
| llmbase compile all | Full rebuild |
| llmbase compile index | Rebuild index + aliases |
| llmbase query "<q>" | Ask a question (single-pass; add --deep for multi-step research) |
| llmbase query "<q>" --tone wenyan | 📜 classical Chinese voice |
| llmbase query "<q>" --tone scholar | 🎓 academic voice |
| llmbase query "<q>" --tone eli5 | 👶 simple voice |
| llmbase query "<q>" --tone caveman | 🦴 primitive voice |
| llmbase query "<q>" --file-back | File answer back into the wiki |
| llmbase lint check | 8-category structural health check |
| llmbase lint heal | Check → fix → re-check → report |
| llmbase lint deep | LLM deep quality analysis |
| llmbase web | Web UI at :5555 |
| llmbase serve | Agent HTTP API at :5556 |
| llmbase mcp | Start MCP server (stdio) |
| llmbase stats | KB statistics |

MCP Integration (for AI clients)

{
  "mcpServers": {
    "llmwiki": {
      "command": "python",
      "args": ["-m", "llmwiki", "--base-dir", "/path/to/my-kb"]
    }
  }
}
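When the knowledge base path differs per machine, the client entry can be generated rather than hand-edited. A minimal sketch using only the standard library; mcp_entry is a hypothetical helper, and the command and arguments are copied from the snippet above.

```python
import json


def mcp_entry(base_dir: str, name: str = "llmwiki") -> dict:
    """Build the mcpServers entry shown above for a given KB directory."""
    return {
        "mcpServers": {
            name: {
                "command": "python",
                "args": ["-m", "llmwiki", "--base-dir", base_dir],
            }
        }
    }


config = mcp_entry("/path/to/my-kb")
print(json.dumps(config, indent=2))
```

Where the printed JSON is saved depends on the AI client; the source does not specify a location.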

Tools exposed by the MCP server:

| Tool | Purpose |
| --- | --- |
| kb_search | Full-text search over compiled concepts |
| kb_search_raw | Verbatim full-text fallback over raw/ sources (v0.6.2+) |
| kb_ask | Deep-research Q&A with tone modes |
| kb_get | Get article by slug or alias (for example, kong and emptiness both work) |
| kb_list | List articles, filter by tag |
| kb_backlinks | Find articles citing a given article |
| kb_taxonomy | Multilingual category tree |
| kb_stats | Article count, word count |
| kb_xici | Guided reading (导读) |
| kb_ingest | Ingest a URL |
| kb_compile | Compile raw → wiki |
| kb_lint | Health check / auto-fix |
| kb_export / kb_export_article / kb_export_tag / kb_export_graph | Structured export for downstream projects |

All tools are declared in llmwiki/operations.py — downstream projects register custom ops via operations.register(...) and they become available on CLI + MCP automatically.
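The source does not show the signature of operations.register, so the following is a generic registry pattern rather than llmwiki's API. It illustrates why a single table of named operations lets a CLI and an MCP front end stay in sync. Registry, cli_front_end, mcp_front_end, and the kb_word_count op are all hypothetical.

```python
from typing import Any, Callable, Dict, List


class Registry:
    """Toy operations registry: one table of named callables."""

    def __init__(self) -> None:
        self._ops: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        """Decorator that stores a function under a tool name."""
        def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
            if name in self._ops:
                raise ValueError(f"operation already registered: {name}")
            self._ops[name] = fn
            return fn
        return decorator

    def names(self) -> List[str]:
        return sorted(self._ops)

    def call(self, name: str, **params: Any) -> Any:
        """Single dispatch point that every front end goes through."""
        if name not in self._ops:
            raise KeyError(f"unknown operation: {name}")
        return self._ops[name](**params)


operations = Registry()


@operations.register("kb_word_count")  # hypothetical custom op
def kb_word_count(text: str) -> int:
    return len(text.split())


def cli_front_end(argv: List[str]) -> Any:
    """Parse 'name key=value ...' and dispatch through the registry."""
    name, *pairs = argv
    return operations.call(name, **dict(p.split("=", 1) for p in pairs))


def mcp_front_end(request: dict) -> Any:
    """Dispatch a tool-call style request through the same registry."""
    return operations.call(request["name"], **request.get("arguments", {}))


print(cli_front_end(["kb_word_count", "text=raw documents in"]))
print(mcp_front_end({"name": "kb_word_count", "arguments": {"text": "wiki out"}}))
```

Because both front ends reduce to operations.call, an op registered once is reachable from either without extra wiring.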

Agents mounted on this server can answer from compiled concepts, fall back to raw sources with kb_search_raw when compile glossed a detail, ingest new material mid-session, and trigger healing.

Workflows

Build a KB from scratch

llmbase ingest url https://example.com/topic
llmbase ingest pdf ./paper.pdf
llmbase compile new
llmbase query "What are the key concepts?"
llmbase lint heal

Autonomous mode (deploy once, server keeps learning)

# config.yaml
worker:
  enabled: true
  learn_source: cbeta         # built-in: cbeta | wikisource | both; custom via register_learn_source()
  learn_interval_hours: 6
  compile_interval_hours: 1
  health_check_interval_hours: 24

health:
  auto_fix_broken_links: true
  max_stubs_per_run: 10

The worker starts under the production WSGI entrypoint (wsgi.py → start_worker_thread). Deploy with gunicorn wsgi:app; llmbase web alone does not self-start the worker.
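The three intervals in the worker config can be read as independent timers. The sketch below is an assumption about how such a scheduler could decide what is due, not llmwiki's worker code; due_tasks and the task names are hypothetical, and the hour values mirror the config.yaml example.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Interval values mirror the config.yaml example above.
INTERVALS_HOURS = {"learn": 6, "compile": 1, "health_check": 24}


def due_tasks(last_run: Dict[str, Optional[datetime]], now: datetime) -> List[str]:
    """Return tasks whose interval has elapsed; never-run tasks are due."""
    due = []
    for task, hours in INTERVALS_HOURS.items():
        previous = last_run.get(task)
        if previous is None or now - previous >= timedelta(hours=hours):
            due.append(task)
    return due


now = datetime(2025, 1, 1, 12, 0)
last_run = {
    "learn": now - timedelta(hours=7),          # past the 6 h interval
    "compile": now - timedelta(minutes=30),     # ran recently
    "health_check": now - timedelta(hours=24),  # exactly at the interval
}
print(due_tasks(last_run, now))
```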

Daily use as agent memory

  1. Agent receives a task → calls kb_search for relevant concepts
  2. If the compiled answer is too abstract → calls kb_search_raw for verbatim detail
  3. Learns something new → calls kb_ingest with the URL
  4. Optionally kb_compile to fold it into concepts for next session
  5. Periodically kb_lint heals the graph
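Steps 1 and 2 amount to a concepts-first lookup with a verbatim fallback. The sketch below uses in-memory stand-ins so it runs without a server: the kb_search and kb_search_raw functions here are stubs that only borrow the tool names, and recall plus the sample data are hypothetical.

```python
from typing import Dict, List

# Stand-ins for the compiled wiki and the raw/ sources (hypothetical data).
CONCEPTS = {"emptiness": "Emptiness is the absence of inherent existence."}
RAW = {"raw/heart-sutra.txt": "Form is emptiness, emptiness is form."}


def kb_search(query: str) -> List[str]:
    """Stub for concept search: match on concept title."""
    return [body for title, body in CONCEPTS.items() if query.lower() in title]


def kb_search_raw(query: str) -> List[str]:
    """Stub for verbatim search: substring match over raw text."""
    return [text for text in RAW.values() if query.lower() in text.lower()]


def recall(query: str, need_verbatim: bool = False) -> Dict[str, List[str]]:
    """Try concepts first; consult raw sources when concepts come back
    empty or when the caller needs the exact wording."""
    concepts = kb_search(query)
    raw = kb_search_raw(query) if (need_verbatim or not concepts) else []
    return {"concepts": concepts, "raw": raw}


print(recall("emptiness"))                      # concepts only
print(recall("form is"))                        # no concept hit, raw fallback
print(recall("emptiness", need_verbatim=True))  # both layers
```

A real agent would call the MCP tools instead of these stubs; the control flow is the part being illustrated.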

Key Concepts

  • Synthesis, not archiving: the LLM reads raw material and writes composed articles; storage is the cheap part
  • Two-layer recall: kb_search (concepts) + kb_search_raw (verbatim raw sources)
  • Trilingual default: every article has EN / 中文 / 日本語 sections
  • Additive evolution (叠加进化): new data merges into existing concepts, never overwrites
  • Domain-agnostic: taxonomy emerges per-domain, nothing hardcoded
  • Self-healing: 7-step auto-fix pipeline repairs drift
  • Alias resolution: [[参禅]] → can-chan.md across scripts and simplified/traditional
  • Registry-backed ops: MCP dispatches every tool through operations.py; the CLI exposes the same registry via llmbase ops list / llmbase ops call; direct HTTP/CLI wrappers are being migrated onto the registry
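Alias resolution can be pictured as a lookup table in which every known spelling points at one slug. The sketch below is an illustration only, not llmwiki's resolver: the ALIASES table, normalize, and resolve are hypothetical, and the simplified/traditional pair is listed explicitly because the standard library has no script converter.

```python
import unicodedata
from typing import Optional

# Hypothetical alias table: every known spelling points at one slug.
ALIASES = {
    "参禅": "can-chan",      # simplified Chinese
    "參禪": "can-chan",      # traditional Chinese
    "can-chan": "can-chan",  # the slug itself
    "can chan": "can-chan",  # spaced romanization
}


def normalize(name: str) -> str:
    """Fold width and case differences so lookups are forgiving."""
    return unicodedata.normalize("NFKC", name).strip().casefold()


INDEX = {normalize(alias): slug for alias, slug in ALIASES.items()}


def resolve(link: str) -> Optional[str]:
    """Map [[wiki-link]] text to an article filename, or None if unknown."""
    target = link.strip()
    if target.startswith("[[") and target.endswith("]]"):
        target = target[2:-2]
    slug = INDEX.get(normalize(target))
    return f"{slug}.md" if slug else None


for link in ("[[参禅]]", "[[參禪]]", "Can Chan", "[[unknown]]"):
    print(link, "->", resolve(link))
```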

Tips

  • --file-back saves Q&A answers into the wiki so future queries benefit
  • --tone wenyan for Chinese users (classical Chinese responses)
  • Run llmbase lint heal after large ingestion batches
  • Web UI /health has buttons for every repair op
  • Knowledge graph at /graph — density slider for large KBs
  • Timeline at /explore — requires entities: { enabled: true } in config

Security & Privacy

  • All data stays local: wiki files are plain markdown on your filesystem
  • LLM API key: user-supplied, loaded from .env
  • Network access: user-initiated (URL ingest, SSRF-protected) plus corpus plugins (cbeta-learn, wikisource-learn, ctext-book) and the autonomous worker when enabled
  • Web server: optional; binds 0.0.0.0, so it is reachable from the LAN by default. Front it with a reverse proxy, or override the bind address, before any public exposure
  • API secret: cloud deployments (those with the PORT environment variable set) gate most mutating endpoints behind LLMBASE_API_SECRET (auto-generated if unset). Note: /api/ask is open by default and writes Q&A back via file_back; only promotion to concepts requires the secret
  • Autonomous worker: opt-in via config, disabled by default
  • No telemetry: nothing is sent anywhere except the configured LLM API
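The API secret behaviour described above (use LLMBASE_API_SECRET if set, otherwise generate one) can be sketched with the standard library. This is an assumption about the general shape of such a check, not llmwiki's code: load_secret and is_authorized are hypothetical, and how the secret reaches the server (header, query parameter) is not described in the source.

```python
import hmac
import secrets
from typing import Mapping, Optional


def load_secret(env: Mapping[str, str]) -> str:
    """Use LLMBASE_API_SECRET if set, otherwise generate a random one."""
    return env.get("LLMBASE_API_SECRET") or secrets.token_urlsafe(32)


def is_authorized(provided: Optional[str], expected: str) -> bool:
    """Constant-time comparison; a missing credential is always rejected."""
    if not provided:
        return False
    return hmac.compare_digest(provided.encode(), expected.encode())


secret = load_secret({"LLMBASE_API_SECRET": "s3cret"})
print(is_authorized("s3cret", secret), is_authorized("guess", secret))
```

hmac.compare_digest is used instead of == so that comparison time does not leak how many leading characters matched.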