kb-framework
v1.1.0
Creates a hybrid knowledge base with automatic Markdown, PDF, and OCR indexing, SQLite and ChromaDB integration, plus daily data quality chec...
Security Scan
Capability signals
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name/description (hybrid KB, Markdown/PDF/OCR, SQLite + ChromaDB, Obsidian integration) matches the included code: indexer, Chroma integration, hybrid search, and obsidian modules are present. However the registry metadata claims 'instruction-only / no install spec' while the bundle contains 56 Python files and shell scripts — so it is not purely instruction-only. Also SKILL.md and multiple docs reference environment variables (KB_DB_PATH, KB_BASE_PATH etc.) even though the skill declares no required env vars in metadata.
Instruction Scope
Runtime instructions ask you to copy the skill into the agent/workspace, pip install -r requirements.txt, and run the indexer --init (which will execute code). The SKILL.md suggests editing kb/config.py but changelog says kb/config.py was removed (incoherence). The codebase includes an Obsidian writer with create/update/delete/move operations — so the runtime behavior includes writing and deleting user files if used. The docs claim 'offline-only' and 'no network operations' but there is an 'update.py' / auto-updater referenced in docs and README examples include a git clone; you should inspect update.py and other scripts before running.
Install Mechanism
There is no formal install spec in the registry, but SKILL.md instructs pip install -r requirements.txt (requirements file exists in the bundle). pip will pull packages from PyPI (normal for Python projects) — moderate risk. No external arbitrary downloads or obscure URLs are present in the provided instructions, but the bundle includes shell scripts (kb.sh, scripts/install.sh) that will be placed/run on your system when you follow the instructions.
Credentials
Registry metadata lists no required env vars, but the code and documentation rely on many environment variables (KB_DB_PATH, KB_CHROMA_PATH, KB_LIBRARY_PATH, KB_BASE_PATH, KB_HOME). SECURITY_FUNCTIONS.txt and other docs list KB_HOME and other env vars. The skill also asks (implicitly) for filesystem access to user content (library and Obsidian vault) and will perform write/delete operations — this is proportional to a vault-syncing KB but is a sensitive capability and should be explicitly acknowledged by the user prior to install.
Persistence & Privilege
The skill is not marked always:true, and model invocation is allowed (normal). However, the code includes a writer capable of creating, updating, deleting, and moving user .md files in an Obsidian vault, a ghost/delete-orphans command that can remove DB entries, and an update.py/auto-updater referenced in the docs. These features let the skill modify user data and potentially auto-update its own code. Combined with the earlier metadata and instruction inconsistencies, installation should be treated as privileged: review how the writer's path validation and the updater work before allowing the skill to run.
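To make the path-validation concern concrete, here is a minimal sketch of the kind of containment check a reviewer might look for in the writer. It is not the framework's actual code: `resolve_inside_vault` is a hypothetical helper, and the real writer may validate paths differently or not at all.

```python
from pathlib import Path


def resolve_inside_vault(vault_root: str, relative_path: str) -> Path:
    """Return the absolute target path, or raise ValueError if it escapes the vault.

    Hypothetical helper for illustration; not part of kb-framework.
    """
    root = Path(vault_root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (root / relative_path).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"{relative_path!r} resolves outside the vault: {target}")
    return target
```

An absolute `relative_path` such as `/etc/passwd` replaces the root when joined, so it also fails the containment check and is rejected.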
What to consider before installing
What to check before installing:
- Treat this as a code-bearing package (not just an instruction-only skill). Review the bundle's Python files (especially kb/obsidian/writer.py, update.py, kb/indexer.py and scripts/*.sh). Search for any network calls (requests, urllib, socket, subprocess calling curl/wget/git), telemetry, or hard-coded endpoints.
- Inspect requirements.txt to see which PyPI packages will be installed; consider installing inside a virtualenv or container.
- Backup any Obsidian vault or library directory you plan to point the skill at. Test writer operations in a disposable copy first because the writer can create/modify/delete notes.
- Verify path validation: confirm the writer strictly restricts operations to the configured vault directory and that delete operations move files to a trash/backup rather than immediate rm.
- Review update.py/autoupdater and any code that pulls updates or executes remote code before enabling automatic updates.
- If you want minimal risk, run the skill in a sandboxed environment (container, VM) and restrict its KB_LIBRARY_PATH to a directory containing only data you are willing to expose/modify.
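The search for network calls suggested in the checklist can be partly automated. The sketch below statically scans Python files for imports of modules that commonly indicate network or process activity. The function names and the module list are assumptions made for illustration; the list is not exhaustive, and a static import scan cannot see dynamic imports or anything in the shell scripts.

```python
import ast
from pathlib import Path

# Modules that commonly indicate network or process activity.
# This list is an assumption for illustration and is not exhaustive.
SUSPECT_MODULES = {"requests", "urllib", "http", "socket", "subprocess", "httpx"}


def find_suspect_imports(source: str) -> set[str]:
    """Return the top-level names of suspect modules imported by the source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in SUSPECT_MODULES:
                found.add(top)
    return found


def scan_tree(root: str) -> dict[str, set[str]]:
    """Map each .py file under root to the suspect modules it imports."""
    report = {}
    for path in Path(root).rglob("*.py"):
        try:
            hits = find_suspect_imports(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        if hits:
            report[str(path)] = hits
    return report
```

Running `scan_tree("kb-framework")` before `pip install` gives a short list of files to read first. A hit is a prompt for manual review, not evidence of malicious behavior.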
Given the mismatches (metadata vs. included files, SKILL.md vs. changelog, environment variables not declared) and the presence of file-modifying + updater code, proceed only after manual code review or in an isolated/test environment.
Like a lobster shell, security has layers: review code before you run it.
KB Framework - OpenClaw Skill
Version: 2.0
Category: Knowledge Base / Search
Requires: Python 3.9+, SQLite, ChromaDB
What is the KB Framework?
A complete Knowledge Base with:
- Hybrid Search (semantic + keyword)
- Automatic Indexing (Markdown, PDF, OCR)
- SQLite + ChromaDB Integration
- Daily Audits for data quality
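This page does not say how the framework merges semantic and keyword results. As an illustration only, reciprocal rank fusion (RRF) is one common way to combine two ranked lists. The function name and the constant `k=60` are assumptions, and this is not necessarily what `HybridSearch` does.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Higher total score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


semantic = ["doc_a", "doc_b", "doc_c"]  # e.g. from a vector store
keyword = ["doc_b", "doc_d", "doc_a"]   # e.g. from full-text search
merged = reciprocal_rank_fusion([semantic, keyword])
# merged == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

Documents found by both searches (`doc_a`, `doc_b`) rise above those found by only one, which is the practical benefit of a hybrid approach.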
Installation (1 Minute)
1. Install the Skill
# Clone or extract into your OpenClaw workspace
cp -r kb-framework ~/.openclaw/workspace/
# Or just the skill:
cp kb-framework/SKILL.md ~/.npm-global/lib/node_modules/openclaw/skills/kb/
2. Install Dependencies
pip install -r requirements.txt
3. Initialize Database
python3 ~/.openclaw/workspace/kb-framework/kb/indexer.py --init
Configuration
Set the environment variable KB_DB_PATH, or edit the config module (the Architecture tree below lists config.py under kb/base/).
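The environment variables named on this page (KB_BASE_PATH, KB_DB_PATH, KB_CHROMA_PATH, KB_LIBRARY_PATH) could be resolved along the following lines. This is a sketch, not the framework's config module: `KBPaths`, `load_paths`, and every default value are assumptions.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class KBPaths:
    base: Path
    db: Path
    chroma: Path
    library: Path


def load_paths(env=None) -> KBPaths:
    """Resolve KB paths from environment variables.

    The variable names appear in the skill's docs; every default below is
    an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    base = Path(env.get("KB_BASE_PATH", "~/.openclaw/kb")).expanduser()
    return KBPaths(
        base=base,
        db=Path(env.get("KB_DB_PATH", str(base / "knowledge.db"))).expanduser(),
        chroma=Path(env.get("KB_CHROMA_PATH", str(base / "chroma"))).expanduser(),
        library=Path(env.get("KB_LIBRARY_PATH", str(base / "library"))).expanduser(),
    )
```

Resolving all paths in one place makes it easy to confirm, before the first run, exactly which directories the skill will read and write.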
Usage
Python API
# Import
import sys
sys.path.insert(0, "/path/to/kb-framework")
from kb.indexer import BiblioIndexer
# Index a file
with BiblioIndexer("/path/to/knowledge.db") as idx:
    idx.index_file("/path/to/file.md")
# Search
from kb.library.knowledge_base.hybrid_search import HybridSearch
hs = HybridSearch()
results = hs.search("Your search term", limit=10)
CLI (Recommended)
The built-in kb command provides easy access:
# Add to .bashrc for global access:
alias kb="/path/to/kb-framework/kb.sh"
# Commands:
kb index /path/to/file.md # Index a file
kb search "machine learning" # Search knowledge base
kb audit # Run full audit
kb ghost # Find orphaned entries
kb warmup # Preload ChromaDB model
Legacy Python Scripts
# Index a new file
python3 kb/indexer.py /path/to/file.md
# Ghost Scanner (finds orphaned DB entries)
python3 kb/scripts/kb_ghost_scanner.py
# Full Audit
python3 kb/scripts/kb_full_audit.py
# ChromaDB Warmup (at boot)
python3 kb/scripts/kb_warmup.py
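A ghost scan, as described above, finds database entries whose files no longer exist on disk. The sketch below shows the idea against the `files` table and `file_path` column from the Database Schema section. `find_ghosts` is a hypothetical function, not the shipped scanner, and it only reports entries without deleting anything.

```python
import os
import sqlite3


def find_ghosts(conn: sqlite3.Connection):
    """Return (id, file_path) rows whose file_path no longer exists on disk."""
    rows = conn.execute("SELECT id, file_path FROM files").fetchall()
    return [(file_id, path) for file_id, path in rows if not os.path.exists(path)]
```

Keeping detection separate from deletion lets you inspect the list before any DB rows are removed.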
Architecture
kb-framework/
├── SKILL.md                          # This file
├── README.md                         # Detailed documentation
├── kb/
│   ├── indexer.py                    # Core Indexer (BiblioIndexer)
│   ├── commands/                     # CLI Commands: index, sync, audit, ghost, warmup, search
│   ├── base/                         # Core: config.py, db.py, logger.py, command.py
│   ├── library/
│   │   └── knowledge_base/
│   │       ├── hybrid_search.py      # Hybrid Search (semantic + keyword)
│   │       ├── chroma_integration.py # ChromaDB Wrapper
│   │       ├── chroma_plugin.py      # ChromaDB Plugin (Collection Management)
│   │       ├── embedding_pipeline.py # Batch Embeddings
│   │       ├── reranker.py           # Search Result Reranker
│   │       ├── fts5_setup.py         # SQLite FTS5 Full-Text Search
│   │       ├── chunker.py            # Text Chunking
│   │       └── synonyms.py           # Query Expansion
│   └── obsidian/                     # Obsidian Vault Integration
└── scripts/
    ├── index_pdfs.py                 # PDF + OCR Indexing
    ├── kb_ghost_scanner.py           # Legacy ghost scanner
    ├── kb_full_audit.py              # Legacy audit script
    └── kb_warmup.py                  # Legacy warmup script
Database Schema
files Table
| Field | Type | Description |
|---|---|---|
| id | TEXT | UUID |
| file_path | TEXT | Absolute path |
| file_name | TEXT | Filename |
| file_category | TEXT | Category |
| file_type | TEXT | pdf/md/txt |
| file_size | INTEGER | Bytes |
| line_count | INTEGER | Lines |
| file_hash | TEXT | SHA256 |
| last_indexed | TIMESTAMP | Last indexing |
| index_status | TEXT | indexed/pending/failed |
| source_path | TEXT | Original path |
| indexed_path | TEXT | MD extract path |
| is_indexed | INTEGER | 0/1 |
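The `file_hash` column makes it possible to skip unchanged files when re-indexing. The sketch below assumes the hash is stored as a hex SHA-256 digest and that `file_path` identifies the row. `sha256_of_file` and `needs_reindex` are hypothetical helpers, not the indexer's real API.

```python
import hashlib
import sqlite3


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reindex(conn: sqlite3.Connection, path: str) -> bool:
    """True if the file is unknown to the DB or its content hash has changed."""
    row = conn.execute(
        "SELECT file_hash FROM files WHERE file_path = ?", (path,)
    ).fetchone()
    return row is None or row[0] != sha256_of_file(path)
```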
file_sections Table
| Field | Type | Description |
|---|---|---|
| id | TEXT | UUID |
| file_id | TEXT | FK → files |
| section_header | TEXT | Heading |
| section_level | INTEGER | 1-6 |
| content_preview | TEXT | First 500 characters |
| content_full | TEXT | Full content |
| keywords | TEXT | JSON Array |
| importance_score | REAL | 0.0-1.0 |
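The `section_header`, `section_level`, and `content_preview` fields suggest that each Markdown file is split at its headings. A minimal sketch of such a split follows. `split_sections` is a hypothetical function, and it simplifies: it handles only `#`-style headings and does not skip `#` lines inside fenced code blocks.

```python
import re

# One to six "#" characters, whitespace, then the heading text.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def split_sections(markdown: str, preview_chars: int = 500):
    """Split Markdown into sections at ATX headings (# to ######)."""
    sections = []
    current = None
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            current = {
                "section_header": match.group(2),
                "section_level": len(match.group(1)),
                "lines": [],
            }
            sections.append(current)
        elif current is not None:
            current["lines"].append(line)
        # Text before the first heading is ignored in this sketch.
    result = []
    for sec in sections:
        full = "\n".join(sec["lines"]).strip()
        result.append({
            "section_header": sec["section_header"],
            "section_level": sec["section_level"],
            "content_full": full,
            "content_preview": full[:preview_chars],
        })
    return result
```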
keywords Table
| Field | Type | Description |
|---|---|---|
| id | INTEGER | AUTOINCREMENT |
| keyword | TEXT | Word |
| weight | REAL | Frequency |
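The three tables above can be written as SQLite DDL. The sketch below is reconstructed from the field lists only: column names and types come from the tables, while the constraints (PRIMARY KEY, NOT NULL, REFERENCES, DEFAULT) are assumptions, and the real schema may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    id            TEXT PRIMARY KEY,      -- UUID
    file_path     TEXT NOT NULL,         -- absolute path
    file_name     TEXT,
    file_category TEXT,
    file_type     TEXT,                  -- pdf/md/txt
    file_size     INTEGER,               -- bytes
    line_count    INTEGER,
    file_hash     TEXT,                  -- SHA256
    last_indexed  TIMESTAMP,
    index_status  TEXT,                  -- indexed/pending/failed
    source_path   TEXT,
    indexed_path  TEXT,
    is_indexed    INTEGER DEFAULT 0      -- 0/1
);
CREATE TABLE IF NOT EXISTS file_sections (
    id               TEXT PRIMARY KEY,   -- UUID
    file_id          TEXT REFERENCES files(id),
    section_header   TEXT,
    section_level    INTEGER,            -- 1-6
    content_preview  TEXT,               -- first 500 characters
    content_full     TEXT,
    keywords         TEXT,               -- JSON array
    importance_score REAL                -- 0.0-1.0
);
CREATE TABLE IF NOT EXISTS keywords (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    keyword TEXT,
    weight  REAL
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three tables if they do not exist yet."""
    conn.executescript(SCHEMA)
```

Creating this schema in an in-memory database (`sqlite3.connect(":memory:")`) is a low-risk way to experiment with queries before pointing anything at real data.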
Troubleshooting
"ChromaDB slow on first start"
python3 kb/scripts/kb_warmup.py
"Search finds nothing"
# Run audit
python3 kb/scripts/kb_full_audit.py
# Ghost Scanner
python3 kb/scripts/kb_ghost_scanner.py
"OCR too slow"
# Enable GPU in index_pdfs.py:
GPU_ENABLED = True # Default: False
Library Structure (IMPORTANT)
content/ - Raw Files
All non-Markdown files:
library/content/
├── Gesundheit/ # PDFs, Studies
├── Medizin_Studien/ # Medical Literature
├── Bücher/ # Books, Guides
├── Sonstiges/ # Uncategorized
└── [category]/ # Custom categories possible
agent/ - Markdown Files
All .md files for agents:
library/agent/
├── projektplanung/ # Agent plans
├── memory/ # Daily logs
├── Workflow_Referenzen/ # Reusable workflows
├── agents/ # Agent-specific docs
└── [category]/ # Custom categories possible
Integrating New Files
Rule: library/[content|agent]/[category]/[topic]/[file]
Examples:
# New health PDF
library/content/Gesundheit/2026/Chelat-Therapie.pdf
# New agent plan
library/agent/projektplanung/Treechat_Upgrade.md
# New learning
library/agent/learnings/2026-04-12_Git_Workflow.md
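The placement rule can be sketched as a small helper. `library_destination` is a hypothetical function, not part of the framework. It follows the split described above (Markdown under agent/, everything else under content/) and treats the topic level as optional, since two of the examples place a file directly under its category.

```python
from pathlib import PurePosixPath


def library_destination(filename: str, category: str, topic: str = "") -> str:
    """Build a path following library/[content|agent]/[category]/[topic]/[file].

    Markdown files go under agent/, everything else under content/.
    """
    area = "agent" if filename.lower().endswith(".md") else "content"
    parts = ["library", area, category]
    if topic:
        parts.append(topic)
    parts.append(filename)
    return str(PurePosixPath(*parts))


pdf_path = library_destination("Chelat-Therapie.pdf", "Gesundheit", "2026")
# pdf_path == "library/content/Gesundheit/2026/Chelat-Therapie.pdf"
plan_path = library_destination("Treechat_Upgrade.md", "projektplanung")
# plan_path == "library/agent/projektplanung/Treechat_Upgrade.md"
```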
License
MIT License - free to use.
