Install
```shell
openclaw skills install ontology-engineer
```

Extract candidate ontology models from enterprise business systems and build/maintain personal knowledge graphs from any file system.
Core principle: Make implicit business models in existing data explicit. Don't create from scratch.
Division of labor: Scripts handle mechanical extraction (file scanning, format conversion, table parsing). LLM handles semantic judgment (entity identification, property selection, relationship discovery, naming, cross-source merging).
Security model:
Knowledge graphs and ontology extraction are not universally useful. Before starting, assess fit:
| Scenario | Value | Why |
|---|---|---|
| 3+ heterogeneous systems with inconsistent naming for the same concepts | High (Mode A) | Cross-system concept alignment is the core use case |
| Agent product needs factual grounding to reduce hallucination | High (Mode B/C) | Graph becomes Agent's fact base — auto-query before every response |
| 1000+ entities with dense relationships across long time spans | High | Pattern discovery humans can't do manually (churn, cross-sell, capability mapping) |
| Client consulting engagement analyzing their data landscape | High (Mode A) | Core consulting deliverable: "here's what your data assets look like" |
| Small org, <200 entities, info fits in one person's head + Excel | Low (Mode B) | Graph just re-stores what user already knows — use as PoC/capability validation only |
| Single system, no cross-system integration need | Low (Mode A) | Read the schema directly; ontology layer adds overhead without value |
Rule of thumb: If the user's reaction to the output is "I already knew all this", the graph isn't producing incremental value. Redirect to Mode A (client projects) or Agent integration.
Detailed value scenarios: references/value-scenarios.md
| Mode | Input | Output | Use When |
|---|---|---|---|
| A: Database Extraction | SQL DDL, data dictionaries, Word/Excel schemas | ontology.json + review.md | Analyzing enterprise business systems |
| B: Filesystem Scanning | Local/cloud directories | graph.jsonl + schema.yaml | Building personal knowledge graph |
| C: External Data | Others' data spaces, shared drives | graph.jsonl (source=external) | Acquiring others' business models |
Three-phase workflow for extracting ontology from structured data sources.
Run scripts/scan_directory.py to discover and classify files by priority (P1-P7).
```shell
python scripts/scan_directory.py "<dir>" --output scan_result.json --report
```
Review scan report. Process P1-P2 files first, expand as needed.
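Once scan_result.json is written, the P1-P2 subset can be pulled out programmatically. A minimal sketch, assuming the report holds a top-level "files" list with "path" and "priority" entries (the actual layout is defined by scripts/scan_directory.py and may differ):

```python
import json

def high_priority_files(scan_result_path: str, max_tier: int = 2) -> list[str]:
    """Return paths whose priority tier is P1..P{max_tier}.

    Assumes a hypothetical scan_result.json layout:
    {"files": [{"path": "...", "priority": "P1"}, ...]}
    """
    with open(scan_result_path, encoding="utf-8") as f:
        result = json.load(f)
    selected = []
    for entry in result.get("files", []):
        tier = int(entry["priority"].lstrip("P"))  # "P3" -> 3
        if tier <= max_tier:
            selected.append(entry["path"])
    return selected
```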
Run scripts/convert_doc.py to convert legacy .doc files and scripts/extract_tables.py to pull tables from Word/Excel, then emit ontology.json + review.md.

Detailed rules: references/analysis-rules.md (Rules 1-7)
Quality checks: references/quality-checks.md
Script details: references/script-operations.md
Modeling decisions: references/modeling-decisions.md
Two-step pipeline for building personal knowledge graphs from file systems.
```shell
python scripts/scan_filesystem.py --root /path --config namespace_rules.yaml --extract-metadata
```
Creates Document + Project entities in graph.jsonl. Pure mechanical operation.
Key features: Auto namespace inference, duplicate detection, .docx/.pdf metadata extraction, universal noise filtering.
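The shape of the --config file is defined by scripts/scan_filesystem.py itself and is not documented here. Purely as an illustration of the idea, a rules file could look like the sketch below; every key name is a hypothetical assumption, not the script's real schema:

```yaml
# Hypothetical sketch only -- key names are not the script's actual schema.
namespaces:
  work: ["~/Documents/work/**"]        # map folders to the work/* namespace
  personal: ["~/Notes/**"]             # map folders to personal/*
noise_filters:
  - "**/.git/**"                       # skip VCS internals
  - "**/~$*"                           # skip Office lock/temp files
```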
After Step 1 completes, present the scan summary to the user and ask for scope confirmation before proceeding to Step 2. The user knows which folders matter most.
Display a table of all discovered projects/namespaces with document counts, then ask:
Scan complete: found {N} projects with {M} documents in total. Mark a priority for each folder:
- 🔴 Focus (high sampling rate, analyzed first)
- ⚪ Normal (default sampling rate)
- ⚫ Ignore (skipped, not analyzed)
- Or enter "all" to skip selection and handle every folder with the default strategy

| # | Project | Docs | Formats | Default Priority |
|---|---------|------|---------|------------------|
| 1 | work/myfiles | 15,617 | .doc .docx .pdf .xlsx | 🔴 Focus |
| 2 | work/classified | 1,578 | .doc .pdf .xlsx | ⚪ Normal |
| ... | ... | ... | ... | ... |

Enter adjustments (e.g. "2=ignore, 5=focus"), "all", or "confirm":
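The adjustment reply can be parsed with a small helper. This is a sketch under stated assumptions: the keyword strings ("all", "confirm") and priority labels are illustrative placeholders, not part of the skill:

```python
def parse_adjustments(reply: str, defaults: dict[int, str]) -> dict[int, str]:
    """Parse a reply like '2=ignore, 5=focus' against per-folder defaults.

    Illustrative sketch: the accepted keywords ('all', 'confirm') and the
    priority labels are assumptions about the prompt above, not a fixed API.
    """
    reply = reply.strip()
    priorities = dict(defaults)
    if reply in ("all", "confirm"):  # keep defaults unchanged
        return priorities
    for part in reply.split(","):
        idx, _, level = part.strip().partition("=")
        priorities[int(idx)] = level.strip()
    return priorities
```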
Rules:
Five phases: Sampling → Document Reading → Aggregation → Cross-project Alignment → Output.
Key decisions (details in knowledge-graph-workflow.md):
- Subagents: use the general-purpose type (Bash access). Never use the Explore type.

The Agent enriches the knowledge graph during daily conversations; runtime entries carry source.type = "runtime".
When to trigger (passive, no user action needed):
How to append:
```shell
python query_graph.py search "张三"   # Check if the entity already exists
# If not found, append to graph.jsonl:
echo '{"op":"create","ts":"...","entity":{"id":"per-NNNNN","type":"Person","graph":"core/persons","source":{"type":"runtime","conversation_id":"..."},...}}' >> graph.jsonl
```
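The same append can be done from Python. A minimal sketch: the helper name and its defaulting behavior are illustrative and not part of the skill's scripts, though the field names mirror the append example above:

```python
import json
from datetime import datetime, timezone

def append_runtime_entity(graph_path: str, entity: dict, conversation_id: str) -> None:
    """Append a create op to graph.jsonl, tagged source.type = "runtime".

    Illustrative helper: field names mirror the echo example above,
    but this function itself is not shipped with the skill.
    """
    ts = datetime.now(timezone.utc).isoformat()
    entity.setdefault("source", {"type": "runtime", "conversation_id": conversation_id})
    entity.setdefault("created_at", ts)
    record = {"op": "create", "ts": ts, "entity": entity}
    with open(graph_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```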
Rules:
- Set source.type = "runtime" to distinguish runtime entries from scan-derived ones.

Full workflow details: references/knowledge-graph-workflow.md
Analysis rules (8-12): references/analysis-rules.md
Format support & deps: references/formats-and-deps.md
| Reference | When to Read |
|---|---|
| modeling-decisions.md | Core type boundaries, entity vs enum, promotion judgment |
| relation-ontology.md | Relation format, core relation catalog, ternary relations |
| ontology-evolution.md | Schema versioning, entity reclassification, conflict resolution |
| constraints-and-inference.md | Type/relation constraints, inference rules, inconsistency detection |
| value-scenarios.md | When this skill adds value and when it doesn't |
{"op":"create","ts":"2026-01-15T10:00:00Z","entity":{"id":"per-00001","type":"Person","graph":"core/persons","labels":["employee"],"source":{"type":"scan","scan_id":"step2-r1"},"properties":{"name":"张三","roles":["项目经理"],"organizations":["某科技公司"]},"relations":[{"type":"works_at","target_id":"org-00002","direction":"forward","cardinality":"N:1","temporal":{"start":"2019-01","end":null},"confidence":"high"}],"created_at":"2026-01-15T10:00:00Z"}}
Required: id, type, graph, source, created_at. Optional: labels, properties, relations.
Relation fields: type + target_id required. Optional: direction (forward/reverse/bidirectional), cardinality (1:1/1:N/N:1/N:M), temporal ({start, end}), evidence (source entity ID), confidence (high/medium/low). See relation-ontology.md.
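The required/optional split above can be checked mechanically. A hedged sketch (not the skill's own validator) that reports violations of exactly the rules just stated:

```python
def validate_entity(record: dict) -> list[str]:
    """Return a list of problems for one graph.jsonl record.

    Checks only the required fields stated above (entity: id, type, graph,
    source, created_at; relation: type, target_id). Illustrative sketch,
    not the skill's own validation logic.
    """
    problems = []
    entity = record.get("entity", {})
    for field in ("id", "type", "graph", "source", "created_at"):
        if field not in entity:
            problems.append(f"missing entity field: {field}")
    for i, rel in enumerate(entity.get("relations", [])):
        for field in ("type", "target_id"):
            if field not in rel:
                problems.append(f"relation {i} missing: {field}")
    return problems
```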
```yaml
meta:
version: "2.0"
core_types: # 8 fixed (BFO-aligned): Person, Organization, Project, Task, Document, Event, Note, Goal
domain_types: # Discovered by Step 2 Track B, grouped by domain
namespaces: # core/, work/*, personal/*, external/*, uncategorized/*
source_types: # scan | runtime | manual | email | cloud | chat
relation_schema: # Relation fields: type, target_id, direction, cardinality, temporal, evidence, confidence
relation_types: # Core relation catalog grouped by source type pair
constraints: # type_constraints (required props, enums), relation_constraints, id_pattern
inference_rules: # Transitive subsidiary, symmetric partner, inverse works_at, etc.
schema_evolution: # Version format, backward compatibility rules
```
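As an illustration of how an inference_rules entry such as "transitive subsidiary" could be executed, here is a generic transitive-closure sketch over (child, parent) edges; the function is hypothetical, not shipped with the skill:

```python
def transitive_closure(pairs: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Transitive inference over subsidiary_of edges: if A is a subsidiary
    of B and B of C, infer that A is a subsidiary of C.

    Generic sketch of the 'transitive subsidiary' idea listed under
    inference_rules above; naive fixpoint loop, fine for small graphs.
    """
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure
```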
```json
{
  "meta": {"generated_by": "ontology-engineer", "source_files": [], "domain": "..."},
  "object_types": [{"name": "...", "english": "...", "core_properties": [], "confidence": "high|medium|low"}],
  "link_types": [{"from": "A", "relation": "verb", "to": "B", "cardinality": "1:N", "evidence": "..."}],
  "review_flags": [{"type": "promotion|merge|ambiguity|missing", "item": "...", "question": "..."}]
}
```
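Since review.md is built from review_flags and low-confidence types, the items needing human attention can be collected from ontology.json like this (a sketch over the output format above; the helper itself is illustrative):

```python
import json

def review_summary(ontology_path: str) -> list[str]:
    """Collect items needing human review from an ontology.json file:
    open review_flags plus object_types marked low confidence.

    Sketch built on the output format shown above; not part of the skill.
    """
    with open(ontology_path, encoding="utf-8") as f:
        onto = json.load(f)
    items = [f"[{flag['type']}] {flag['question']}"
             for flag in onto.get("review_flags", [])]
    items += [f"[low-confidence] {t['name']}"
              for t in onto.get("object_types", [])
              if t.get("confidence") == "low"]
    return items
```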
| Script | Mode | Purpose |
|---|---|---|
| scripts/scan_filesystem.py | B/C | File indexing, namespace inference, metadata extraction |
| scripts/scan_directory.py | A | File discovery with P1-P7 priority classification |
| scripts/convert_doc.py | A | .doc → .docx conversion |
| scripts/extract_tables.py | A | Table extraction from Word/Excel |
Details: references/script-operations.md
| Component | Purpose | Status |
|---|---|---|
| query_graph.py | Search entities by type/name/graph/labels, traverse relations | Done |
| Runtime write | Agent appends new entities during conversation (Step 3) | Done |
| MCP Server | Expose graph as tools: search_entities, get_relations | Planned |
| Prompt injection | Agent auto-queries graph for context before handling tasks | Planned |
Query tool usage:
```shell
python query_graph.py stats                        # Overview
python query_graph.py search "keyword"             # Search
python query_graph.py type Person --limit 20       # By type
python query_graph.py get per-00001                # Details
python query_graph.py relations per-00001          # Relations
python query_graph.py domain --limit 30            # Domain terms
python query_graph.py export Person --format csv   # Export
```