Install

```shell
openclaw skills install hybrid-retrieval
```

Design and build a hybrid retrieval system combining BM25 keyword search, vector embeddings, and knowledge graph traversal for AI agent memory. Use when building agent memory, designing RAG systems, or improving recall quality. Triggers on "hybrid search", "RAG architecture", "agent memory design", "build memory system", "BM25 + vector", "knowledge graph search".
You are an expert in information retrieval systems, specifically hybrid approaches that combine multiple search paradigms. Help the user design and build a retrieval system inspired by the BlackRock/NVIDIA HybridRAG paper.
No single retrieval method works for everything:
| Method | Strength | Weakness |
|---|---|---|
| BM25 (keyword) | Exact matches, names, IDs, codes | Misses synonyms and semantic meaning |
| Vector (embedding) | Semantic similarity, paraphrases | Struggles with exact terms, numbers, names |
| Graph (knowledge graph) | Relationships, multi-hop reasoning | Requires structured extraction, maintenance |
The hybrid approach: Run all three in parallel, then fuse results with weighted scoring. Each method catches what the others miss.
```
User Query
  │
  ├──→ BM25 Keyword Search (fastest, sub-ms)
  │      SQLite FTS5 or Elasticsearch
  │
  ├──→ Vector Search (fast, ~100ms)
  │      Embedding model → ANN index (Qdrant, Milvus, FAISS, sqlite-vec)
  │
  └──→ Graph Search (medium, ~200ms)
         Entity extraction → Graph DB traversal (Neo4j, etc.)
         │
         └──→ Fusion Layer
                Weighted merge → Deduplication → Reranking → Top-K results
```
Your chunks need to live somewhere. Options:
Recommendation for most projects: start with SQLite (FTS5 for keywords, vec0 tables from the sqlite-vec extension for vectors). Migrate to a dedicated engine such as Elasticsearch or Qdrant when you hit performance limits.
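A minimal sketch of that SQLite setup using Python's built-in sqlite3 module, assuming a plain chunks table is the source of truth and an FTS5 external-content index is kept in sync by triggers (the trigger pattern follows the SQLite FTS5 documentation). Table and trigger names are illustrative; the vec0 table appears only as a comment because it requires the sqlite-vec extension to be loaded.

```python
import sqlite3

SCHEMA = """
CREATE TABLE chunks (
    id     INTEGER PRIMARY KEY,
    path   TEXT NOT NULL,
    text   TEXT NOT NULL,
    source TEXT
);
-- External-content FTS5 index: stores only the index, reads text from chunks
CREATE VIRTUAL TABLE chunks_fts USING fts5(
    path, text, source, content='chunks', content_rowid='id'
);
-- Triggers keep the FTS index in sync with the chunks table
CREATE TRIGGER chunks_ai AFTER INSERT ON chunks BEGIN
    INSERT INTO chunks_fts(rowid, path, text, source)
    VALUES (new.id, new.path, new.text, new.source);
END;
CREATE TRIGGER chunks_ad AFTER DELETE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, path, text, source)
    VALUES ('delete', old.id, old.path, old.text, old.source);
END;
CREATE TRIGGER chunks_au AFTER UPDATE ON chunks BEGIN
    INSERT INTO chunks_fts(chunks_fts, rowid, path, text, source)
    VALUES ('delete', old.id, old.path, old.text, old.source);
    INSERT INTO chunks_fts(rowid, path, text, source)
    VALUES (new.id, new.path, new.text, new.source);
END;
-- With sqlite-vec loaded, vectors could live alongside, for example:
-- CREATE VIRTUAL TABLE chunks_vec USING vec0(id INTEGER PRIMARY KEY, embedding float[768]);
"""

def open_store(path=":memory:"):
    """Open (or create) the chunk store and apply the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

conn = open_store()
conn.execute("INSERT INTO chunks (path, text, source) VALUES (?, ?, ?)",
             ("notes/deploy.md", "Deploy rolled back after a failed health check", "docs"))
conn.execute("INSERT INTO chunks (path, text, source) VALUES (?, ?, ?)",
             ("notes/team.md", "Alice owns the billing service", "docs"))
# Deleting a chunk also removes it from the FTS index via the trigger
conn.execute("DELETE FROM chunks WHERE path = ?", ("notes/team.md",))
print(conn.execute(
    "SELECT path FROM chunks_fts WHERE chunks_fts MATCH 'billing'").fetchall())  # []
```

Keeping the text in one ordinary table means the keyword index and any vector table can both reference the same chunk id, which makes deduplication during fusion straightforward.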
| Model | Dimensions | Quality | Speed | Cost |
|---|---|---|---|---|
| OpenAI text-embedding-3-small | 1536 | Good | Fast | $0.02/1M tokens |
| Voyage AI voyage-3 | 1024 | Very good | Fast | $0.06/1M tokens |
| NV-Embed-v2 (self-hosted) | 4096 | Excellent | Medium | Free (GPU needed; CC BY-NC 4.0, non-commercial license) |
| nomic-embed-text (Ollama) | 768 | Good | Fast | Free (CPU ok) |
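Key decision: self-hosted models have no per-token fee but run on your own hardware (a GPU for the larger models); cloud APIs are easy to adopt but bill per token. For production agent memory with a high embedding volume, self-hosting can pay for itself quickly.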
Bad chunking ruins everything. Rules:
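```sql
-- SQLite FTS5 example
CREATE VIRTUAL TABLE chunks_fts USING fts5(path, text, source);

-- Search: rank defaults to bm25(), where more negative means a better match,
-- so ascending ORDER BY puts the best hits first
SELECT path, text, rank
FROM chunks_fts
WHERE chunks_fts MATCH 'query terms'
ORDER BY rank
LIMIT 20;
```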
BM25 handles: exact names, error codes, file paths, dates, IDs — anything where the exact string matters.
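A minimal Python sketch of running that FTS5 query through the standard sqlite3 module and turning the rank into a higher-is-better score, which is what a weighted-sum fusion step expects. The helper name bm25_search and the sample rows are hypothetical; the table matches the FTS5 example above.

```python
import sqlite3

def bm25_search(conn, query, k=20):
    """Run an FTS5 BM25 query; return dicts with a higher-is-better "score"."""
    rows = conn.execute(
        "SELECT path, text, rank FROM chunks_fts "
        "WHERE chunks_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    ).fetchall()
    # FTS5's bm25() rank is more negative for better matches, so negate it
    return [{"path": path, "text": text, "score": -rank} for path, text, rank in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(path, text, source)")
conn.executemany(
    "INSERT INTO chunks_fts (path, text, source) VALUES (?, ?, ?)",
    [
        ("runbooks/deploy.md", "Rollback steps for a failed deploy", "docs"),
        ("notes/team.md", "Alice owns the billing service", "docs"),
        ("notes/oncall.md", "The on-call rotation changes every Monday", "chat"),
    ],
)
for hit in bm25_search(conn, "rollback"):
    print(hit["path"], round(hit["score"], 3))
```

User input passed to MATCH is parsed as FTS5 query syntax, so punctuation-heavy strings such as file paths or error codes are safer wrapped in double quotes as a phrase.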
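```python
from sqlite_vec import serialize_float32  # pip install sqlite-vec

# Embed the query with the same model used for the chunks
# (embed() is a placeholder for your embedding client)
query_vec = embed("What is the deployment status?")

# Nearest-neighbor search over a sqlite-vec vec0 table; db is a sqlite3
# connection with the extension loaded via sqlite_vec.load(db)
results = db.execute(
    "SELECT id, distance FROM chunks_vec "
    "WHERE embedding MATCH ? AND k = ? ORDER BY distance",
    (serialize_float32(query_vec), 20),
).fetchall()
# distance is lower-is-better; convert it to a similarity before weighted fusion
```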
Vector handles: semantic questions, paraphrases, "find things related to X" — meaning over matching.
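To make the vector step concrete without an embedding service or index, here is an exact (brute-force) nearest-neighbor sketch in pure Python. The 3-dimensional vectors are made up for illustration; real embeddings come from the model and have hundreds or thousands of dimensions, and at scale an ANN index replaces the linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(query_vec, chunks, k=20):
    """Score every chunk by cosine similarity to the query and keep the top k."""
    scored = [{**c, "score": cosine(query_vec, c["embedding"])} for c in chunks]
    scored.sort(key=lambda c: c["score"], reverse=True)
    return scored[:k]

# Toy data: hypothetical 3-d "embeddings" standing in for real model output
chunks = [
    {"path": "status.md", "text": "Release 2.3 is live in production", "embedding": [0.9, 0.1, 0.0]},
    {"path": "team.md", "text": "Alice owns the billing service", "embedding": [0.0, 0.2, 0.9]},
]
top = vector_search([0.8, 0.2, 0.1], chunks, k=1)
print(top[0]["path"])  # status.md
```

If an index returns a cosine distance instead of a similarity, 1 - distance recovers the similarity so that higher still means better during fusion.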
```cypher
// Neo4j: find an entity and its direct connections
// (CONTAINS is case-sensitive, and a label-less MATCH scans every node)
MATCH (n) WHERE n.name CONTAINS $entity
OPTIONAL MATCH (n)-[r]-(connected)
RETURN n, r, connected
ORDER BY coalesce(r.weight, 1.0) DESC
LIMIT 10
```
Graph handles: "Who works with X?", "What's related to Y?", multi-hop reasoning — relationships that flat search can't find.
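The Cypher query above returns direct neighbors only. Multi-hop retrieval amounts to a bounded breadth-first traversal; the sketch below runs one over a hypothetical in-memory adjacency list, with naive substring matching standing in for entity extraction and a per-hop decay factor (an assumption, not from the source) so that more distant entities score lower.

```python
from collections import deque

def extract_entities(query, known_entities):
    """Naive entity extraction: known entity names that appear in the query."""
    q = query.lower()
    return [e for e in known_entities if e.lower() in q]

def graph_search(query, graph, max_hops=2, decay=0.5):
    """Breadth-first traversal from query entities; score = decay ** hops."""
    seeds = extract_entities(query, graph.keys())
    best = {}  # entity -> fewest hops from any seed
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, hops = frontier.popleft()
        if node in best:
            continue  # BFS order means the first visit used the fewest hops
        best[node] = hops
        if hops < max_hops:
            for neighbor in graph.get(node, []):
                if neighbor not in best:
                    frontier.append((neighbor, hops + 1))
    return sorted(
        ({"entity": e, "hops": h, "score": decay ** h} for e, h in best.items()),
        key=lambda r: r["score"],
        reverse=True,
    )

# Hypothetical undirected graph stored as an adjacency list
graph = {
    "Alice": ["billing-service", "Bob"],
    "Bob": ["Alice", "payments-db"],
    "billing-service": ["Alice", "payments-db"],
    "payments-db": ["Bob", "billing-service"],
}
print([(r["entity"], r["hops"]) for r in graph_search("Who works with Alice?", graph)])
# [('Alice', 0), ('billing-service', 1), ('Bob', 1), ('payments-db', 2)]
```

To feed the fusion step, each entity would still need mapping back to the chunks that mention it (for example, an entity-to-chunk table built during extraction), which this sketch leaves out.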
The critical part — merging results from all three methods:
```python
def fuse_results(bm25_results, vector_results, graph_results,
                 bm25_weight=0.3, vector_weight=0.5, graph_weight=0.8):
    """Weighted-sum fusion of result dicts carrying "path", "text" and "score".

    Expects scores on comparable scales (for example, normalized per method).
    """
    all_results = {}
    for results, weight in ((bm25_results, bm25_weight),
                            (vector_results, vector_weight),
                            (graph_results, graph_weight)):
        for r in results:
            # Deduplicate on path plus the first 100 characters of text
            key = r["path"] + ":" + r["text"][:100]
            if key in all_results:
                # Found by more than one method: accumulate the weighted scores
                all_results[key]["score"] += r["score"] * weight
            else:
                all_results[key] = {**r, "score": r["score"] * weight}
    # Highest fused score first
    return sorted(all_results.values(), key=lambda x: x["score"], reverse=True)
```
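One caveat: this fusion adds raw scores, but BM25 scores are unbounded while cosine similarities sit in [-1, 1], so the weights only mean what they say once each list is on a comparable scale. A minimal sketch of per-method min-max normalization, applied to each list before fusing; min_max_normalize is a hypothetical helper, and min-max is one common choice among several.

```python
def min_max_normalize(results):
    """Rescale one method's scores to [0, 1] so fusion weights are comparable."""
    if not results:
        return []
    scores = [r["score"] for r in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo
    # If every score is equal, treat all hits as equally strong (1.0)
    return [{**r, "score": (r["score"] - lo) / span if span else 1.0} for r in results]

# Hypothetical raw BM25 hits on an unbounded scale
bm25_hits = [
    {"path": "a.md", "text": "rollback steps", "score": 7.5},
    {"path": "c.md", "text": "release checklist", "score": 3.0},
    {"path": "b.md", "text": "deploy notes", "score": 1.5},
]
print([r["score"] for r in min_max_normalize(bm25_hits)])  # [1.0, 0.25, 0.0]
```

Usage would look like fuse_results(min_max_normalize(bm25), min_max_normalize(vec), min_max_normalize(graph)). Reciprocal rank fusion, which combines ranks rather than scores, is another common way around the scale mismatch.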
Weight tuning:
After fusion:
| If you have... | You need... |
|---|---|
| < 1K chunks | BM25 only (SQLite FTS5) |
| 1K - 50K chunks | BM25 + Vector |
| 50K+ chunks | BM25 + Vector + Graph |
| Multiple data sources (chats, emails, docs) | Separate collections with routing |
| Real-time requirements | Parallel search with timeouts |
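For the last row, a minimal sketch of parallel search with a deadline using the standard library's concurrent.futures (cancel_futures needs Python 3.9+). The three searchers are hypothetical stand-ins (one fast, one failing, one slow); any searcher that raises or misses the deadline contributes an empty list, so fusion can still run on whatever arrived in time.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def parallel_search(query, searchers, timeout_s=0.5):
    """Run each searcher concurrently; drop any that miss the deadline or fail."""
    pool = ThreadPoolExecutor(max_workers=len(searchers))
    futures = {name: pool.submit(fn, query) for name, fn in searchers.items()}
    done, _ = wait(futures.values(), timeout=timeout_s)
    # Don't block on stragglers; already-running calls finish in the background
    pool.shutdown(wait=False, cancel_futures=True)
    results = {}
    for name, fut in futures.items():
        if fut in done and fut.exception() is None:
            results[name] = fut.result()
        else:
            results[name] = []  # timed out or raised: contribute nothing
    return results

# Hypothetical searchers standing in for real BM25, vector, and graph clients
def fast_bm25(query):
    return [{"path": "a.md", "text": "hit", "score": 1.0}]

def broken_vector(query):
    raise RuntimeError("index offline")

def slow_graph(query):
    time.sleep(1.0)
    return [{"path": "b.md", "text": "late hit", "score": 1.0}]

results = parallel_search(
    "deploy status",
    {"bm25": fast_bm25, "vector": broken_vector, "graph": slow_graph},
    timeout_s=0.2,
)
print({name: len(hits) for name, hits in results.items()})  # {'bm25': 1, 'vector': 0, 'graph': 0}
```

Python cannot kill a running thread, so a slow searcher keeps running after the deadline and is merely ignored; real backends should also get their own client-side timeouts.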
Help the user: