Hybrid Retrieval (BM25 + Vector + Graph)

v1.0.0

Design and build a hybrid retrieval system combining BM25 keyword search, vector embeddings, and knowledge graph traversal for AI agent memory. Use when buil...

1 star · 93 downloads · 0 current · 0 all-time
by SARAI Defence (@vnesin-sarai)

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for vnesin-sarai/hybrid-retrieval.

Prompt Preview: Install & Setup
Install the skill "Hybrid Retrieval (BM25 + Vector + Graph)" (vnesin-sarai/hybrid-retrieval) from ClawHub.
Skill page: https://clawhub.ai/vnesin-sarai/hybrid-retrieval
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install hybrid-retrieval

ClawHub CLI


npx clawhub@latest install hybrid-retrieval
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name and description match the SKILL.md content: it describes BM25, vector embeddings, and knowledge-graph search, plus fusion/reranking. All required pieces (SQLite/pgvector/Qdrant, embedding models, Neo4j) are appropriate for building the described hybrid retrieval system.
Instruction Scope
The SKILL.md gives implementation examples (SQL, Python, Cypher) and high-level operational guidance. It references chunking source data (files, paths, metadata) which is expected for a retrieval system, but it does not instruct the agent to read arbitrary system files or exfiltrate data. Note: examples reference using third-party embedding services and vector DBs; following those examples will require the user/agent to connect to external services (not done automatically by the SKILL.md itself).
Install Mechanism
This is an instruction-only skill with no install spec and no code files. That minimizes risk: nothing is downloaded or written to disk by the skill itself.
Credentials
The skill declares no required environment variables or credentials, yet recommends using external embedding APIs (OpenAI, Voyage AI) and external DBs (Qdrant, Neo4j). This is not inherently malicious, but it's an omission: users will need to supply API keys/DB credentials when implementing the architecture—those are not requested by the skill but will be needed in practice.
Persistence & Privilege
The skill does not request always:true, has no install actions, and does not modify agent/system configuration. It does not request permanent presence or escalate privileges.
Assessment
This skill is an instructional guide for building a hybrid retrieval system and appears internally coherent. Before using it: (1) be aware that implementing the examples will require you to provide credentials for embedding APIs and DBs—do not paste secrets into chat or shared contexts; (2) evaluate data privacy: indexing private documents and sending them to cloud embedding providers may expose sensitive data—consider self-hosted models if that matters; (3) expect resource costs (GPU, vector DB hosting, API costs) and plan accordingly; (4) if you let an agent invoke this skill autonomously, restrict what data the agent can access and which credentials it can use. If you want a deeper security review, provide any install scripts, connector code, or explicit calls the agent will run so those can be inspected for risky behavior.


latest: vk975kdsjcc9b9tsp6z5153wvd184qd5t
93 downloads
1 star
1 version
Updated 2w ago
v1.0.0
MIT-0

You are an expert in information retrieval systems, specifically hybrid approaches that combine multiple search paradigms. Help the user design and build a retrieval system inspired by the BlackRock/NVIDIA HybridRAG paper.

Core Insight

No single retrieval method works for everything:

| Method | Strength | Weakness |
| --- | --- | --- |
| BM25 (keyword) | Exact matches, names, IDs, codes | Misses synonyms and semantic meaning |
| Vector (embedding) | Semantic similarity, paraphrases | Struggles with exact terms, numbers, names |
| Graph (knowledge graph) | Relationships, multi-hop reasoning | Requires structured extraction, maintenance |

The hybrid approach: Run all three in parallel, then fuse results with weighted scoring. Each method catches what the others miss.

Architecture Pattern

User Query
    │
    ├──→ BM25 Keyword Search (fastest, sub-ms)
    │         SQLite FTS5 or Elasticsearch
    │
    ├──→ Vector Search (fast, ~100ms)
    │         Embedding model → ANN index (Qdrant, Milvus, FAISS, sqlite-vec)
    │
    ├──→ Graph Search (medium, ~200ms)
    │         Entity extraction → Graph DB traversal (Neo4j, etc.)
    │
    └──→ Fusion Layer
              Weighted merge → Deduplication → Reranking → Top-K results

Step-by-Step Design

Step 1: Choose Your Document Store

Your chunks need to live somewhere. Options:

  • SQLite + FTS5 + vec0 — Single file, zero infrastructure, good up to ~100K chunks
  • PostgreSQL + pgvector — Production-ready, handles millions
  • Qdrant / Milvus — Purpose-built vector DBs, best for scale
  • Elasticsearch — If you already use it, it does BM25 + vector natively

Recommendation for most projects: Start with SQLite (FTS5 for keywords, vec0 for vectors). Migrate when you hit performance limits.

Step 2: Choose Your Embedding Model

| Model | Dimensions | Quality | Speed | Cost |
| --- | --- | --- | --- | --- |
| OpenAI text-embedding-3-small | 1536 | Good | Fast | $0.02/1M tokens |
| Voyage AI voyage-3 | 1024 | Very good | Fast | $0.06/1M tokens |
| NV-Embed-v2 (self-hosted) | 4096 | Excellent | Medium | Free (GPU needed) |
| nomic-embed-text (Ollama) | 768 | Good | Fast | Free (CPU ok) |

Key decision: Self-hosted = free but needs GPU. Cloud = easy but recurring cost. For production agent memory, self-hosted pays for itself quickly.

Step 3: Chunking Strategy

Bad chunking ruins everything. Rules:

  1. Chunk by semantic unit — sections, paragraphs, conversations. NOT fixed-size windows.
  2. Include metadata — file path, date, source type. You'll filter on this later.
  3. Overlap sparingly — 10-20% overlap prevents losing context at boundaries.
  4. Keep chunks 200-600 tokens — too small = no context, too large = noise.
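The rules above can be sketched as a paragraph-based chunker. This is a minimal sketch: token counts are approximated with whitespace word counts (swap in a real tokenizer such as tiktoken for production), and all names and thresholds are illustrative.

```python
# Paragraph-based chunker with a token budget and ~15% overlap (sketch).
# "Tokens" are approximated as whitespace-separated words here.

def chunk_text(text, max_tokens=600, min_tokens=200, overlap_ratio=0.15):
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []

    for para in paragraphs:
        words = para.split()
        # Close the current chunk when adding this paragraph would overflow it
        # and the chunk is already big enough to stand alone.
        if len(current) + len(words) > max_tokens and len(current) >= min_tokens:
            chunks.append(" ".join(current))
            # Carry over the tail of the previous chunk so context at the
            # boundary is not lost.
            current = current[-int(len(current) * overlap_ratio):]
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Because splitting happens only at paragraph boundaries, a chunk may slightly exceed `max_tokens` when a single paragraph is long; that is the intended trade-off of chunking by semantic unit.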

Step 4: BM25 Layer

-- SQLite FTS5 example
CREATE VIRTUAL TABLE chunks_fts USING fts5(path, text, source);

-- Search
SELECT path, text, rank
FROM chunks_fts
WHERE chunks_fts MATCH 'query terms'
ORDER BY rank
LIMIT 20;

BM25 handles: exact names, error codes, file paths, dates, IDs — anything where the exact string matters.
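The same FTS5 round trip works end to end from Python's standard library. A sketch, assuming your SQLite build ships the FTS5 extension (most do); paths and sample rows are illustrative.

```python
# In-memory FTS5 demo: create the virtual table, insert chunks, search.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(path, text, source)")
db.executemany(
    "INSERT INTO chunks_fts VALUES (?, ?, ?)",
    [
        ("notes/deploy.md", "Service crashed with error E1042 on port 8034", "docs"),
        ("notes/arch.md", "The retrieval layer fuses keyword and vector hits", "docs"),
    ],
)

# FTS5's built-in `rank` column is a negative BM25 score (more negative =
# better match), so ascending ORDER BY returns the best hits first.
rows = db.execute(
    "SELECT path, text, rank FROM chunks_fts "
    "WHERE chunks_fts MATCH ? ORDER BY rank LIMIT 20",
    ("E1042",),
).fetchall()
```

Note how the exact-string query `E1042` hits immediately; this is precisely the case a vector index tends to miss.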

Step 5: Vector Layer

# Embed query
query_vec = embed("What is the deployment status?")

# ANN search (sqlite-vec example)
results = db.execute(
    "SELECT id, distance FROM chunks_vec "
    "WHERE embedding MATCH ? AND k = ? ORDER BY distance",
    (query_vec_blob, 20)
)

Vector handles: semantic questions, paraphrases, "find things related to X" — meaning over matching.
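Before wiring up an ANN index, the vector layer can be prototyped with brute-force cosine similarity. A stand-in sketch (names illustrative), fine up to a few thousand chunks; beyond that, switch to sqlite-vec, FAISS, or Qdrant as above.

```python
# Brute-force cosine-similarity search, a stand-in for the ANN index.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def vector_search(query_vec, index, k=20):
    """index: list of (chunk_id, embedding) pairs."""
    scored = [(chunk_id, cosine(query_vec, emb)) for chunk_id, emb in index]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]
```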

Step 6: Graph Layer (Optional but Powerful)

// Neo4j: Find entity and its connections
MATCH (n) WHERE n.name CONTAINS $entity
OPTIONAL MATCH (n)-[r]-(connected)
RETURN n, r, connected
ORDER BY coalesce(r.weight, 1.0) DESC
LIMIT 10

Graph handles: "Who works with X?", "What's related to Y?", multi-hop reasoning — relationships that flat search can't find.
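What that Cypher query returns can be mimicked with a plain in-memory edge list. This is only an illustration of the traversal and ordering logic, not a substitute for running the query through the official neo4j driver; the matching here is a loose, case-insensitive approximation of `CONTAINS`.

```python
# One-hop neighborhood lookup over an in-memory edge list (sketch).

def graph_search(entity, edges, k=10):
    """edges: list of (src, rel, dst, weight) tuples; weight may be None."""
    needle = entity.lower()
    hits = [
        (src, rel, dst, weight)
        for src, rel, dst, weight in edges
        if needle in src.lower() or needle in dst.lower()
    ]
    # Mirror `ORDER BY coalesce(r.weight, 1.0) DESC`.
    return sorted(hits, key=lambda e: e[3] if e[3] is not None else 1.0,
                  reverse=True)[:k]
```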

Step 7: Fusion

The critical part — merging results from all three methods:

def fuse_results(bm25_results, vector_results, graph_results,
                 bm25_weight=0.3, vector_weight=0.5, graph_weight=0.8):
    # Assumes each layer's scores are normalized to a comparable range
    # (e.g. 0-1) before fusion; raw BM25 ranks and cosine distances are
    # not directly comparable.
    all_results = {}

    def add(results, weight):
        for r in results:
            # Key on path + text prefix so a chunk found by multiple
            # methods accumulates score instead of appearing twice.
            key = r["path"] + ":" + r["text"][:100]
            if key in all_results:
                all_results[key]["score"] += r["score"] * weight
            else:
                all_results[key] = {**r, "score": r["score"] * weight}

    add(bm25_results, bm25_weight)
    add(vector_results, vector_weight)
    add(graph_results, graph_weight)

    return sorted(all_results.values(), key=lambda x: x["score"], reverse=True)

Weight tuning:

  • Graph results get highest weight — if the KG found a relevant entity, it's almost certainly right
  • Vector gets medium weight — good general recall
  • BM25 gets lowest weight — precise but narrow

Step 8: Deduplication and Reranking

After fusion:

  1. Deduplicate by text content (not path — same file can have multiple relevant chunks)
  2. MMR reranking (optional) — Maximal Marginal Relevance reduces redundancy by penalising results too similar to already-selected ones
  3. Score threshold — drop anything below 0.3 (tune this for your data)
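A minimal MMR pass can be sketched as a greedy loop. Here `sim` is any pairwise similarity you supply (cosine over embeddings is typical), `lam` trades relevance against diversity, and result dicts are assumed to carry a `score` key; all names are illustrative.

```python
# Greedy Maximal Marginal Relevance reranking (sketch).
# Picks the result that maximizes: lam * relevance - (1 - lam) * redundancy,
# where redundancy is the max similarity to anything already selected.

def mmr_rerank(results, sim, lam=0.7, k=10):
    """results: list of dicts with a 'score' key; sim(a, b) -> float."""
    selected, remaining = [], list(results)
    while remaining and len(selected) < k:
        def mmr_score(r):
            redundancy = max((sim(r, s) for s in selected), default=0.0)
            return lam * r["score"] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` this degenerates to plain score sorting; lowering it pushes near-duplicates down the list.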

Common Mistakes

  1. Using only vector search — Misses exact matches. "Port 8034" won't match semantically.
  2. Fixed-size chunking — Splitting mid-sentence destroys context.
  3. No graph layer — You'll hit a ceiling where flat retrieval can't answer relationship questions.
  4. Reranking with the same model — If you rerank with the same embeddings you searched with, you're just re-sorting the same biases.
  5. Ignoring BM25 — It's the fastest layer and catches what vectors miss. Always include it.

When to Add Complexity

| If you have... | You need... |
| --- | --- |
| < 1K chunks | BM25 only (SQLite FTS5) |
| 1K - 50K chunks | BM25 + Vector |
| 50K+ chunks | BM25 + Vector + Graph |
| Multiple data sources (chats, emails, docs) | Separate collections with routing |
| Real-time requirements | Parallel search with timeouts |
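The parallel-search-with-timeouts row can be sketched with a thread pool: a slow or failing layer degrades results instead of blocking them. The layer callables here are placeholders for the BM25/vector/graph functions built above.

```python
# Fan out to all layers in parallel; any layer that times out or raises
# contributes an empty list so fusion can proceed without it.
from concurrent.futures import ThreadPoolExecutor

def parallel_search(query, layers, timeout=0.5):
    """layers: dict of name -> callable(query) -> list of results."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(layers)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in layers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception:
                results[name] = []  # timed out or failed: fuse without it
    # Note: exiting the `with` block still waits for running tasks to finish;
    # for hard cutoffs, shut down with cancel_futures=True (Python 3.9+).
    return results
```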

Output

Help the user:

  1. Assess their data volume and types
  2. Choose appropriate layers (BM25, vector, graph)
  3. Select embedding model and storage backend
  4. Design their chunking strategy
  5. Implement fusion with appropriate weights
  6. Set up a simple evaluation (test queries → expected results)
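Step 6 can start as small as a recall@k loop. In this sketch, `search_fn` stands for whatever end-to-end pipeline was built above, and result dicts are assumed to carry an "id" key.

```python
# Tiny recall@k evaluation harness: for each test query, check whether the
# expected chunk appears in the top-k results, and report the hit rate.

def evaluate(search_fn, test_cases, k=5):
    """test_cases: list of (query, expected_chunk_id) pairs."""
    hits = 0
    for query, expected in test_cases:
        top_ids = [r["id"] for r in search_fn(query)[:k]]
        if expected in top_ids:
            hits += 1
    return hits / len(test_cases) if test_cases else 0.0
```

Run it after every change to chunking, weights, or models; a handful of hand-written query/answer pairs is enough to catch regressions early.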

Further Reading
