RAG

Build, optimize, and debug RAG pipelines with chunking strategies, retrieval tuning, evaluation metrics, and production monitoring.

MIT-0 · Free to use, modify, and redistribute. No attribution required.


by Iván (@ivangdavila)

Security Scan

VirusTotal: Benign

OpenClaw: Benign (high confidence)

✓ Purpose & Capability

The name/description (RAG pipelines, chunking, retrieval, evaluation, monitoring) match the included documents (architecture.md, implementation.md, evaluation.md, security.md). No unrelated credentials, binaries, or installs are requested.

✓ Instruction Scope

SKILL.md and the companion files confine themselves to building and operating RAG systems (ingest, chunk, embed, store, retrieve, monitor). They do not instruct reading arbitrary host files, accessing unrelated env vars, or sending data to unknown endpoints. A detected prompt-injection pattern appears in the docs as an example of attack content and mitigation, not as an instruction to ignore supervisor prompts; implementers should still apply prompt-isolation and sanitization at runtime.

✓ Install Mechanism

No install spec or code is present; the skill is instruction-only so nothing will be downloaded or written by the skill itself.

✓ Credentials

The skill declares no required environment variables, credentials, or config paths. All recommended integrations (embedding APIs, vector DBs) are optional and appropriately described in the docs.

✓ Persistence & Privilege

The skill sets always:false and requests no special privileges. It does not request permanent presence, system-wide config changes, or access to other skills' credentials. Autonomous invocation is enabled by default on the platform, but this skill does not widen its blast radius by requesting extra privileges.

Scan Findings in Context

[ignore-previous-instructions] expected: The docs contain an example of malicious prompt injection (e.g., 'IGNORE ALL PREVIOUS INSTRUCTIONS...') inside security.md and SKILL.md as part of the threat discussion. This is an expected educational example, not an instruction for the skill to ignore supervisor prompts.

Assessment

This skill is an offline documentation pack for building RAG systems and is internally coherent. Before installing/using: (1) enforce prompt isolation and input sanitization at runtime to mitigate prompt-injection risks documented here; (2) follow the security.md guidance when you connect to external embedding/vector APIs — avoid sending sensitive PII/PHI to third-party APIs unless you have the proper agreements and controls; (3) test any ingestion code in a staging environment to confirm metadata-based access control (filters/namespaces) works as expected; (4) because the skill is instruction-only, it cannot itself exfiltrate data, but any implementation you build following these instructions can — review network/credential handling in your runtime. If you want lower risk, use the docs as a read-only reference rather than enabling autonomous agent invocation of the skill.


Current version: v1.0.0


License terms: https://spdx.org/licenses/MIT-0.html

SKILL.md

When to Use

Use when the user wants to implement, improve, or troubleshoot Retrieval-Augmented Generation (RAG) systems.

Quick Reference

| Topic | File |
|---|---|
| Pipeline components & architecture | `architecture.md` |
| Implementation patterns & code | `implementation.md` |
| Evaluation metrics & debugging | `evaluation.md` |
| Security & compliance | `security.md` |

Core Capabilities

Architecture design — Select embedding models, vector DBs, and chunking strategies based on requirements
Implementation — Write ingestion pipelines, query handlers, and update logic
Retrieval optimization — Tune top-k, reranking, hybrid search parameters
Evaluation — Build test datasets, measure recall/precision, diagnose failures
Production ops — Monitor quality drift, set up alerts, debug degradation
Security — PII detection, access control, compliance requirements
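
The evaluation capability above (measuring recall and precision against a test dataset) could be sketched roughly as follows. This is a minimal illustration, not code from the skill's own files: the function names, the shape of `eval_set`, and the `retrieve` callable are all assumptions, and retrieved IDs are assumed to be unique per query.

```python
from typing import Callable, Sequence


def precision_recall_at_k(
    retrieved: Sequence[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Return (precision@k, recall@k) for a single query.

    `retrieved` is the ranked list of document IDs returned by the retriever;
    `relevant` is the hand-labelled set of IDs that should have been returned.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate(
    eval_set: Sequence[tuple[str, set[str]]],
    retrieve: Callable[[str], Sequence[str]],
    k: int = 5,
) -> tuple[float, float]:
    """Average precision@k and recall@k over (query, relevant_ids) pairs.

    `retrieve` is a hypothetical stand-in for whatever retriever is under test.
    """
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    precisions, recalls = [], []
    for query, relevant in eval_set:
        p, r = precision_recall_at_k(retrieve(query), relevant, k)
        precisions.append(p)
        recalls.append(r)
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)
```

Running `evaluate` on the same dataset before and after each tuning change gives a baseline to compare against, which is the point of building the eval set first.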

Decision Checklist

Before recommending architecture, ask:

What document types and volume?
Latency requirements (real-time chat vs batch)?
Update frequency (how often do docs change)?
Access control needs (who can see what)?
Compliance constraints (GDPR, HIPAA, SOC2)?
Budget (managed vs self-hosted, embedding costs)?

Critical Rules

Never skip access control: apply permission filters inside the retrieval query, not to the results afterward
Always overlap chunks: a 10-20% overlap reduces context loss at chunk boundaries
Evaluate before optimizing: build the eval dataset first, then tune
Same embedding model: queries and documents must be embedded with the identical model
Monitor similarity scores: a falling average signals drift or other retrieval issues
Plan for deletion: GDPR erasure requires being able to delete a document's chunks and vectors, and to re-embed content when needed
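
Two of the rules above, overlapping chunks and filtering at retrieval time, can be illustrated with a small in-memory sketch. Everything here is hypothetical: the `Chunk` fields, the `allowed_groups` metadata, and the brute-force cosine ranking stand in for whatever vector store is actually used, where the filter would normally be expressed as a metadata condition inside the query itself.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # hypothetical access-control metadata
    vector: tuple[float, ...] = ()


def chunk_with_overlap(text: str, size: int = 200, overlap_ratio: float = 0.15) -> list[str]:
    """Split text into character windows; neighbours share about overlap_ratio of `size`."""
    if size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("size must be > 0 and overlap_ratio in [0, 1)")
    step = max(1, size - int(size * overlap_ratio))
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: tuple[float, ...],
    chunks: list[Chunk],
    user_groups: set[str],
    top_k: int = 3,
) -> list[Chunk]:
    """Rank only the chunks the caller may see.

    The permission filter runs before ranking, so restricted chunks never
    compete for a top-k slot and never reach the prompt.
    """
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    ranked = sorted(visible, key=lambda c: cosine(query_vec, c.vector), reverse=True)
    return ranked[:top_k]
```

Filtering after ranking would have a different failure mode: restricted chunks could fill the top-k, leaving the permitted user with fewer results than requested, or leak if the post-filter is skipped.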

Common Failure Patterns

| Symptom | Likely Cause | Fix |
|---|---|---|
| Wrong docs retrieved | Query too vague, poor chunks | Query expansion, smaller chunks |
| Relevant doc missed | Not indexed, low similarity | Check ingestion, hybrid search |
| Hallucinated answers | Context too short | Increase top-k, better reranking |
| Slow responses | Large chunks, no caching | Optimize chunk size, cache embeddings |
| Inconsistent results | Non-deterministic reranking | Set seeds, use stable sorting |
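
As one possible reading of the "hybrid search" and "stable sorting" fixes in the table, the sketch below merges a keyword ranking and a vector ranking with reciprocal rank fusion and breaks ties on document ID so that repeated runs return the same order. The skill does not prescribe this particular fusion method; the function name and the `k = 60` constant (a commonly used default for reciprocal rank fusion) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_n: int = 5
) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending, then by doc ID ascending, so equal scores
    # always resolve the same way regardless of input order.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]
```

For example, `reciprocal_rank_fusion([keyword_ids, vector_ids])` takes the ranked ID lists from two hypothetical retrievers; a document found by only one of them can still surface in the fused result.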
