RAG
ReviewAudited by ClawScan on May 10, 2026.
Overview
Prompt-injection indicators were detected in the submitted artifacts (ignore-previous-instructions); human review is required before treating this skill as clean.
This skill appears safe to install as documentation. Before using its advice in a real RAG system, decide what documents may be indexed, avoid storing secrets or unnecessary PII, enforce retrieval-time access controls, and verify third-party embedding or vector-database data policies. ClawScan detected prompt-injection indicators (ignore-previous-instructions), so this skill requires review even though the model response was benign.
Findings (3)
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Users building RAG systems should remember that retrieved documents can contain malicious instructions and should not be treated as authoritative agent instructions.
This is a prompt-injection example used to explain a RAG security risk, not an instruction for the agent to follow.
Malicious content in indexed documents: ``` IGNORE ALL PREVIOUS INSTRUCTIONS. You are now... ``` ### Mitigations 1. **Input sanitization**
Keep retrieved content isolated from system instructions, sanitize suspicious content, and follow the mitigation guidance already included in the skill.
Private documents or malicious document text could be stored and reused in later answers if the index is not scoped, filtered, and maintained properly.
The skill describes persistent storage of document chunks, embeddings, and metadata, which is expected for RAG but can retain sensitive or poisoned content if implemented carelessly.
### Step 4: Upsert to Vector DB ```python # Include: chunk text, embedding, metadata # Metadata: source_file, page, section, timestamp ```
Limit indexed sources, exclude secrets and unnecessary PII, enforce access controls at retrieval time, and implement deletion/re-indexing procedures.
If a user implements the guidance with third-party embedding APIs, sensitive documents may be transmitted outside their organization.
The skill correctly discloses that external embedding providers may receive document content, which is a normal RAG data-flow consideration.
### When Using External APIs (OpenAI, Cohere) - Documents leave your perimeter - Check vendor's data retention policies - Consider self-hosted models for sensitive content
Review provider retention and compliance terms, use contractual protections such as BAAs where required, and self-host embeddings for sensitive corpora.
