RAG Pipeline Starter

Pass: audited by ClawScan on Apr 18, 2026.

Overview

The skill's code, instructions, and requirements are consistent with a RAG pipeline toolkit. It performs local file analysis, chunking, embedding benchmarking (mocked), vector-index management, and retrieval tuning, and it does not request unrelated credentials or network installs.

What to consider before installing or running:

- The package consists only of instructions and code and runs entirely on local files. There are no network calls or credential requests in the code, which reduces the risk of data exfiltration.
- The scripts create and modify files under the directories you pass as --output, --index, or --chunks. Run them in a controlled workspace or sandbox when testing, and avoid pointing them at sensitive system directories.
- The embedding benchmark is largely a mock/demo implementation. It contains a small bug, a function name mismatch (compute_similarity__mock vs. compute_similarity_mock), that may cause runtime errors, so expect to edit or fix the code before any production use. The recommend logic also picks a strategy from the first analyzed document rather than aggregating across all documents; review it if you need different behavior.
- If you plan to plug in real (paid) embedding providers, you will need to manage API keys yourself; this skill neither requests nor manages credentials. Keep keys out of plain text and use secure storage.
- Best practice: inspect the files locally (you already have them), run on a small sample dataset first, and run under a restricted environment (container or VM) if you are unsure.

Given the available materials, the skill appears internally consistent and implements the features it claims; no indicators of data exfiltration or unrelated privileges were found.
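If you decide to patch the two code issues noted in this audit, a minimal sketch of the fixes might look like the following. It is not the skill's actual code: the signature of compute_similarity_mock, the helper name recommend_strategy, and the strategy labels are all hypothetical, and cosine similarity is only an assumed stand-in for whatever the mock really computes. The idea is to define the similarity function under a single name and call only that name, and to aggregate per-document recommendations (here by majority vote, one possible choice) instead of taking the first document's result.

```python
import math
from collections import Counter


# Hypothetical signature: the audit only reports the name mismatch
# (compute_similarity__mock vs. compute_similarity_mock). Defining one name
# and calling only that name avoids a NameError at runtime.
def compute_similarity_mock(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (assumed stand-in for a real model)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar instead of dividing by zero
    return dot / (norm_a * norm_b)


# Hypothetical helper: choose the strategy recommended for the most documents,
# rather than only the strategy recommended for the first analyzed document.
def recommend_strategy(per_document: list[str]) -> str:
    if not per_document:
        raise ValueError("no analyzed documents")
    # Counter.most_common orders equal counts by first occurrence,
    # so ties fall back to the earliest document's recommendation.
    return Counter(per_document).most_common(1)[0][0]


if __name__ == "__main__":
    analyzed = ["semantic", "recursive", "recursive"]  # hypothetical labels
    print(recommend_strategy(analyzed))  # recursive
    print(round(compute_similarity_mock([1.0, 0.0], [1.0, 0.0]), 3))  # 1.0
```

With first-document logic the example above would return "semantic"; majority voting returns "recursive". Other aggregations (weighting by document size, for instance) may suit your corpus better.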