Audit Case Rag

v0.1.0

Local-first, event-driven RAG for commercial real estate audit & investigation case folders. Index a case directory named like "项目问题编号__标题" (that is, "project issue number__title"), with stage subfolders such as 01_policy_basis, 02_process, and 04_settlement_payment, and query it with citations (file:// links + PDF

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description match the implementation: the Python script discovers documents under a case folder, infers case_id/stage, converts Office files to PDF (via LibreOffice), extracts page text with pypdf, chunks text, builds TF‑IDF and embedding indices, writes manifest.jsonl and a .joblib index, and supports querying with stage filters and file://#page citations.
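As a rough illustration of the stage inference and page citations described above, the sketch below derives a case_id and stage from a document path and builds a file:// link with a page fragment. The function names, the rule that a stage folder starts with two digits and an underscore, the choice of the text before "__" as the case_id, and the `#page=N` fragment form are all assumptions made for illustration; the skill's own script may do this differently.

```python
import re
from pathlib import Path, PurePosixPath

# Assumption: stage folders look like "01_policy_basis" (two digits, underscore, name).
STAGE_RE = re.compile(r"^\d{2}_\w+$")


def infer_case_and_stage(case_root, doc_path):
    """Return (case_id, stage) for a document under case_root.

    Paths are treated as POSIX-style strings. case_id is the part of the
    case folder name before "__"; stage is the first stage-like folder on
    the relative path, or None if the document sits outside any stage.
    """
    root = PurePosixPath(case_root)
    rel = PurePosixPath(doc_path).relative_to(root)
    case_id = root.name.split("__", 1)[0]
    # rel.parts[:-1] drops the file name so only folders are inspected.
    stage = next((p for p in rel.parts[:-1] if STAGE_RE.match(p)), None)
    return case_id, stage


def page_citation(pdf_path, page):
    """Build a file:// URI with a #page=N fragment (page is 1-based)."""
    return f"{Path(pdf_path).resolve().as_uri()}#page={page}"
```

Keeping the stage as plain metadata on each chunk is what would make a stage filter at query time cheap: the filter becomes a comparison on a string rather than a second index.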
Instruction Scope
SKILL.md instructs only local actions (indexing, local conversion, local querying). The runtime instructions and code operate over files in the provided case directory and write local outputs (manifest, converted PDFs, .joblib). Note: the manifest contains file paths and the index stores text chunks; these are sensitive outputs that the skill explicitly warns against committing.
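One way to act on that warning is to screen file names against a small deny-list before staging anything in version control. This is a minimal sketch: `SENSITIVE_PATTERNS`, `is_sensitive_output`, and `files_to_keep_out_of_git` are hypothetical names, and the patterns cover only the two outputs named above (manifest.jsonl and the .joblib index). The location of converted PDFs is not stated here, so it is left out.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# From the scan text: the manifest holds file paths, the .joblib index holds text chunks.
SENSITIVE_PATTERNS = ("manifest.jsonl", "*.joblib")


def is_sensitive_output(path):
    """True if the file name matches one of the sensitive output patterns."""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in SENSITIVE_PATTERNS)


def files_to_keep_out_of_git(paths):
    """Return the subset of paths that should not be committed."""
    return [p for p in paths if is_sensitive_output(p)]
```

The same two patterns could equally be placed in a .gitignore file; the function form is only useful if a pre-commit check is wanted.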
Install Mechanism
No install spec in registry — installation is via pip install -r scripts/requirements.txt (documented in SKILL.md). Requirements come from PyPI (fastembed, scikit-learn, pypdf, etc.). This is a standard approach but carries the usual pip risks: packages may execute arbitrary code at install time or may pull model weights or network resources at runtime. There are no direct downloads or obscure URLs in the repo, which reduces high-risk red flags.
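To make the pip risks slightly more concrete, a quick pre-install step is to list which requirement lines are not pinned to an exact version, since unpinned packages can resolve to different code on each install. The parser below is a deliberately simplified sketch: it ignores environment markers, extras, and URL requirements, and `unpinned_requirements` is a hypothetical helper, not part of the skill.

```python
import re

# Simplified: a package name at the start of the line (letters, digits, ".", "_", "-").
NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def unpinned_requirements(text):
    """Return names of requirements that lack an exact '==' pin."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as -r
            continue
        match = NAME_RE.match(line)
        if match and "==" not in line:
            names.append(match.group(0))
    return names
```

It would typically be fed the contents of scripts/requirements.txt, for example via `Path("scripts/requirements.txt").read_text()`. Pinning alone does not make a package trustworthy; it only makes what was reviewed and what gets installed the same thing.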
Credentials
The skill requests no environment variables, no external credentials, and no config paths. The code does not reference secrets or environment tokens. This is proportionate to the stated purpose.
Persistence & Privilege
The skill is not always-enabled and does not attempt to modify system-wide agent settings or other skills. It creates local outputs (converted PDFs, manifest, joblib index) within the provided out_dir/case directory only.
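The claim that outputs stay inside the provided out_dir or case directory can be spot-checked after a run with a path containment test. The sketch below is an assumption about how one might verify it, not code taken from the skill; `is_within` is a hypothetical helper.

```python
from pathlib import Path


def is_within(base_dir, candidate):
    """True if candidate resolves to base_dir itself or a path beneath it."""
    base = Path(base_dir).resolve()
    target = Path(candidate).resolve()  # resolve() also collapses ".." segments
    try:
        target.relative_to(base)
    except ValueError:
        return False
    return True
```

Resolving both paths first matters: a plain string prefix comparison would accept something like out_dir/../elsewhere.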
Assessment
This skill appears to do what it says: local-only indexing and querying of case folders. Before installing and running it:
(1) use a Python virtual environment as instructed;
(2) inspect and vet the 'fastembed' package (and other requirements), since some embedding libraries download models or contact remote endpoints at runtime;
(3) monitor network activity during pip install and first runs if you want to ensure no unexpected downloads or outbound connections occur;
(4) run on a machine or container where writing converted PDFs and indices is acceptable, and do not commit manifest/index files to version control;
(5) ensure LibreOffice ('soffice') is installed if you need Office→PDF conversion;
(6) if you need higher assurance, review the full untruncated script (only part was shown) to confirm there are no hidden network calls or telemetry.
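Two of the checks above, the virtual environment and the presence of 'soffice', are easy to automate before a first run. The following is a minimal preflight sketch using only the standard library; `in_virtualenv` and `preflight` are hypothetical names, and the `which` parameter exists only so the lookup can be swapped out in tests.

```python
import shutil
import sys


def in_virtualenv():
    """True when running inside a venv (prefix differs from the base install)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def preflight(which=shutil.which):
    """Return a list of human-readable warnings; empty means no issue found."""
    warnings = []
    if not in_virtualenv():
        warnings.append("not running inside a Python virtual environment")
    if which("soffice") is None:
        warnings.append("'soffice' not on PATH; Office-to-PDF conversion will be unavailable")
    return warnings
```

Calling `preflight()` and printing each warning is enough for an interactive check. The remaining items (vetting packages, watching network activity, reviewing the full script) still need a person.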


