Audit Case Rag

Local-first, event-driven RAG for commercial real estate audit & investigation case folders. Index a case directory named like "项目问题编号__标题" (project issue number, two underscores, title), with stage subfolders such as 01_policy_basis, 02_process, and 04_settlement_payment, and query it with citations (file:// links + PDF

MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name/description match the implementation: the Python script discovers documents under a case folder, infers case_id/stage, converts Office files to PDF (via LibreOffice), extracts page text with pypdf, chunks text, builds TF‑IDF and embedding indices, writes manifest.jsonl and a .joblib index, and supports querying with stage filters and file://#page citations.
Instruction Scope
SKILL.md instructs only local actions (indexing, local conversion, local querying). The runtime instructions and code operate over files in the provided case directory and write local outputs (manifest, converted PDFs, .joblib). Note: manifest contains file paths and the index stores text chunks — these are sensitive outputs that the skill explicitly warns not to commit.
Install Mechanism
No install spec in registry — installation is via pip install -r scripts/requirements.txt (documented in SKILL.md). Requirements come from PyPI (fastembed, scikit-learn, pypdf, etc.). This is a standard approach but carries the usual pip risks: packages may execute arbitrary code at install time or may pull model weights or network resources at runtime. There are no direct downloads or obscure URLs in the repo, which reduces high-risk red flags.
Credentials
The skill requests no environment variables, no external credentials, and no config paths. The code does not reference secrets or environment tokens. This is proportionate to the stated purpose.
Persistence & Privilege
The skill is not always-enabled and does not attempt to modify system-wide agent settings or other skills. It creates local outputs (converted PDFs, manifest, joblib index) within the provided out_dir/case directory only.
Assessment
This skill appears to do what it says: local-only indexing and querying of case folders. Before installing and running it:
  (1) use a Python virtual environment as instructed;
  (2) inspect and vet the 'fastembed' package (and other requirements), since some embedding libraries download models or contact remote endpoints at runtime;
  (3) monitor network activity during pip install and first runs if you want to ensure no unexpected downloads or outbound connections occur;
  (4) run on a machine or container where writing converted PDFs and indices is acceptable, and do not commit manifest/index files to version control;
  (5) ensure LibreOffice ('soffice') is installed if you need Office→PDF conversion;
  (6) if you need higher assurance, review the full untruncated script (only part was shown) to confirm there are no hidden network calls or telemetry.


Current version: v0.1.0

SKILL.md

audit-case-rag

This skill packages a local-only workflow to build a searchable evidence index for a single audit/investigation case and query it with page-level citations.

Workflow

0) Prepare a case folder (event-driven)

Create a case directory named:

  • <项目问题编号>__<标题> (that is, <project issue number>__<title>)

Inside, use stage folders (stage is inferred from folder name):

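  • 01_policy_basis/ (basis): policies, procedures, authorizations
  • 02_process/ (process): procurement, bid award, process evidence
  • 03_contract/ (contract): contracts, supplementary agreements
  • 04_settlement_payment/ (payment): settlement, payment, invoices, acceptance
  • 05_comm/ (comm): emails, meeting minutes, IM
  • 06_interviews/ (interview): interviews, interview records, confirmations
  • 07_workpapers/ (workpaper): workpapers, sampling, review sheets
  • 09_rectification/ (rectification): rectification, closure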

Full template: references/case-folder-template.md
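
The naming rules above can be sketched in a few lines of Python. This is only an illustration, not the skill's actual code: the helper names parse_case_dir and infer_stage are hypothetical, and splitting on the first "__" and looking only at the top-level folder are assumptions about how case_id and stage are inferred.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Stage folder -> stage label, copied from the list above.
STAGE_BY_FOLDER = {
    "01_policy_basis": "basis",
    "02_process": "process",
    "03_contract": "contract",
    "04_settlement_payment": "payment",
    "05_comm": "comm",
    "06_interviews": "interview",
    "07_workpapers": "workpaper",
    "09_rectification": "rectification",
}


def parse_case_dir(name: str) -> tuple[str, str]:
    """Split '<case_id>__<title>' on the first double underscore (assumed rule)."""
    case_id, sep, title = name.partition("__")
    if not sep or not case_id:
        raise ValueError(f"expected '<id>__<title>', got {name!r}")
    return case_id, title


def infer_stage(relative_path: str) -> str | None:
    """Return the stage label for a path given relative to the case directory."""
    parts = PurePosixPath(relative_path).parts
    # Only the first path component is inspected; unknown folders yield None.
    return STAGE_BY_FOLDER.get(parts[0]) if parts else None
```

For example, infer_stage("04_settlement_payment/invoice.pdf") gives "payment", while a file outside the known stage folders gives None.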

1) Install dependencies (local)

From the skill folder (or copy the script into your repo):

python3 -m venv .venv
source .venv/bin/activate
pip install -r scripts/requirements.txt

LibreOffice is recommended for Office→PDF page citations:

  • soffice must be on PATH; otherwise pass --soffice /path/to/soffice.
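
To check the LibreOffice prerequisite before indexing, something like the following may help. It is a sketch under assumptions: find_soffice and build_convert_cmd are hypothetical helpers, and the command shown is the usual LibreOffice headless conversion invocation, which may differ from what the skill's script actually runs.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def find_soffice(explicit: str | None = None) -> str | None:
    """Return a usable soffice path, or None if LibreOffice cannot be found."""
    if explicit:
        # Mirrors the idea of --soffice /path/to/soffice: trust it only if it exists.
        return explicit if Path(explicit).is_file() else None
    return shutil.which("soffice")


def build_convert_cmd(soffice: str, src: Path, out_dir: Path) -> list[str]:
    """Build a headless LibreOffice command that converts `src` to PDF."""
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(src),
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True) once find_soffice has returned a path.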

2) Index the case

./scripts/audit_case_rag.py index \
  --case-dir "/path/to/<项目问题编号>__<标题>" \
  --out-dir  "/path/to/audit_rag_db"

Outputs:

  • manifest.jsonl written into the case directory
  • audit_rag_db/<case_id>.joblib (persistent local index)
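
After indexing, the manifest can be inspected with a generic JSON Lines reader. The sketch below assumes only that manifest.jsonl holds one JSON object per line; it makes no assumption about field names, and read_manifest is a hypothetical helper rather than part of the skill.

```python
import json
from pathlib import Path
from typing import Iterator


def read_manifest(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
```

Printing the keys of the first record is a quick way to see what the indexer recorded for each file.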

3) Query with event filters

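# Example query, in English: "Are the payment milestones out of sequence? Cite the source page numbers."
./scripts/audit_case_rag.py query \
  --case "<项目问题编号>" \
  --stage payment \
  "付款节点是否倒挂?请给出处页码"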

Notes:

  • Evidence lines include clickable file://...#page=N citations when possible.
  • Retrieval is hybrid: embedding recall + TF‑IDF rerank (alpha configurable).
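
The two notes above can be illustrated with a small pure-Python sketch. It is not the skill's implementation: hybrid_rank and page_citation are hypothetical names, the embedding vectors and lexical (TF-IDF-style) scores are supplied by the caller rather than computed with fastembed or scikit-learn, and treating alpha as the weight on the embedding score is an assumption.

```python
from __future__ import annotations

import math
from pathlib import Path


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(
    query_vec: list[float],
    chunk_vecs: list[list[float]],
    lexical_scores: list[float],
    alpha: float = 0.5,
    recall_k: int = 20,
) -> list[tuple[int, float]]:
    """Recall by embedding similarity, then rerank by a blended score."""
    emb = [cosine(query_vec, v) for v in chunk_vecs]
    # Stage 1: keep the recall_k chunks closest to the query in embedding space.
    recalled = sorted(range(len(emb)), key=lambda i: emb[i], reverse=True)[:recall_k]
    # Stage 2: blend; alpha weights the embedding score (assumed direction).
    blended = {i: alpha * emb[i] + (1 - alpha) * lexical_scores[i] for i in recalled}
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)


def page_citation(pdf_path: Path, page: int) -> str:
    """Build a file:// URI with a #page=N fragment for a PDF page."""
    return f"{pdf_path.resolve().as_uri()}#page={page}"
```

With alpha close to 1 the ranking follows the embedding scores; with alpha close to 0 the lexical scores dominate among the recalled chunks. Whether a viewer honors the #page=N fragment depends on the PDF viewer or browser.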

Safety/Privacy

  • No cloud APIs. Everything runs locally.
  • Do not commit outputs (manifest.jsonl, indices, converted PDFs) to git; they contain file paths and document text.
