LookupMark Local RAG
v1.1.0
Semantic search over local files using BGE-M3 embeddings and BGE-RERANKER-LARGE reranking with ChromaDB and parent-child chunking. Use when the user asks que...
MIT-0
Security Scan
OpenClaw
Suspicious
medium confidence
Purpose & Capability
Name/description match the code: scripts index and query local files into a ChromaDB store using BGE embeddings and a reranker. Storage and default paths (~/.local/share/local-rag and ~/Documenti, ~/Scaricati) and parent/child chunking are consistent. However, the package does not declare runtime dependencies (chromadb, sentence_transformers, FlagEmbedding) or an install spec despite the code requiring them — this is an incoherence (missing declared dependencies).
Instruction Scope
SKILL.md instructs running the included Python scripts which will crawl and read many files under user home directories (expected for this skill) and write a local DB and venv. The code also performs runtime operations beyond simple file reads: it attempts to install missing Python packages at runtime via subprocess.check_call and loads remote model code via SentenceTransformer(..., trust_remote_code=True). Those behaviors expand scope (network fetches, dynamic code execution) and increase risk. The scripts do not transmit files or data to external endpoints in their code, but they will download models and may execute model-provided code.
Install Mechanism
There is no install spec. The code performs dynamic installs at runtime using subprocess.check_call for pdfplumber and python-docx, but the command lists include an unexpected 'uv' token (['uv', 'pip', 'install', '-p', sys.executable, 'pdfplumber']) which looks like a typo or an attempt to call an unexpected binary. The scripts also rely on SentenceTransformer to download models (network activity) and use trust_remote_code=True, which can execute model-provided code. No URL downloads or archive extraction are present, but runtime package installation and remote model code are notable risks.
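One alternative to the runtime installs described above is to fail fast with a clear message and leave installation to the user. The following is a minimal sketch, not the skill's actual code: the function name `missing_packages` and the `REQUIRED_MODULES` table are hypothetical, and the module-to-package mapping is an assumption based on the packages named in this scan.

```python
import importlib.util

# Hypothetical mapping: import name -> pip package name.
# Assumed from the packages named in the scan; python-docx is imported as "docx".
REQUIRED_MODULES = {
    "chromadb": "chromadb",
    "sentence_transformers": "sentence-transformers",
    "FlagEmbedding": "FlagEmbedding",
    "pdfplumber": "pdfplumber",
    "docx": "python-docx",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose import module cannot be found."""
    missing = []
    for module_name, package_name in required.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is None:
            missing.append(package_name)
    return missing
```

A script could call `missing_packages()` at startup and exit with the list of names instead of invoking an installer through `subprocess`, which keeps dependency installation an explicit, user-controlled step.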
Credentials
The skill requests no environment variables or explicit credentials (good). However, the advertised models (BAAI/bge-m3 and BAAI/bge-reranker-large) may require authentication tokens on some systems (e.g., HF tokens) which are not declared; the code does not request them explicitly and will fail or require the user to provide them in their environment. Also, the code reads arbitrary user files in the default paths — this is expected for a local search tool but remains sensitive and should be acknowledged by the user.
Persistence & Privilege
The skill writes its database and venv under ~/.local/share/local-rag and creates an index.lock file; it does not request elevated privileges or the 'always' flag. It does not modify other skills or system-wide agent configuration. Its persistence is limited to its own directory.
What to consider before installing
This skill mostly implements a local document semantic-search tool, but there are a few red flags to consider before installing:
1) The scripts will crawl and index files under your home directories (~/Documenti and ~/Scaricati by default) and store them at ~/.local/share/local-rag; only use it if you are comfortable with that.
2) The code performs runtime network activity: it downloads models via sentence_transformers (trust_remote_code=True), which can execute code supplied with remote models. This is a real execution risk.
3) The query script imports FlagEmbedding.FlagReranker (not included), and the project lacks a declared install step for required packages (chromadb, sentence_transformers, FlagEmbedding, pdfplumber, python-docx, etc.), so dependency resolution is opaque.
4) The script attempts to run pip via subprocess, but the command includes an unusual 'uv' token which looks like a bug (or could invoke an unexpected binary if present).
Recommended actions:
(a) Inspect and run the code in a controlled sandbox or disposable VM first.
(b) Run the indexer on a small safe folder or a copy of sensitive files, not on your entire home directory.
(c) If you must run it on real data, create an isolated Python virtualenv yourself and install dependencies explicitly (do not rely on the script's subprocess install lines).
(d) Avoid trusting models that require trust_remote_code or run them with caution; prefer vetted models or disable trust_remote_code if possible.
(e) Verify what 'FlagEmbedding' is and where it will be installed from.
(f) Consider adding a declared install spec and explicit dependency list before trusting the skill.
If you want, I can list the exact lines to change to make it safer (e.g., remove trust_remote_code, fix the pip-install command, and add explicit dependency install instructions).
Like a lobster shell, security has layers: review code before you run it.
latest
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
SKILL.md
Local RAG
Semantic search over indexed local files with parent-child chunking for precise retrieval with full context.
Architecture
- Embeddings: BAAI/bge-m3 (multilingual, 1024-dim)
- Reranker: BAAI/bge-reranker-large (cross-encoder)
- Vector DB: ChromaDB (persistent, cosine similarity, HNSW)
- Chunking: Parent-child — small children for search, large parents for context
- Storage: ~/.local/share/local-rag/chromadb/
- Venv: ~/.local/share/local-rag/venv/
Chunking Strategy
- Child chunks: 128 words, 24 overlap → embedded for semantic search
- Parent chunks: 768 words → stored as full context, returned to user
- When a child matches → its parent is returned, giving surrounding context
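The chunk sizes above can be illustrated with a small word-based splitter. This is a sketch under stated assumptions, not the code in index.py: the function names are hypothetical, parents are assumed not to overlap, and children are assumed never to cross a parent boundary.

```python
def split_words(words, size, overlap=0):
    """Split a word list into windows of `size` words.

    Consecutive windows share `overlap` words.
    """
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def parent_child_chunks(text, parent_size=768, child_size=128, child_overlap=24):
    """Return (parents, children); each child records the index of its parent."""
    words = text.split()
    parents = [" ".join(chunk) for chunk in split_words(words, parent_size)]
    children = []
    for parent_id, parent in enumerate(parents):
        for child_words in split_words(parent.split(), child_size, child_overlap):
            children.append({"parent_id": parent_id, "text": " ".join(child_words)})
    return parents, children
```

Only the `children` texts would be embedded. At query time, the `parent_id` on a matching child is what lets the larger parent text be returned as context.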
Running
All scripts must use the venv Python:
VENV=~/.local/share/local-rag/venv/bin/python
Indexing
$VENV skills/local-rag/scripts/index.py
Re-index from scratch:
$VENV skills/local-rag/scripts/index.py --reindex
Custom paths:
$VENV skills/local-rag/scripts/index.py --paths ~/Documenti ~/Progetti
Querying
$VENV skills/local-rag/scripts/query.py "what are the termination clauses?"
Options:
- --top-k N: Child candidates from vector search (default: 20)
- --top-n N: Final parent results after reranking (default: 3)
- --json: JSON output
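How the two limits interact can be sketched as a two-stage selection: keep the first top-k child hits, rescore them, then keep the best top-n distinct parents. This is an illustrative assumption about the flow, not the logic of query.py. The names `select_parents` and `word_overlap` are hypothetical, and the toy scorer merely stands in for a cross-encoder reranker.

```python
def select_parents(query, child_hits, rerank_score, top_k=20, top_n=3):
    """Pick the best-scoring parents from a ranked list of child hits.

    child_hits: dicts with "parent_id" and "text", ordered by vector similarity.
    rerank_score: callable(query, text) -> float, higher is better.
    """
    best = {}
    for hit in child_hits[:top_k]:  # stage 1: limit to top-k child candidates
        score = rerank_score(query, hit["text"])
        parent_id = hit["parent_id"]
        # Keep only the best child score seen for each parent.
        if parent_id not in best or score > best[parent_id]:
            best[parent_id] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]  # stage 2: top-n parents after reranking


def word_overlap(query, text):
    """Toy scorer: number of distinct words shared by query and text."""
    return len(set(query.lower().split()) & set(text.lower().split()))
```

Deduplicating by parent matters because several children of the same parent can match a query; without it, the final results could repeat the same context block.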
Supported Formats
Text: .txt, .md, .csv, .json, .yaml, .py, .js, .sh, etc.
Documents: .pdf (pdfplumber), .docx (python-docx)
Default Paths
~/Documenti and ~/Scaricati. Edit DEFAULT_PATHS in index.py to change.
Workflow
- Run index.py: builds/rebuilds the index (incremental via SHA-256 hash check)
- Run periodically to pick up new/changed files
- Use query.py to search with natural language
- Results include: file path, relevance score, matched snippet, full parent context
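The incremental step in the workflow above can be sketched as a comparison between each file's current SHA-256 digest and the digest recorded at the last run. This is a minimal sketch and may differ from index.py: the function names and the in-memory `known_hashes` dictionary are hypothetical, and where the real script stores its hashes is not stated in the source.

```python
import hashlib


def file_sha256(path, block_size=65536):
    """Hash a file in blocks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def files_to_index(paths, known_hashes):
    """Return {path: hash} for files that are new or whose content changed.

    known_hashes: {path: hash} recorded by a previous run (hypothetical store).
    """
    changed = {}
    for path in paths:
        key = str(path)
        current = file_sha256(path)
        if known_hashes.get(key) != current:
            changed[key] = current
    return changed
```

Unchanged files produce the same digest and are skipped, so a periodic run only pays the embedding cost for new or edited files.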
Files
3 total
