Corpus Search

v1.0.1

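Corpus search tool, designed to work with corpus-builder. Supports semantic search and metadata filtering (scene, emotion, pacing, quality). Use when: you need to search the corpus for novel excerpts, filter by scene type, find passages with a particular emotion or pacing, or retrieve high-quality writing material.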

License: MIT-0 · Free to use, modify, and redistribute. No attribution required.
Security Scan
VirusTotal
Benign
OpenClaw
Benign
high confidence
Purpose & Capability
The name and description (corpus retrieval, used together with corpus-builder) match the files and behavior: the skill opens a ChromaDB persistent client at the corpus-builder corpus path, computes embeddings with sentence-transformers, and supports metadata filters. The storage path in default_config.yml explicitly points to the corpus-builder corpus directory, which is expected for this purpose.
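The retrieval flow described above (embed the query, then run a metadata-filtered nearest-neighbour query against a ChromaDB collection) can be sketched as follows. This is a minimal sketch under stated assumptions, not the skill's actual script. The metadata keys scene, emotion, pacing and quality and the helpers build_where and search are hypothetical. The embedding function is passed in so that the sketch runs without downloading a model. The sketch relies only on ChromaDB's documented where operators ($eq, $gte, $and) and on the dictionary of per-query lists that Collection.query returns.

```python
from typing import Any, Callable, Optional, Sequence


def build_where(scene: Optional[str] = None,
                emotion: Optional[str] = None,
                pacing: Optional[str] = None,
                min_quality: Optional[float] = None) -> Optional[dict]:
    """Build a ChromaDB `where` filter from optional constraints.

    The metadata keys used here (scene, emotion, pacing, quality) are
    hypothetical; the real corpus may use different field names.
    """
    clauses: list = []
    if scene is not None:
        clauses.append({"scene": {"$eq": scene}})
    if emotion is not None:
        clauses.append({"emotion": {"$eq": emotion}})
    if pacing is not None:
        clauses.append({"pacing": {"$eq": pacing}})
    if min_quality is not None:
        clauses.append({"quality": {"$gte": min_quality}})
    if not clauses:
        return None  # no filter: plain semantic search
    if len(clauses) == 1:
        return clauses[0]
    # Chroma expects several conditions to be combined explicitly with $and.
    return {"$and": clauses}


def search(collection: Any,
           embed: Callable[[str], Sequence[float]],
           query: str,
           n_results: int = 5,
           **filters: Any) -> list:
    """Embed `query`, run a filtered query, and return (document, metadata, distance) tuples."""
    where = build_where(**filters)
    kwargs = {"query_embeddings": [list(embed(query))], "n_results": n_results}
    if where is not None:
        kwargs["where"] = where
    res = collection.query(**kwargs)
    # query() returns one result list per query embedding; we sent exactly one.
    return list(zip(res["documents"][0], res["metadatas"][0], res["distances"][0]))
```

With the real libraries, collection would come from chromadb.PersistentClient(path=persist_directory).get_collection(name) (the collection name is not given in the listing). The embed argument could wrap a model loaded once with SentenceTransformer("BAAI/bge-small-zh-v1.5"), for example lambda t: model.encode(t).tolist().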
Instruction Scope
SKILL.md only instructs the user to run the included Python script and to edit the config so that it points to the corpus. The script operates on the configured local persist_directory; it does not reference unrelated system paths or require environment secrets. Note: loading the specified embedding model (SentenceTransformer with the model name 'BAAI/bge-small-zh-v1.5') will typically download the model weights from the Hugging Face Hub, which requires internet access, unless they are already cached.
Install Mechanism
There is no install hook; dependencies are declared in requirements.txt (pip). Those packages are plausible for the task (chromadb, sentence-transformers, pyyaml, rich, tqdm). No archives or external install URLs are used. The only minor mismatch: requirements.txt lists diskcache but the code currently uses only an in-memory cache (comment indicates diskcache was removed).
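The listing does not show how the in-memory cache is implemented. One common pattern is a per-process LRU cache around the embedding call. The sketch below uses functools.lru_cache; the helper name make_cached_embedder is hypothetical. Unlike a diskcache-backed cache, entries in this kind of cache are lost when the process exits. This explains why leaving diskcache in requirements.txt is harmless but unused.

```python
from functools import lru_cache
from typing import Callable, Sequence


def make_cached_embedder(embed: Callable[[str], Sequence[float]], maxsize: int = 1024):
    """Wrap an embedding function in a per-process, in-memory LRU cache.

    Hypothetical helper: nothing is written to disk, so the cache starts
    empty on every run (unlike a diskcache-backed cache).
    """
    @lru_cache(maxsize=maxsize)
    def cached(text: str) -> tuple:
        # Return an immutable tuple so callers cannot mutate a cached vector.
        return tuple(float(x) for x in embed(text))

    return cached
```

Repeated queries within one session then skip the model call, and cached.cache_info() reports hits and misses.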
Credentials
The skill requests no environment variables or credentials and does not require unrelated secrets. The only notable external access is the model download through sentence-transformers from Hugging Face. Because a public model name is provided, no credentials are needed. If a private model were used, the user would have to supply Hugging Face credentials separately; this skill does not request them.
Persistence & Privilege
The "always" flag is false, and the skill is user-invocable. It does not modify other skills' configs or require persistent system-wide privileges. It reads from a local corpus directory, which is expected.
Assessment
This skill appears to do what it says: it performs local semantic search over a ChromaDB corpus produced by corpus-builder. Before installing or running it:
1) ensure that the configured persist_directory points to the corpus you expect (inspect configs/default_config.yml);
2) be aware that model loading (sentence-transformers) may download large weights from the internet, so run it in an environment with sufficient disk space and a network policy you control;
3) verify that the corpus directory contains only data you are willing to let the skill read (it will access files under the corpus-builder path);
4) optionally, run the script in a sandbox or inspect the full script to confirm its behavior.
One minor issue: requirements.txt includes diskcache, although the code currently uses in-memory caching. This is harmless but worth noting.
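Checks 1 and 3 can be partly automated. The sketch below assumes a hypothetical config layout, {"storage": {"persist_directory": ...}}, because the listing does not show the actual keys in default_config.yml. The helper check_persist_directory is also hypothetical. The helper resolves the configured path and refuses it if the path is unset, falls outside the corpus root you expect, or is not an existing directory. Relative paths are resolved against the current working directory, which may differ from how the real script resolves them.

```python
from pathlib import Path
from typing import Any, Mapping


def check_persist_directory(config: Mapping[str, Any], expected_root: str) -> Path:
    """Validate a (hypothetically nested) storage.persist_directory setting.

    Returns the resolved directory, or raises if it is unset, lies outside
    `expected_root`, or does not exist.
    """
    raw = config.get("storage", {}).get("persist_directory")
    if not raw:
        raise ValueError("persist_directory is not set")
    path = Path(raw).expanduser().resolve()
    root = Path(expected_root).expanduser().resolve()
    # Accept the root itself or anything beneath it; reject everything else.
    if path != root and root not in path.parents:
        raise ValueError(f"{path} is outside the expected corpus root {root}")
    if not path.is_dir():
        raise FileNotFoundError(f"{path} does not exist or is not a directory")
    return path
```

To use it with the real file, load the config with yaml.safe_load (PyYAML is already in requirements.txt). Then pass the resulting dict together with the corpus-builder directory you intend the skill to read.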


