Corpus Search

v1.0.1

Corpus retrieval tool, designed to work with corpus-builder. Supports semantic search and metadata filtering (scene/emotion/pacing/quality). Use when: you need to search novel excerpts in a corpus, filter by scene type, find passages with a specific emotion or pacing, or retrieve high-quality writing material.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for yuzhihui886/corpus-search.

Prompt preview: Install & Setup
Install the skill "Corpus Search" (yuzhihui886/corpus-search) from ClawHub.
Skill page: https://clawhub.ai/yuzhihui886/corpus-search
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install corpus-search

ClawHub CLI


npx clawhub@latest install corpus-search
Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description (corpus retrieval, paired with corpus-builder) matches the files and behavior: it opens a ChromaDB persistent client at the corpus-builder corpus path, computes embeddings via sentence-transformers, and supports metadata filters. The storage path in default_config.yml explicitly points to the corpus-builder corpus directory, which is expected for this purpose.
Instruction Scope
SKILL.md only instructs running the included Python script and editing the config to point to the corpus. The script operates on the configured local persist_directory and does not reference unrelated system paths or require environment secrets. Note: loading the specified embedding model (SentenceTransformer with model name 'BAAI/bge-small-zh-v1.5') will typically download model weights from the model host (internet access) unless already cached.
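If run-time network access is a concern, the model can be cached ahead of time. A minimal sketch, assuming pip3/python3 are available and using the default sentence-transformers cache location:

```shell
# Pre-download the embedding model named above so the first search does
# not need to reach the model host. Instantiating the model once is
# enough: sentence-transformers caches the weights locally.
pip3 install --user sentence-transformers
python3 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('BAAI/bge-small-zh-v1.5')"
```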
Install Mechanism
There is no install hook; dependencies are declared in requirements.txt (pip). Those packages are plausible for the task (chromadb, sentence-transformers, pyyaml, rich, tqdm). No archives or external install URLs are used. The only minor mismatch: requirements.txt lists diskcache but the code currently uses only an in-memory cache (comment indicates diskcache was removed).
Credentials
The skill requests no environment variables or credentials and does not require unrelated secrets. The only notable external access is model download via sentence-transformers/HuggingFace (public model name provided) which does not require credentials for a public model; if a private model were used the user would need to provide HF credentials separately (not requested by this skill).
Persistence & Privilege
always is false and the skill is user-invocable. It does not modify other skills' configs or require persistent system-wide privileges. It reads from a local corpus directory (expected).
Assessment
This skill appears to do what it says: local semantic search over a ChromaDB corpus produced by corpus-builder. Before installing or running:

1) Ensure the configured persist_directory points to the corpus you expect (inspect configs/default_config.yml).
2) Be aware that model loading (sentence-transformers) may download large weights from the internet; run in an environment with sufficient disk space and a network policy you control.
3) Verify the corpus directory contains only data you are willing to let the skill read (it will access files under the corpus-builder path).
4) Optionally run the script in a sandbox, or inspect the full script to confirm its behavior.

One minor issue: requirements.txt includes diskcache although the code currently uses in-memory caching; harmless, but worth noting.
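The pre-run checks above can be sketched as a short shell pass. The paths are assumptions based on the quick-start section later on this page and may differ on your machine:

```shell
# Pre-run checks for the corpus-search skill. SKILL_DIR is an assumed
# default install location; adjust it to where the skill actually lives.
SKILL_DIR="$HOME/.openclaw/workspace/skills/corpus-search"

# 1) Inspect the configured corpus path before the first run.
[ -f "$SKILL_DIR/configs/default_config.yml" ] && cat "$SKILL_DIR/configs/default_config.yml"

# 2) Confirm there is disk space for the embedding model weights.
df -h "$HOME"

# 3) See what the skill directory contains before letting it run.
ls -la "$SKILL_DIR" 2>/dev/null || echo "skill not installed yet"
```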

Like a lobster shell, security has layers — review code before you run it.

latest: vk97e9htqj0bznrjcbfxptfty1h841da9
95 downloads · 0 stars · 2 versions
Updated 3w ago
v1.0.1
MIT-0

Corpus Search - Corpus Retrieval Tool

A corpus retrieval tool designed to work with corpus-builder; supports semantic search and metadata filtering.

Quick Start

cd ~/.openclaw/workspace/skills/corpus-search

# Basic search
python3 scripts/search_corpus.py -q "紧张的打斗场景" -c xuanhuan-full --limit 10

# Filter by scene
python3 scripts/search_corpus.py -q "围攻" -c xuanhuan-full --scene 打斗 --limit 5

# Filter by emotion
python3 scripts/search_corpus.py -q "修炼" -c xuanhuan-full --emotion 紧张 --limit 10

# JSON output
python3 scripts/search_corpus.py -q "突破" -c xuanhuan-full --json

Command-Line Options

Option            Description
-q, --query       Search query (required)
-c, --collection  Corpus collection name (required)
--limit           Number of results to return (default: 10)
--scene           Filter by scene (打斗 fight, 修炼 cultivation, 对话 dialogue, 探险 adventure, etc.)
--emotion         Filter by emotion (紧张 tense, 轻松 relaxed, 悲伤 sad, 热血 passionate, etc.)
--min-quality     Minimum quality score (1-10)
--json            Output as JSON
--export          Export results to a file
--verbose         Verbose output
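The filters above can be combined in a single call. A sketch using the flags from the table (the query string and output filename are illustrative):

```shell
# High-quality, tense fight scenes, exported as JSON. Run from the
# skill directory, as in the quick-start examples.
cd ~/.openclaw/workspace/skills/corpus-search
python3 scripts/search_corpus.py \
  -q "决战" -c xuanhuan-full \
  --scene 打斗 --emotion 紧张 \
  --min-quality 7 --limit 5 \
  --json --export results.json
```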

Sample Output

🔍 Search results: 紧张的打斗场景
   Collection: xuanhuan-full
   Results: 5

1. Similarity: 87.5%
   Scene: 打斗
   Emotion: 紧张, 热血
   Pacing: 快节奏
   Source: 没钱修什么仙_第 1-10 章.txt

   Content preview:
   张羽只觉胸口一痛,低头看去,只见一柄长剑已刺入...

Dependencies

pip3 install -r requirements.txt --user

Configuration

Edit configs/default_config.yml to change the corpus path.
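The file is expected to look roughly like the sketch below. Only the persist_directory key and the embedding model name are supported by the security review above; the exact key names, layout, and the path value are assumptions and may differ in the shipped file.

```yaml
# Hedged sketch of configs/default_config.yml (not the shipped file).
persist_directory: ~/.openclaw/workspace/skills/corpus-builder/corpus  # illustrative path
embedding_model: BAAI/bge-small-zh-v1.5  # loaded via sentence-transformers
```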

Related Files

  • scripts/search_corpus.py - main program
  • configs/default_config.yml - configuration file

Version: 1.0.1
