Curated Search

Domain-restricted full-text search over curated technical documentation

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 615 · 1 current install · 1 all-time install
Security Scan
VirusTotal
Pending
OpenClaw
Benign (high confidence)
Purpose & Capability
The name and description (domain-restricted curated search) line up with the included files (crawler, indexer, content-extractor, search CLI) and the declared runtime requirement (node). There are no unexpected credentials, unusual binaries, or install steps in the bundle. The domain whitelist and seeds in config.yaml explain why a crawler is included.
Instruction Scope
SKILL.md and README consistently state that search operations read local index files (data/index) and do not make network calls; that is reflected by a search CLI script. The crawler (scripts/crawl.js + src/crawler.js) does perform outbound HTTP requests, but those are described as user-initiated (npm run crawl) and controlled by config.yaml (whitelist, delays, robots.txt). One minor mismatch: SKILL.md emphasizes 'only when the user explicitly calls it' for invocation, but platform metadata allows normal autonomous invocation by agents (this is the platform default). This is an informational mismatch in documentation rather than a code contradiction, but you should be aware agents could be permitted to call the tool unless you restrict it.
Install Mechanism
No install spec is present (instruction-only install), and the bundle contains source JS files and scripts that run under Node. There are no downloads from external or untrusted URLs, no archive extraction steps, and no package manager installs invoked automatically by the skill. Requiring 'node' is proportionate.
Credentials
The skill requests no environment variables, no credentials, and reads only local configuration (config.yaml) and index files under its data/ path. The config contains domain whitelists and seeds (expected). No secret exfiltration indicators or unnecessary credential requirements are present in the manifest or docs.
Persistence & Privilege
The skill is not marked always:true and does not request elevated platform privileges in the metadata. The documentation describes running the crawler periodically (cron/systemd) as an operator action; that implies operational choices but not an inherent persist/privilege escalation in the skill itself. It does not modify other skills' configs in the provided materials.
Assessment
This skill is internally consistent: the search tool reads a local MiniSearch index and requires only Node; network activity occurs only if you (or an operator) run the crawler (npm run crawl), which fetches pages from the domains listed in config.yaml. Before installing or enabling scheduled crawls:

  1. Review and tune the config.yaml domains, seeds, and delays (to avoid hammering sites or crawling sensitive hosts).
  2. Run the crawler as a low-privileged user and keep it isolated (the systemd/cron guidance in the docs is helpful).
  3. Verify the package does not contain legacy network servers: the repo includes SECURITY_INCIDENT_2026-02-14.md documenting a previously removed server component. The current bundle claims that file was removed, but you may want to inspect the published archive yourself.
  4. If you do not want agents to call the tool autonomously, adjust the OpenClaw skill invocation settings (or disable model invocation for this skill).

Overall the bundle looks coherent and proportionate, but exercise normal operational caution when enabling networked crawling.

Like a lobster shell, security has layers — review code before you run it.

Current version: v1.0.7
latest: vk971typ81vme8xy55f2ky7dc9n81c27g

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

🔍 Clawdis
Bins: node

SKILL.md

Curated Search Skill

Summary

Domain-restricted full-text search over a curated whitelist of technical documentation (MDN, Python docs, etc.). Provides clean, authoritative results without web spam.

External Endpoints

This skill does not call any external network endpoints during search operations. The crawler optionally makes outbound HTTP requests when building or rebuilding the index, but those runs are user‑initiated (npm run crawl) and respect the configured domain whitelist.

Security & Privacy

  • Search is fully local – After the index is built, all queries run offline; no data leaves your machine.
  • Crawling is optional and whitelist‑scoped – The crawler only accesses domains you explicitly list in config.yaml. It respects robots.txt and configurable delays.
  • No telemetry – No usage data is transmitted externally.
  • Configuration is read from local config.yaml and the index file in data/.

Model Invocation Note

The curated-search.search tool is invoked only when the user explicitly calls it. It does not run autonomously. OpenClaw calls the tool handler (scripts/search.js) when the user asks to search the curated index.

Trust Statement

By using this skill, you trust that the code operates locally and only crawls domains you approve. The skill does not send your queries or workspace data to any third party. Review the open‑source implementation before installing.


Tool: curated-search.search

Search the curated index.

Parameters

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| query | string | yes | | Search query terms |
| limit | number | no | 5 | Maximum results (capped by config.max_limit, typically 100) |
| domain | string | no | null | Filter to specific domain (e.g., docs.python.org) |
| min_score | number | no | 0.0 | Minimum relevance score (0.0–1.0); filters out low-quality matches |
| offset | number | no | 0 | Pagination offset (skip first N results) |
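As a sketch of how a caller might assemble these parameters into a CLI invocation of the tool handler (scripts/search.js): only the --query and --config flags are confirmed by the error-code table below; the --limit, --domain, --min-score, and --offset flag names are assumptions made here for illustration.

```javascript
// Build an argv array for scripts/search.js from a parameters object.
// Flag names other than --query are assumed, not confirmed by the docs.
function buildSearchArgs({ query, limit, domain, min_score, offset }) {
  if (typeof query !== "string" || query.length === 0) {
    throw new Error("missing_query: provide a query string");
  }
  if (query.length > 1000) {
    throw new Error("query_too_long: queries are capped at 1000 characters");
  }
  const args = ["--query", query];
  if (limit !== undefined) args.push("--limit", String(limit));
  if (domain !== undefined) args.push("--domain", domain);
  if (min_score !== undefined) args.push("--min-score", String(min_score));
  if (offset !== undefined) args.push("--offset", String(offset));
  return args;
}
```

The argv array can then be passed to child_process.spawn alongside the script path; validating query length up front mirrors the missing_query and query_too_long error codes listed below.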

Response

JSON array of result objects:

[
  {
    "title": "Python Tutorial",
    "url": "https://docs.python.org/3/tutorial/",
    "snippet": "Python is an easy to learn, powerful programming language...",
    "domain": "docs.python.org",
    "score": 0.87,
    "crawled_at": 1707712345678
  }
]
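Because the response is plain JSON, callers can post-filter it locally. A minimal sketch using only the field names shown above (filterResults is a hypothetical helper, not part of the skill):

```javascript
// Keep only results at or above a score threshold, optionally restricted
// to one domain, ordered newest-crawled first. Field names (score, domain,
// crawled_at) come from the example response above.
function filterResults(results, { minScore = 0, domain = null } = {}) {
  return results
    .filter((r) => r.score >= minScore && (domain === null || r.domain === domain))
    .sort((a, b) => b.crawled_at - a.crawled_at);
}
```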

Fields:

  • title — Document title (cleaned)
  • url — Source URL (canonical)
  • snippet — Excerpt (~200 chars) from content
  • domain — Hostname of source
  • score — BM25 relevance score (higher is better; not normalized to 0–1, though values typically fall in that range)
  • crawled_at — Unix timestamp when page was crawled

Example Agent Calls

search CuratedSearch for "python tutorial"
search CuratedSearch for "async await" limit=3 domain=developer.mozilla.org
search CuratedSearch for "linux man page" min_score=0.3

Errors

If an error occurs, the tool exits non-zero and prints a JSON error object to stderr, e.g.:

{
  "error": "index_not_found",
  "message": "Search index not found. The index has not been built yet.",
  "suggestion": "Run the crawler first: npm run crawl",
  "details": { "path": "data/index.json" }
}
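A caller can turn that stderr object into a readable one-line message; a minimal sketch under the error shape shown above (describeToolError is a hypothetical helper, not part of the skill):

```javascript
// Parse the JSON error object the search CLI prints to stderr and
// return a one-line human-readable message. Falls back gracefully
// when stderr is not valid JSON.
function describeToolError(stderrText) {
  let err;
  try {
    err = JSON.parse(stderrText);
  } catch {
    return `unrecognized error output: ${stderrText.trim()}`;
  }
  let line = `${err.error}: ${err.message}`;
  if (err.suggestion) line += ` (hint: ${err.suggestion})`;
  return line;
}
```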

Common error codes:

| Code | Meaning | Suggested Fix |
|------|---------|---------------|
| config_missing | Configuration file not found | Specify --config path or ensure config.yaml exists |
| config_invalid | YAML parsing failed | Check syntax in config.yaml |
| config_missing_index_path | index.path not set | Add index.path to config |
| index_not_found | Index file missing | Run npm run crawl to build the index |
| index_corrupted | Index file incompatible or corrupted | Rebuild the index with npm run crawl |
| index_init_failed | Unexpected index initialization error | Check permissions; reinstall dependencies |
| missing_query | No query provided | Provide the --query argument |
| query_too_long | Query exceeds 1000 characters | Shorten the query |
| limit_exceeded | Limit > config.max_limit | Use a smaller limit |
| invalid_domain | Domain filter malformed | Use a format like docs.python.org |
| conflicting_flags | Mutually exclusive flags used (e.g., --stats with --query) | Use one mode per invocation |
| stats_failed | Could not retrieve index stats | Ensure the index is accessible |
| search_failed | Search execution threw an error | Check query and index integrity |

Configuration

Edit config.yaml in the skill directory. Key sections:

  • domains — whitelist of allowed domains (required)
  • seeds — starting URLs for crawling
  • crawl — depth, delay, timeout, max_documents
  • content — min_content_length, max_content_length
  • index — path to index files
  • search — default_limit, max_limit, min_score

See README.md for full configuration docs.
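The section names above map onto a config file like the following hypothetical sketch; every value here is invented for illustration, and only index.path: data/index.json is taken from the error example earlier.

```yaml
domains:                       # whitelist of allowed domains (required)
  - docs.python.org
  - developer.mozilla.org
seeds:                         # starting URLs for crawling
  - https://docs.python.org/3/tutorial/
crawl:
  depth: 2
  delay: 1000                  # ms between requests
  timeout: 10000
  max_documents: 5000
content:
  min_content_length: 200
  max_content_length: 100000
index:
  path: data/index.json
search:
  default_limit: 5
  max_limit: 100
  min_score: 0.0
```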

Support

  • Full documentation: README.md
  • Technical specs: specs/
  • Build plan: PLAN.md
  • Contributor guide: CONTRIBUTING.md
  • Issues: Report on GitHub (or via OpenClaw maintainers)

Files

24 total
