{"skill":{"slug":"search-engine","displayName":"Search Engine","summary":"Design and build any search engine with robust indexing, retrieval logic, relevance controls, and evaluation workflows for production systems.","description":"---\nname: Search Engine\nslug: search-engine\nversion: 1.0.0\nhomepage: https://clawic.com/skills/search-engine\ndescription: Design and build any search engine with robust indexing, retrieval logic, relevance controls, and evaluation workflows for production systems.\nchangelog: Initial release with indexing pipeline guidance, query handling patterns, and quality evaluation checklists for reliable engine delivery.\nmetadata: {\"clawdbot\":{\"emoji\":\"S\",\"requires\":{\"bins\":[]},\"os\":[\"darwin\",\"linux\",\"win32\"]}}\n---\n\n## Setup\n\nOn first use, read `setup.md` and establish activation behavior, system scope, and data constraints before proposing implementation steps.\n\n## When to Use\n\nUser needs to create, redesign, or scale a search engine for applications, documentation, products, or internal knowledge bases. Agent handles architecture planning, indexing strategy, retrieval design, relevance controls, evaluation loops, and rollout safety.\n\n## Architecture\n\nMemory lives in `~/search-engine/`. See `memory-template.md` for baseline structure and status values.\n\n```text\n~/search-engine/\n|-- memory.md              # Persistent context, constraints, and active priorities\n|-- requirements.md        # Retrieval goals, latency targets, and relevance expectations\n|-- experiments.md         # Offline experiments and tuning decisions\n`-- incidents.md           # Production issues, root cause, and remediation notes\n```\n\n## Quick Reference\n\nUse the smallest relevant file for the task.\n\n| Topic | File |\n|-------|------|\n| Setup and activation behavior | `setup.md` |\n| Memory template and status model | `memory-template.md` |\n| Architecture options and component choices | `architecture-blueprint.md` |\n| Retrieval and ranking strategy patterns | `retrieval-patterns.md` |\n| Quality measurement and evaluation loops | `evaluation-metrics.md` |\n| Delivery and rollout gates | `implementation-checklist.md` |\n\n## Data Storage\n\nLocal notes stay in `~/search-engine/`:\n- requirements and relevance objectives\n- data source assumptions and indexing decisions\n- experiment outcomes and deployment safeguards\n\n## Core Rules\n\n### 1. Start with a Retrieval Contract, Not with Tools\nBefore selecting engines, define the contract:\n- query types to support (keyword, phrase, semantic, hybrid)\n- response format, latency budget, and freshness target\n- error tolerance and fallback behavior\n\nA search engine without a contract becomes an untestable collection of features.\n\n### 2. Design Ingestion and Indexing as a Deterministic Pipeline\nEvery document should pass explicit stages:\n- ingestion source validation and deduplication\n- normalization and field extraction\n- chunking policy with stable identifiers\n- indexing with repeatable transforms\n\nDeterministic pipelines reduce drift between environments and simplify debugging.\n\n### 3. Separate Recall Layers from Precision Layers\nTreat retrieval as a staged system:\n- broad candidate retrieval first (lexical, vector, or hybrid)\n- reranking and business rules second\n- formatting and explanation last\n\nMixing all concerns in one step hides failures and makes tuning unpredictable.\n\n### 4. Define Relevance Features as Versioned Policy\nRelevance changes must be tracked as policy versions:\n- feature weights and boosts\n- typo tolerance and synonym policy\n- filtering, faceting, and tie-break rules\n\nNever ship silent relevance changes without versioned notes and measured deltas.\n\n### 5. Evaluate Offline Before Production Writes\nFor each relevance or indexing change:\n- run benchmark queries with labeled expectations\n- measure hit quality, ordering quality, and coverage\n- compare against current baseline and note regressions\n\nIf evaluation evidence is weak, keep the current configuration and iterate.\n\n### 6. Build Idempotent Index Operations and Safe Rollback\nIndex updates must be replay-safe:\n- stable document ids and version checks\n- resumable batch jobs with checkpoints\n- alias-based or dual-index rollback plan\n\nWithout idempotency and rollback, incident recovery becomes guesswork.\n\n### 7. Match Complexity to Workload Reality\nUse the minimum architecture that meets requirements:\n- avoid distributed complexity for small datasets\n- avoid simplistic models for multilingual or high-noise corpora\n- revisit design as scale and usage patterns change\n\nOver-engineering and under-engineering both create expensive rework.\n\n## Common Traps\n\n- Starting with vendor selection before defining retrieval requirements -> architecture lock-in with unclear success criteria\n- Indexing raw data without field-level normalization -> poor filters, weak facets, and noisy matching\n- Tuning relevance on one happy-path query set -> brittle results in real user traffic\n- Applying business boosts without guardrails -> top results become commercially biased and less useful\n- Shipping retrieval changes without offline baseline comparison -> regressions discovered only by users\n- Running full reindex jobs without resumability -> long outages and partial data corruption\n- Ignoring multilingual tokenization differences -> severe precision drop for non-English users\n\n## Security & Privacy\n\nData that leaves your machine:\n- none by default in this instruction set\n- only user-approved integration traffic when the user explicitly connects external services\n\nData that stays local:\n- planning notes and experiment logs under `~/search-engine/`\n- constraints, relevance decisions, and rollback records\n\nThis skill does NOT:\n- collect unrelated files or credentials\n- require hidden network calls\n- bypass user-confirmed environment boundaries\n\n## Related Skills\nInstall with `clawhub install <slug>` if user confirms:\n- `api` - Define stable APIs for indexing, querying, and retrieval orchestration\n- `elasticsearch` - Implement production indexing and query execution on Elasticsearch\n- `meilisearch` - Ship lightweight retrieval stacks with fast iteration cycles\n- `engineering` - Structure implementation workstreams and technical decision logs\n- `software-engineer` - Improve delivery quality with testable architecture and rollout discipline\n\n## Feedback\n\n- If useful: `clawhub star search-engine`\n- Stay updated: `clawhub sync`\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":844,"installsAllTime":31,"installsCurrent":5,"stars":1,"versions":1},"createdAt":1772648763745,"updatedAt":1778491721623},"latestVersion":{"version":"1.0.0","createdAt":1772648763745,"changelog":"Initial release with indexing pipeline guidance, query handling patterns, and quality evaluation checklists for reliable engine delivery.","license":null},"metadata":{"setup":[],"os":["darwin","linux","win32"],"systems":null},"owner":{"handle":"ivangdavila","userId":"s178jdk12x4qj3gs2se3etxf3h83h7ft","displayName":"Iván","image":"https://avatars.githubusercontent.com/u/81719670?v=4"},"moderation":null}