Smart Code Search

v1.0.2

Search code and docs by meaning, not keywords. Powered by ColGREP/NextPlaid,

0· 139·0 current·0 all-time
byBrett Hammond@brettmhammond

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for brettmhammond/smart-code-search.

Previewing Install & Setup.
Prompt PreviewInstall & Setup
Install the skill "Smart Code Search" (brettmhammond/smart-code-search) from ClawHub.
Skill page: https://clawhub.ai/brettmhammond/smart-code-search
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Required binaries: colgrep
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install smart-code-search

ClawHub CLI

Package manager switcher

npx clawhub@latest install smart-code-search
Security Scan
VirusTotalVirusTotal
Benign
View report →
OpenClawOpenClaw
Benign
high confidence
Purpose & Capability
Name/description (semantic code search) match the declared requirement (colgrep) and the SKILL.md which documents installing and using ColGREP. Required binary and examples are proportionate to the stated goal.
Instruction Scope
Instructions are limited to installing ColGREP, indexing the project (writes an index to .colgrep/), and running searches. This implies reading most files under the project and writing a local index (expected for a search tool). The SKILL.md also documents agent-integration commands (e.g., --install-claude-code) which register the tool with agents — users should verify what those installer commands change before running them.
Install Mechanism
Install uses Homebrew formula lightonai/tap/colgrep, which is a reasonable packaging route. Because it is a third-party tap (lightonai/tap) rather than an official core formula, users should validate the tap/formula source before installing; otherwise the install mechanism is standard and proportionate.
Credentials
No environment variables, credentials, or unrelated config paths are requested. The tool operates on local files and stores its index under .colgrep/ in the project root, which is consistent with the described behavior.
Persistence & Privilege
The skill is not always-enabled and does not request elevated privileges. Its persistence is limited to the local index (.colgrep/) and optional agent integration commands; users should review what the agent integration commands change but the default privilege/persistence model is appropriate.
Assessment
This skill appears coherent for local semantic code search. Before installing: (1) verify the Homebrew tap formula source (lightonai/tap) and inspect the formula if possible; (2) run installs in a trusted environment or sandbox if you don't trust the binary source; (3) be aware ColGREP will read your project files and write an index to .colgrep/ (add it to .gitignore if you don't want it checked into VCS); (4) inspect or test the agent-integration commands (--install-*) to see what they modify before running them. If you need extra assurance, review the upstream project linked in the SKILL.md (LightOn repo/blog) and prefer installing from an upstream GitHub release or build from source.

Like a lobster shell, security has layers — review code before you run it.

Runtime requirements

🔍 Clawdis
Binscolgrep

Install

Install ColGREP via Homebrew
Bins: colgrep
brew install lightonai/tap/colgrep
latestvk973z7v05y95nqtr2716yc4rah83mr00
139downloads
0stars
3versions
Updated 1mo ago
v1.0.2
MIT-0

Smart Code Search

Search code and docs by meaning, not just strings.

Powered by ColGREP and NextPlaid from LightOn — the engine behind the #1 ranked code retrieval model on MTEB and the #1 retriever on BrowseComp-Plus, OpenAI's hardest agentic search benchmark.

grep finds strings. This finds intent. Ask "payment capture logic" and get results from files that never contain those exact words — because it understands what your code does, not just what it says.

Why This Exists

Every developer has been here: you know what you're looking for but not where it lives. You chain 4 different grep -r attempts, guess filenames, scroll through directory trees. Coding agents are even worse — they grep, miss things, hallucinate file paths, waste tokens exploring blind.

ColGREP fixes this with multi-vector semantic search. It parses your code with Tree-sitter, embeds each function/method/class with token-level vectors, and ranks results by meaning. The model is 17M parameters, runs on CPU, and returns results in under a second.

The Numbers

MetricValue
MTEB Code Leaderboard#1 (LateOn-Code)
BrowseComp-Plus87.59% accuracy, beating all models up to 8B params (blog)
vs grep in coding agents70% win rate head-to-head
Model size17M params — 54× smaller than competing 8B models
Search latency200–900ms on CPU
API cost$0. Forever. Runs 100% local
PrivacyCode never leaves your machine

Install

brew install lightonai/tap/colgrep

Verify: colgrep --version

Quick Start

1. Index Your Project

cd /path/to/project
colgrep init

That's it. ColGREP parses every file with Tree-sitter, builds multi-vector embeddings on CPU, and stores the index in .colgrep/. Takes 30–60 seconds for ~1000 files. After this, the index auto-updates on every search — changed files are detected and re-indexed automatically.

2. Search

colgrep "natural language description of what you want"

Results are ranked by semantic relevance score. Higher = better match.

Examples:

colgrep "authentication middleware token validation"
colgrep "database migration rollback strategy"
colgrep "React form validation with error display"
colgrep "webhook retry logic with exponential backoff"

3. Combine Regex + Semantics

Filter files by regex pattern first, then rank semantically:

colgrep -e "async.*await" "error handling patterns"
colgrep -e "def test_" "payment capture edge cases"
colgrep -e "\.tsx$" "patient dashboard layout"

Search Options

colgrep "query"              # Default output: file:lines (score: X.XX)
colgrep "query" --json       # JSON output for piping to other tools
colgrep "query" -n 5         # Top 5 results only

When to Use This vs grep

You know...Use
The exact string or function namegrep -r "functionName"
The concept but not the wordscolgrep "what it does"
A pattern + a conceptcolgrep -e "pattern" "meaning"
Where something is implementedcolgrep "description of behavior"
How a feature works across filescolgrep "feature workflow"

Coding Agent Integration

ColGREP provides built-in integration with popular coding agents. After installing, restart your agent to enable semantic search:

  • Claude Code: colgrep --install-claude-code
  • OpenCode: colgrep --install-opencode
  • Codex: colgrep --install-codex

These commands register ColGREP as a search tool within the agent. The agent will automatically use semantic search when navigating indexed projects.

Multi-Project Setup

Index each project independently. Search from the project directory:

cd ~/code/api && colgrep init
cd ~/code/frontend && colgrep init
cd ~/code/infrastructure && colgrep init
cd ~/docs && colgrep init

# Search each independently
cd ~/code/api && colgrep "payment processing service"
cd ~/code/frontend && colgrep "checkout form validation"

Works great for monorepos, microservices, documentation vaults, and any directory with text/code files.

How It Works

ColGREP uses ColBERT late-interaction retrieval — a fundamentally different approach than traditional single-vector embeddings:

  1. Tree-sitter parses your code into structured units (functions, methods, classes, signatures)
  2. LateOn-Code-edge (17M params) creates multiple token-level embeddings per code unit — not one lossy summary vector
  3. NextPlaid stores these in a quantized, memory-mapped Rust index
  4. At search time, query tokens interact with document tokens for fine-grained relevance scoring

This is why a 17M model beats 8B models — late interaction preserves token-level semantics that single-vector approaches compress away. Read the full technical story: The Bloated Retriever Era Is Over

Interpreting Scores

  • 6.0+ — Near-exact conceptual match. The code does exactly what you described.
  • 5.0–6.0 — Strong semantic match. Highly relevant code.
  • 4.0–5.0 — Good match. Related code worth reviewing.
  • 3.0–4.0 — Weak match. May or may not be relevant.
  • Below 3.0 — Likely noise. Ignore these results.

Troubleshooting

"Index is being updated by another process" — Another colgrep instance is updating. Current search uses existing index. Safe to ignore.

Re-index from scratch:

rm -rf .colgrep/ && colgrep init

Add to .gitignore:

.colgrep/

Links

Comments

Loading comments...