Install
openclaw skills install smart-code-search
Search code and docs by meaning, not just strings.
Powered by ColGREP and NextPlaid from LightOn, the team behind LateOn-Code (#1 on the MTEB code retrieval leaderboard) and the #1 retriever on BrowseComp-Plus, a hard agentic search benchmark derived from OpenAI's BrowseComp.
grep finds strings. This finds intent. Ask "payment capture logic" and get results from files that never contain those exact words — because it understands what your code does, not just what it says.
Every developer has been here: you know what you're looking for but not where it lives. You chain four different grep -r attempts, guess filenames, and scroll through directory trees. Coding agents fare worse: they grep, miss matches, hallucinate file paths, and waste tokens exploring blindly.
ColGREP fixes this with multi-vector semantic search. It parses your code with Tree-sitter, embeds each function/method/class with token-level vectors, and ranks results by meaning. The model is 17M parameters, runs on CPU, and returns results in under a second.
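To make "embeds each function/method/class" concrete, here is a minimal Python sketch of the chunking step only. It uses the standard-library `ast` module as a stand-in for Tree-sitter (so it handles Python files only), and the `CodeUnit`/`extract_units` names are hypothetical, not part of ColGREP.

```python
import ast
from dataclasses import dataclass


@dataclass
class CodeUnit:
    """One searchable unit: a function, method, or class."""
    kind: str        # "function" or "class"
    name: str        # qualified name, e.g. "Cart.total"
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    source: str      # exact source text of the unit


def extract_units(source: str) -> list[CodeUnit]:
    """Split Python source into function/method/class units.

    Stand-in for Tree-sitter: walks definitions that sit directly in the
    module, a class body, or a function body.
    """
    tree = ast.parse(source)
    units: list[CodeUnit] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                kind = "class" if isinstance(child, ast.ClassDef) else "function"
                name = f"{prefix}{child.name}"
                units.append(CodeUnit(
                    kind=kind,
                    name=name,
                    start_line=child.lineno,
                    end_line=child.end_lineno,
                    source=ast.get_source_segment(source, child) or "",
                ))
                # Recurse so methods and nested defs become their own units.
                visit(child, prefix=f"{name}.")

    visit(tree, prefix="")
    return units
```

Each unit's `source` is what a semantic indexer would embed; Tree-sitter plays the same role across many languages.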
| Metric | Value |
|---|---|
| MTEB Code Leaderboard | #1 (LateOn-Code) |
| BrowseComp-Plus | 87.59% accuracy, beating all models up to 8B params (blog) |
| vs grep in coding agents | 70% win rate head-to-head |
| Model size | 17M params (an 8B model has roughly 470× as many) |
| Search latency | 200–900ms on CPU |
| API cost | $0; runs 100% locally |
| Privacy | Code never leaves your machine |
Install the CLI with Homebrew:
brew install lightonai/tap/colgrep
Verify: colgrep --version
Index a project (run once from the project root):
cd /path/to/project
colgrep init
That's it. ColGREP parses every file with Tree-sitter, builds multi-vector embeddings on CPU, and stores the index in .colgrep/. Indexing about 1,000 files takes 30–60 seconds. After that, the index updates itself on every search: changed files are detected and re-indexed automatically.
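ColGREP's exact change-detection mechanism isn't described here. One common approach, sketched below under that assumption, is to keep a content hash per file and compare snapshots before each search; `snapshot` and `diff_snapshots` are hypothetical helpers, not ColGREP functions.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 digest of its bytes."""
    digests: dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the index directory itself.
        if not path.is_file() or ".colgrep" in path.relative_to(root).parts:
            continue
        rel = path.relative_to(root).as_posix()
        digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify files as added, modified, or deleted between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "modified": sorted(p for p in new.keys() & old.keys() if new[p] != old[p]),
        "deleted": sorted(old.keys() - new.keys()),
    }
```

Under this scheme only added and modified files would need re-embedding, and units from deleted files would be dropped from the index.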
Search with a plain-language description:
colgrep "natural language description of what you want"
Results are ranked by semantic relevance score. Higher = better match.
Examples:
colgrep "authentication middleware token validation"
colgrep "database migration rollback strategy"
colgrep "React form validation with error display"
colgrep "webhook retry logic with exponential backoff"
Filter files by regex pattern first, then rank semantically:
colgrep -e "async.*await" "error handling patterns"
colgrep -e "def test_" "payment capture edge cases"
colgrep -e "\.tsx$" "patient dashboard layout"
Output options:
colgrep "query" # Default output: file:lines (score: X.XX)
colgrep "query" --json # JSON output for piping to other tools
colgrep "query" -n 5 # Top 5 results only
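If you pipe the default output into a script, a small parser can filter by score. This sketch assumes each result line looks like `path:start-end (score: 0.87)` (or `path:line` for a single line); that exact rendering is an assumption, so prefer `--json` when you need a stable structure. `parse_hits` and `Hit` are hypothetical names.

```python
import re
from typing import NamedTuple

# Assumed shape of one default-output line, e.g. "src/auth.py:12-48 (score: 0.87)".
RESULT_LINE = re.compile(
    r"^(?P<file>.+?):(?P<start>\d+)(?:-(?P<end>\d+))?\s+\(score:\s*(?P<score>-?\d+(?:\.\d+)?)\)\s*$"
)


class Hit(NamedTuple):
    file: str
    start: int
    end: int
    score: float


def parse_hits(output: str, min_score: float = 0.0) -> list[Hit]:
    """Parse default-format result lines, keeping hits at or above `min_score`."""
    hits = []
    for line in output.splitlines():
        m = RESULT_LINE.match(line.strip())
        if m is None:
            continue  # skip blank lines, snippets, or anything not shaped like a result
        start = int(m["start"])
        end = int(m["end"]) if m["end"] else start
        hit = Hit(m["file"], start, end, float(m["score"]))
        if hit.score >= min_score:
            hits.append(hit)
    return hits
```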
| You know... | Use |
|---|---|
| The exact string or function name | grep -r "functionName" |
| The concept but not the words | colgrep "what it does" |
| A pattern + a concept | colgrep -e "pattern" "meaning" |
| Where something is implemented | colgrep "description of behavior" |
| How a feature works across files | colgrep "feature workflow" |
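The table can be encoded as a small helper that returns the command to run, for example from a script or agent wrapper. `search_command` is a hypothetical name, and it uses only the flags shown in this document (`grep -r`, `colgrep -e`).

```python
from typing import Optional


def search_command(
    concept: Optional[str] = None,
    exact: Optional[str] = None,
    pattern: Optional[str] = None,
) -> list[str]:
    """Return the argv for the search the table recommends.

    - Only an exact string or name: grep -r.
    - A concept, optionally narrowed by a regex: colgrep [-e pattern] "concept".
    If a concept is given, `exact` is ignored.
    """
    if concept is None:
        if exact is None:
            raise ValueError("provide an exact string or a concept")
        return ["grep", "-r", exact, "."]  # search the current directory recursively
    cmd = ["colgrep"]
    if pattern is not None:
        cmd += ["-e", pattern]  # regex pre-filter before semantic ranking
    return cmd + [concept]
```

The returned list can be passed straight to `subprocess.run`, which avoids shell quoting issues with multi-word queries.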
ColGREP has built-in integration with popular coding agents. Run the command for your agent, then restart the agent to enable semantic search:
colgrep --install-claude-code
colgrep --install-opencode
colgrep --install-codex
These commands register ColGREP as a search tool within the agent, which then uses semantic search automatically when navigating indexed projects.
Index each project independently. Search from the project directory:
cd ~/code/api && colgrep init
cd ~/code/frontend && colgrep init
cd ~/code/infrastructure && colgrep init
cd ~/docs && colgrep init
# Search each independently
cd ~/code/api && colgrep "payment processing service"
cd ~/code/frontend && colgrep "checkout form validation"
Works great for monorepos, microservices, documentation vaults, and any directory with text/code files.
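To run one query across several indexed projects, a short Python wrapper can call `colgrep` in each directory. This sketch assumes each path has already been indexed with `colgrep init` and that `colgrep` is on `PATH`; `search_all` and the `PROJECTS` list are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical layout mirroring the commands above; each path needs its own `colgrep init`.
PROJECTS = ["~/code/api", "~/code/frontend", "~/code/infrastructure", "~/docs"]


def search_all(query: str, projects: list[str], dry_run: bool = False) -> dict[str, str]:
    """Run `colgrep <query>` from each project directory and collect its stdout.

    Each project keeps its own .colgrep/ index, so the query runs once per directory.
    With dry_run=True, nothing is executed; the shell-quoted command is returned instead.
    """
    results: dict[str, str] = {}
    for project in projects:
        cwd = Path(project).expanduser()
        cmd = ["colgrep", query]
        if dry_run:
            results[str(cwd)] = shlex.join(cmd)
            continue
        # check=True raises CalledProcessError if colgrep exits with a non-zero status.
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True)
        results[str(cwd)] = proc.stdout
    return results
```

For example, `search_all("payment processing service", PROJECTS)` returns one output string per project, keyed by its expanded path.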
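ColGREP uses ColBERT late-interaction retrieval, a fundamentally different approach from traditional single-vector embeddings. A single-vector model compresses each code unit into one embedding; late interaction keeps one vector per token and scores a query against a unit by matching every query token to its most similar token in the unit (MaxSim) and summing those best matches.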
This is why a 17M model can beat 8B models: late interaction preserves token-level semantics that single-vector approaches compress away. Read the full technical story: "The Bloated Retriever Era Is Over".
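The scoring difference can be shown with toy numbers. This plain-Python sketch implements ColBERT-style MaxSim over hand-made 2-D token vectors and contrasts it with a mean-pooled single-vector cosine; real models use learned, higher-dimensional embeddings, and ColGREP's own indexing and compression are not shown. Function names are illustrative.

```python
import math

Vec = list[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vec) -> Vec:
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]


def maxsim(query_vecs: list[Vec], doc_vecs: list[Vec]) -> float:
    """Late interaction: each query token takes its best-matching doc token; sum the maxima.

    Vectors are assumed L2-normalized, so dot products are cosine similarities.
    """
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def pooled_cosine(query_vecs: list[Vec], doc_vecs: list[Vec]) -> float:
    """Single-vector baseline: mean-pool each side into one vector, then take the cosine."""
    q = [sum(col) / len(query_vecs) for col in zip(*query_vecs)]
    d = [sum(col) / len(doc_vecs) for col in zip(*doc_vecs)]
    return dot(normalize(q), normalize(d))
```

With query tokens `[1, 0]` and `[0, 1]`, a unit containing both directions scores 2.0 under MaxSim while a unit with a single blended token `[0.707, 0.707]` scores about 1.41, yet both get the same pooled cosine (1.0): pooling discards which tokens matched.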
"Index is being updated by another process" — Another colgrep instance is updating. Current search uses existing index. Safe to ignore.
Re-index from scratch:
rm -rf .colgrep/ && colgrep init
Add to .gitignore:
.colgrep/