code-memory-skill

Other

Use when doing coding work in a Git repository and semantic code search, AST-aware symbol lookup, documentation search, Git-history search, or dead-code discovery would help.

Install

openclaw skills install code-memory-skill

code-memory

code-memory is a local MCP server for code intelligence. It indexes a project into a local SQLite database, extracts AST symbols where supported, embeds code/docs with sentence-transformers, and exposes MCP tools for code search, docs search, Git history, index stats, and dead-code candidates.

Upstream: https://github.com/kapillamba4/code-memory Reviewed version: v1.0.32 / commit 563788a2a7d015699f20251d404aeb293346f40c.

When To Use

Use this skill anytime coding a GitHub project or any local Git repository, especially when:

  • Starting work in an unfamiliar or large codebase.
  • Looking for definitions, references, file structure, or concept-related code.
  • Understanding architecture, README/doc conventions, or docstrings.
  • Debugging regressions where Git history, file history, blame, or commit details matter.
  • Looking for candidate dead code before a cleanup.

Do not use it as the only evidence for risky edits. It is retrieval assistance, not a replacement for rg, direct file reads, tests, typechecks, or human review.

Install

Prerequisites:

  • Python >=3.13.
  • uv/uvx recommended. Install uv with the official Astral installer or package manager.
  • First run downloads the default embedding model, currently jinaai/jina-code-embeddings-0.5b, to the HuggingFace cache. Expect roughly 600 MB download and about 1 GB+ RAM when loaded.
  • On GPU machines, it may auto-use CUDA or MPS. Use CPU mode when GPU memory is tight.

Recommended MCP command:

uvx code-memory

Pip install:

python3.13 -m pip install code-memory
code-memory

From source:

git clone https://github.com/kapillamba4/code-memory.git
cd code-memory
uv sync
uv run mcp run code_memory/server.py

Standalone binaries are published on GitHub Releases. Treat binaries as a supply-chain trust decision: prefer package/source installs unless there is a specific reason to use a binary.

MCP Configuration

Use stdio for normal per-project MCP hosting:

{
  "mcpServers": {
    "code-memory": {
      "command": "uvx",
      "args": ["code-memory"]
    }
  }
}

For a shared server, run:

uvx code-memory --transport sse --host 127.0.0.1 --port 8765

Then configure the MCP host to use:

{
  "mcpServers": {
    "code-memory": {
      "url": "http://127.0.0.1:8765/sse"
    }
  }
}

Do not bind SSE to 0.0.0.0 or a public interface without an authenticated reverse proxy. The SSE endpoint is unauthenticated.

Use Workflow

Before searching a project:

  1. Check status if available: check_index_status(directory="/path/to/repo").
  2. Index first: index_codebase(directory="/path/to/repo").
  3. Re-index after meaningful file changes or if results look stale.
  4. Use direct reads/tests after retrieval before making edits.

Core tools:

  • index_codebase(directory, cpu=false): builds/refreshes the local index. Use cpu=true if CUDA/MPS memory is constrained.
  • check_index_status(directory): quick readiness check.
  • get_index_stats(directory): index size, coverage, model, and database health.
  • search_code(query, search_type, directory): semantic/structural code lookup. search_type is topic_discovery, definition, references, or file_structure.
  • search_docs(query, directory, top_k=10): README/docs/docstring search.
  • search_history(query, directory, search_type, target_file, line_start, line_end): Git history, file history, blame, and commit details.
  • find_dead_code(directory, min_confidence=0.5, kinds=null, include_tests=false, top_k=50): candidate unused functions/classes/methods. Verify manually before deleting anything.

Useful examples:

index_codebase(directory="/home/jim/project")
search_code(query="authentication middleware", search_type="topic_discovery", directory="/home/jim/project")
search_code(query="UserService", search_type="definition", directory="/home/jim/project")
search_code(query="send_email", search_type="references", directory="/home/jim/project")
search_docs(query="deployment architecture", directory="/home/jim/project", top_k=5)
search_history(query="timeout", search_type="commits", directory="/home/jim/project")
search_history(query="", search_type="file_history", target_file="src/auth.py", directory="/home/jim/project")
find_dead_code(directory="/home/jim/project", min_confidence=0.75)

Files And State

  • Creates code_memory.db in the indexed project root.
  • SQLite WAL mode may also create code_memory.db-wal and code_memory.db-shm.
  • The index stores file paths, source excerpts/symbol text, doc chunks, embeddings, and Git-derived search data.
  • .gitignore is respected, including nested .gitignore files. Built-in skipped dirs include .git, .venv, venv, node_modules, caches, dist, build, target, bin, and obj.
  • If a repo contains secrets that are not ignored, they can be indexed. Add sensitive files to .gitignore or exclude/remove the database after accidental indexing.
  • Add code_memory.db* to .gitignore unless there is a deliberate reason to version the index. Usually there is not.

Python And Dependencies

Current upstream requires Python >=3.13 and declares these runtime dependencies:

  • mcp[cli]
  • sentence-transformers
  • sqlite-vec
  • tree-sitter
  • language grammars: tree-sitter-python, tree-sitter-javascript, tree-sitter-typescript, tree-sitter-java, tree-sitter-kotlin, tree-sitter-go, tree-sitter-rust, tree-sitter-c, tree-sitter-cpp, tree-sitter-ruby
  • gitpython
  • pathspec
  • markdown-it-py
  • einops
  • xxhash

Development dependencies include pytest, pytest-asyncio, pytest-cov, ruff, and mypy.

Supported AST parsing: Python, JavaScript/TypeScript, Java, Go, Rust, C/C++, Ruby, Kotlin.

Fallback whole-file indexing: C#, Swift, Scala, Lua, shell, YAML/TOML/JSON, HTML/CSS, SQL, Markdown, text, and similar source-like files.

Environment variables:

  • CODE_MEMORY_LOG_LEVEL: DEBUG, INFO, WARNING, or ERROR.
  • EMBEDDING_MODEL: HuggingFace model id. Changing it invalidates/rebuilds indexes.
  • CODE_MEMORY_DEVICE: auto, cuda, mps, or cpu.
  • CODE_MEMORY_BATCH_SIZE: embedding batch size, default 64.
  • CODE_MEMORY_MAX_WORKERS: parser thread pool size, default 4.
  • CODE_MEMORY_RERANK: true/1/yes enables cross-encoder reranking.
  • RERANK_MODEL: HuggingFace model id for reranking.

Security Review

Observed positives:

  • No obvious telemetry, analytics, HTTP client, or external API reporting path in reviewed source.
  • Code/docs/Git search runs locally after package/model installation.
  • Paths are resolved and directory existence is validated for indexing.
  • SQLite queries appear parameterized where user-provided values matter.
  • .gitignore and common dependency/build/cache folders are skipped during indexing.

Important risks and controls:

  • sentence-transformers loads the embedding model with trust_remote_code=True. The default HuggingFace model therefore becomes executable code. For sensitive environments, pre-vet/pin the model or use a trusted local/bundled model via EMBEDDING_MODEL.
  • First install/run fetches packages and model artifacts from package/model registries. For high-trust work, pin versions/hashes or pre-cache artifacts from a vetted machine.
  • SSE transport is unauthenticated. Keep --host 127.0.0.1; never expose it directly to a LAN or internet interface.
  • The MCP server can index any directory path the host allows it to access. Do not run it with broader filesystem access than needed.
  • code_memory.db* contains searchable code snippets and embeddings. Treat it as sensitive project data; do not commit, sync, or share casually.
  • Secrets accidentally committed or merely present in non-ignored files can be indexed. Check .gitignore before indexing private repos.
  • find_dead_code is heuristic. Reflection, decorators, framework registration, dynamic dispatch, exports, and tests can make live code appear unused.
  • Standalone binaries should be verified before use. Prefer source/PyPI installs if provenance matters.

Troubleshooting

  • Empty code/docs results: run index_codebase(directory) first, then search again.
  • Git history errors: make sure directory is inside a Git repository.
  • CUDA out-of-memory: call index_codebase(..., cpu=true) or set CODE_MEMORY_DEVICE=cpu.
  • Slow first run: the model download and warmup are normal. Subsequent runs reuse the HuggingFace cache.
  • Stale results after edits: re-run index_codebase; indexing is incremental.
  • Model changed: re-index. Existing index metadata is invalidated when embedding model/dimension changes.