GitHub Semantic Search

An AI-native GitHub assistant built on an Embedder + Qdrant + LLM architecture. It indexes repos, runs semantic search across issues, PRs, and code, and monitors repos proactively with Feishu alerts. Use it for: (1) natural-language GitHub queries, (2) tracking issues, PRs, and CI across repos, (3) monitoring repos for bugs or keywords, (4) finding related issues without keyword matching.

Install

openclaw skills install github-semantic

🦞 Super GitHub — AI-Native GitHub Assistant

Built on the same Embedder + Qdrant + LLM architecture used by vector-based memory systems such as mem0. Index repos, search them semantically, and monitor them proactively, all in natural language.

Architecture

Query → [LLM: understand intent] → [Embedder: vectorize] → [Qdrant: semantic search] → [gh CLI: act]

Three-layer system (same as production memory pipelines):

Layer | Component | Role
Embedder | Ollama nomic-embed-text | Converts text → 768-dim vectors
Vector Store | Qdrant (local) | Stores & searches vectors by similarity
Action Layer | gh CLI | Executes GitHub operations
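The embed-then-search part of the pipeline can be sketched with nothing but the standard library. This is an illustration under assumptions, not the skill's actual code: it assumes Ollama listens on its default port 11434 and is called through its `/api/embeddings` endpoint, that Qdrant is queried through its REST `points/search` endpoint, and that the collection is named `github` (a hypothetical name). The LLM intent step and the gh CLI action step are left out.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default port
QDRANT_URL = "http://localhost:6333"   # from the Prerequisites section
COLLECTION = "github"                  # hypothetical collection name


def post_json(url, body):
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def embed_request(text):
    """Request body for Ollama's /api/embeddings endpoint."""
    return {"model": "nomic-embed-text", "prompt": text}


def search_request(vector, limit=10):
    """Request body for Qdrant's /collections/{name}/points/search endpoint."""
    return {"vector": vector, "limit": limit, "with_payload": True}


def semantic_search(query, limit=10):
    """Embed the query with Ollama, then return Qdrant's scored hits."""
    reply = post_json(f"{OLLAMA_URL}/api/embeddings", embed_request(query))
    hits = post_json(
        f"{QDRANT_URL}/collections/{COLLECTION}/points/search",
        search_request(reply["embedding"], limit),
    )
    return hits["result"]
```

With both services running and a populated collection, `semantic_search("memory search failing in agent")` would return a list of hits, each carrying a similarity `score` and the stored `payload`.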

Prerequisites

  • gh CLI authenticated (gh auth status)
  • Ollama running with nomic-embed-text:latest
  • Qdrant running at localhost:6333
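A small preflight script can confirm the three prerequisites before indexing. The sketch below is hypothetical and not shipped with the skill; it assumes Ollama's default port 11434 and uses Ollama's `/api/tags` and Qdrant's `/healthz` endpoints only as reachability probes, so it does not verify that the nomic-embed-text model has actually been pulled.

```python
import shutil
import subprocess
import urllib.request


def gh_authenticated():
    """True if the gh CLI is installed and `gh auth status` exits with 0."""
    if shutil.which("gh") is None:
        return False
    result = subprocess.run(["gh", "auth", "status"], capture_output=True)
    return result.returncode == 0


def http_ok(url, timeout=3.0):
    """True if a GET request to the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def preflight():
    """Return a name -> bool map, one entry per prerequisite."""
    return {
        "gh": gh_authenticated(),
        "ollama": http_ok("http://localhost:11434/api/tags"),  # assumed default port
        "qdrant": http_ok("http://localhost:6333/healthz"),
    }
```

Calling `preflight()` and printing each entry gives a quick view of which service still needs to be started.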

Quick Start

# 1. Initialize Qdrant collection
python scripts/github_indexer.py init

# 2. Index a repo
python scripts/github_indexer.py add owner/repo --all

# 3. Search with natural language
python scripts/github_search.py "memory search failing in agent" --limit 10

# 4. Monitor for keywords
python scripts/github_monitor.py watch owner/repo --events issues,ci --keywords bug,broken,urgent

Scripts

Script | Purpose
github_indexer.py | Index repos (issues, PRs, metadata) into Qdrant
github_search.py | Natural language semantic search
github_monitor.py | Proactive monitoring with keyword alerts

Detailed Commands

Index (github_indexer.py)

python scripts/github_indexer.py init                     # Create Qdrant collection
python scripts/github_indexer.py add owner/repo --all     # Index everything
python scripts/github_indexer.py add owner/repo --issues  # Issues only
python scripts/github_indexer.py add owner/repo --prs     # PRs only
python scripts/github_indexer.py add owner/repo --repo    # Repo metadata
python scripts/github_indexer.py status                   # Show indexed data
python scripts/github_indexer.py rm owner/repo            # Remove from index
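To make the `init` and `add` steps concrete, here is a hedged sketch of the request bodies such an indexer could send to Qdrant's REST API. The Cosine distance, the payload field names (`repo`, `type`, `number`, `title`), and the ID scheme are assumptions chosen for illustration; the real script may differ on all of them.

```python
import uuid

VECTOR_SIZE = 768  # nomic-embed-text output dimension, per the Architecture table


def create_collection_body(distance="Cosine"):
    """Body for PUT /collections/{name}; the distance metric is an assumption."""
    return {"vectors": {"size": VECTOR_SIZE, "distance": distance}}


def point_id(repo, kind, number):
    """Deterministic UUID, so re-indexing an item overwrites it instead of duplicating it."""
    # The string is only a stable key for uuid5, not a URL that gets fetched.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{repo}/{kind}/{number}"))


def make_point(repo, kind, number, title, vector):
    """One entry for the `points` list of PUT /collections/{name}/points."""
    if len(vector) != VECTOR_SIZE:
        raise ValueError(f"expected {VECTOR_SIZE} dimensions, got {len(vector)}")
    return {
        "id": point_id(repo, kind, number),
        "vector": vector,
        # Hypothetical payload schema, used later for filtering
        "payload": {"repo": repo, "type": kind, "number": number, "title": title},
    }
```

Deriving the ID from the repo, item type, and number is one way to keep repeated `add` runs idempotent, since Qdrant upserts points by ID.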

Search (github_search.py)

python scripts/github_search.py "query"                         # Search all
python scripts/github_search.py "query" --repo owner/repo       # Filter by repo
python scripts/github_search.py "query" --type issue            # Filter by type
python scripts/github_search.py "query" --limit 20              # More results
python scripts/github_search.py "query" --repo owner/repo --ci  # Show CI runs
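The `--repo` and `--type` flags map naturally onto a Qdrant payload filter. The sketch below shows one plausible translation; the payload keys `repo` and `type` are hypothetical and must match whatever the indexer actually stores.

```python
def build_filter(repo=None, doc_type=None):
    """Return a Qdrant filter dict, or None when no flag was given."""
    must = []
    if repo:
        must.append({"key": "repo", "match": {"value": repo}})
    if doc_type:
        must.append({"key": "type", "match": {"value": doc_type}})
    return {"must": must} if must else None


def search_body(vector, limit=10, repo=None, doc_type=None):
    """Body for POST /collections/{name}/points/search, with optional filter."""
    body = {"vector": vector, "limit": limit, "with_payload": True}
    flt = build_filter(repo, doc_type)
    if flt is not None:
        body["filter"] = flt
    return body
```

Conditions listed under `must` are combined with AND, so `--repo owner/repo --type issue` would restrict results to issues from that one repo.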

Monitor (github_monitor.py)

python scripts/github_monitor.py watch owner/repo                     # Start watching
python scripts/github_monitor.py watch owner/repo --events issues,ci  # Watch specific events
python scripts/github_monitor.py status                               # Show watches
python scripts/github_monitor.py check                                # Run checks
python scripts/github_monitor.py unwatch owner/repo                   # Stop watching
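The keyword side of a `check` run can be illustrated with a few lines of Python. This is a sketch under assumptions: events are taken to be dicts with `title` and `body` keys (a hypothetical shape), matching is whole-word and case-insensitive, and alert delivery is omitted.

```python
import re


def matched_keywords(text, keywords):
    """Return the keywords that occur in text as whole words, ignoring case."""
    found = []
    for kw in keywords:
        if re.search(rf"\b{re.escape(kw)}\b", text, flags=re.IGNORECASE):
            found.append(kw)
    return found


def alerts_for(events, keywords):
    """Yield (event, matched_keywords) for every event that hits a keyword."""
    for event in events:
        # GitHub can return a null body, so fall back to an empty string
        text = f"{event.get('title') or ''}\n{event.get('body') or ''}"
        hits = matched_keywords(text, keywords)
        if hits:
            yield event, hits
```

Whole-word matching keeps a keyword like `bug` from firing on "debugging"; a substring match would be simpler but noisier.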

Memory System Analogy

Component | GitHub Skill | Memory System
Data | Issues, PRs, code | Conversations
Embedder | nomic-embed-text | nomic-embed-text
Vector Store | Qdrant | Qdrant
Add | github_indexer.py | mem0 add
Search | github_search.py | mem0 search

Why Vector Search vs Keyword?

Approach | What a "memory problems" query matches
Keyword | Only text containing the exact words
Vector (this skill) | Related phrasings such as "memory leak", "OOM", "out of memory"
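Vector search ranks documents by how close their embeddings are to the query embedding, commonly by cosine similarity. The toy example below uses invented 3-dimensional vectors purely for illustration (real nomic-embed-text vectors have 768 dimensions), and it assumes cosine is the metric configured in Qdrant.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors, made up for this example; they are not real embeddings
query = [0.9, 0.1, 0.0]  # stands in for "memory problems"
docs = {
    "OOM crash in worker": [0.8, 0.2, 0.1],
    "Update README badges": [0.0, 0.1, 0.9],
}

# Highest similarity first
ranked = sorted(docs, key=lambda title: cosine(query, docs[title]), reverse=True)
```

Here the "OOM crash" title ranks first even though it shares no words with the query, which is the behavior keyword matching cannot provide.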

Setup Checklist

  • Authenticate the GitHub CLI: gh auth login
  • Download the embedder: ollama pull nomic-embed-text:latest
  • Start Qdrant with a local storage directory: QDRANT__STORAGE__STORAGE_PATH=./qdrant-data qdrant
  • Create the collection: python scripts/github_indexer.py init