pgvector

PostgreSQL vector database skill with pgvector extension. Enables vector similarity search, embeddings storage, RAG (Retrieval-Augmented Generation) pipeline...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
0 · 171 · 1 current install · 1 all-time install
Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
The name/description (PostgreSQL + pgvector helper for embeddings, similarity search, and RAG) matches the SKILL.md content: SQL for creating vector tables, indexes, queries, and a Python insertion example. There are no extraneous dependencies or unrelated capabilities requested.
Instruction Scope
Instructions stay on-topic (connecting to Postgres, creating tables/indexes, inserting/searching embeddings). They do not instruct reading arbitrary system files or exfiltrating data to external endpoints. However, the SKILL.md contains DDL/DML (CREATE/INSERT/UPDATE/DELETE) that will change data if the agent executes it against a live database. The doc also uses explicit connection defaults (localhost:5433, user 'damien', empty password), operational details that could be misused if applied without care.
Install Mechanism
Instruction-only skill with no install spec and no code files; nothing is downloaded or written to disk by the skill itself.
Credentials
The skill does not declare or require environment variables or credentials (a good, minimal surface). It does show example PG environment variables (PGHOST, PGPORT, PGUSER, PGPASSWORD) and a default empty password, but these are examples only; users should not assume the skill needs or will receive secrets automatically. Recommend a least-privilege DB user and a non-empty password in real deployments.
Persistence & Privilege
always:false (normal), and the skill does not request persistent system-level privileges or attempt to modify other skill/system configs. Autonomous invocation is allowed by platform default; this is expected and not by itself a problem.
Assessment
This skill appears to be a straightforward pgvector/Postgres cheat-sheet and is consistent with its description. Before installing or invoking it:

  1. Connect the agent only to a test or controlled Postgres instance first; the SQL examples include CREATE/INSERT/UPDATE/DELETE, which can modify data.
  2. Do not use the example empty password in production; prefer a least-privilege DB user.
  3. Ensure the pgvector extension is installed on the target DB and that the agent has only the permissions it needs (read-only if you only want retrieval).
  4. Note the skill is instruction-only from an unknown/anonymous source (no homepage); if you need guarantees about correctness or safety, review the SQL commands yourself or run them in a sandbox before letting an agent execute them automatically.


Current version: v1.0.0
latest: vk97c0xwm7dy7e7ffd55w65znt182he7e

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

🔢 Clawdis

SKILL.md

pgvector Skill

PostgreSQL + pgvector extension for vector similarity search.

Quick Connect

# Connect to pgvector database (default port 5433)
psql -h localhost -p 5433 -U damien -d postgres

# Or use environment variables
export PGHOST=localhost
export PGPORT=5433
export PGUSER=damien
export PGPASSWORD=''
export PGDATABASE=postgres

Environment

  • Host: localhost
  • Port: 5433
  • User: damien
  • Password: (empty)
  • Database: postgres

Core Capabilities

1. Create Vector Table

-- Basic vector table (1536 dimensions for OpenAI embeddings)
CREATE TABLE IF NOT EXISTS documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(1536) NOT NULL,
    metadata JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create HNSW index for fast similarity search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Or use IVFFlat index (faster build, slower search)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

2. Insert Embeddings

-- Manual insert (replace with actual embedding)
INSERT INTO documents (content, embedding)
VALUES ('Your text here', '[0.1, 0.2, ..., 0.1536]');

-- With metadata
INSERT INTO documents (content, embedding, metadata)
VALUES (
    'AI is transforming technology',
    '[0.1, 0.3, ..., 0.5]',
    '{"source": "article", "author": "John"}'::jsonb
);
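
pgvector accepts vectors as text literals of the form '[0.1,0.2,...]', so application code needs to serialize embeddings before binding them to the INSERT above. A minimal sketch (the helper name `to_vector_literal` is hypothetical, not part of the skill):

```python
def to_vector_literal(values):
    """Serialize a sequence of floats as a pgvector text literal, e.g. '[0.1,0.2,0.3]'.

    pgvector parses this form on INSERT into any vector(N) column.
    """
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

# Example: build the embedding parameter for the INSERT statements above
print(to_vector_literal([0.1, 0.2, 0.3]))  # → [0.1,0.2,0.3]
```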

3. Vector Similarity Search

-- Cosine similarity (most common)
SELECT id, content, (1 - (embedding <=> '[query_embedding]')) AS similarity
FROM documents
ORDER BY embedding <=> '[query_embedding]'
LIMIT 5;

-- Euclidean distance
SELECT id, content, (embedding <-> '[query_embedding]') AS distance
FROM documents
ORDER BY embedding <-> '[query_embedding]'
LIMIT 5;

-- Inner product (for normalized vectors; <#> returns the NEGATIVE inner product)
SELECT id, content, -(embedding <#> '[query_embedding]') AS similarity
FROM documents
ORDER BY embedding <#> '[query_embedding]'
LIMIT 5;
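
For intuition, the three operators can be reproduced in plain Python (a sketch for illustration, not pgvector's implementation). Note that `<#>` yields the negative inner product so that ascending ORDER BY surfaces the best matches first:

```python
import math

def l2_distance(a, b):
    """Equivalent of pgvector's <-> (Euclidean distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """Equivalent of pgvector's <=> (1 - cosine similarity)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms

def neg_inner_product(a, b):
    """Equivalent of pgvector's <#> (negative inner product)."""
    return -sum(x * y for x, y in zip(a, b))

print(l2_distance([0.0, 0.0], [3.0, 4.0]))  # → 5.0
```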

4. Hybrid Search (Vector + Keyword)

-- Combine vector search with full-text search
-- (column aliases cannot be reused inside an ORDER BY expression,
--  so the score expressions are repeated there)
SELECT id, content,
    (1 - (embedding <=> '[query_embedding]')) AS vector_score,
    ts_rank(to_tsvector('english', content), plainto_tsquery('english', 'search terms')) AS text_score
FROM documents
WHERE content ILIKE '%search terms%'
ORDER BY (1 - (embedding <=> '[query_embedding]')) * 0.7
       + ts_rank(to_tsvector('english', content), plainto_tsquery('english', 'search terms')) * 0.3 DESC
LIMIT 10;
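
The weighted blend can also be done client-side after fetching both scores; a sketch with the same 0.7/0.3 weighting (the helper `rank_hybrid` is hypothetical):

```python
def rank_hybrid(rows, vector_weight=0.7):
    """Sort (id, vector_score, text_score) rows by a weighted blend, best first."""
    text_weight = 1.0 - vector_weight
    return sorted(
        rows,
        key=lambda r: vector_weight * r[1] + text_weight * r[2],
        reverse=True,
    )

rows = [(1, 0.9, 0.1), (2, 0.5, 0.9), (3, 0.2, 0.2)]
print([r[0] for r in rank_hybrid(rows)])  # → [1, 2, 3]
```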

5. RAG Pipeline Example

-- Store document chunks with embeddings
CREATE TABLE document_chunks (
    id BIGSERIAL PRIMARY KEY,
    document_id BIGINT REFERENCES documents(id),
    chunk_text TEXT NOT NULL,
    chunk_embedding vector(1536) NOT NULL,
    chunk_index INT NOT NULL
);

-- Retrieve relevant chunks for LLM context
SELECT chunk_text
FROM document_chunks
WHERE document_id = ?
ORDER BY chunk_embedding <=> '[question_embedding]'
LIMIT 5;
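
Once chunks are retrieved, they are typically packed into the LLM prompt up to a context budget. A minimal sketch (hypothetical helper; the character budget is an assumption, real pipelines usually count tokens):

```python
def build_context(chunks, max_chars=2000):
    """Concatenate retrieved chunks in rank order, stopping before the budget is exceeded."""
    picked, total = [], 0
    for chunk in chunks:
        if total + len(chunk) > max_chars:
            break
        picked.append(chunk)
        total += len(chunk)
    return "\n\n".join(picked)

context = build_context(["chunk one", "chunk two"], max_chars=12)
print(context)  # → chunk one
```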

Management Commands

Check pgvector Extension

SELECT * FROM pg_extension WHERE extname = 'vector';

Table Info

-- List all tables with vectors
SELECT tablename FROM pg_tables WHERE schemaname = 'public';

-- Check index sizes
SELECT pg_size_pretty(pg_total_relation_size('documents'));

Monitoring

-- Check query performance
EXPLAIN ANALYZE
SELECT * FROM documents
ORDER BY embedding <=> '[query_embedding]'
LIMIT 5;

-- Index usage stats
SELECT * FROM pg_stat_user_indexes
WHERE indexname LIKE '%embedding%';

Common Operations

Update Embedding

UPDATE documents
SET embedding = '[new_embedding]'
WHERE id = 1;

Delete

DELETE FROM documents WHERE id = 1;

Batch Insert (Python)

import psycopg2
import numpy as np

conn = psycopg2.connect(
    host="localhost",
    port=5433,
    user="damien",
    password="",
    dbname="postgres",
)

cur = conn.cursor()
# documents: an iterable of (text, numpy_embedding) pairs prepared by the caller
for text, embedding in documents:
    cur.execute(
        "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
        # str() of a list yields '[0.1, 0.2, ...]', which pgvector parses;
        # a bare Python list would be sent as a Postgres ARRAY instead
        (text, str(embedding.tolist())),
    )
conn.commit()
cur.close()
conn.close()

Distance Operators

Operator  Description
<->       Euclidean (L2) distance
<=>       Cosine distance (1 - cosine_similarity)
<#>       Negative inner product

Use Cases

  1. Semantic Search - Find documents by meaning, not keywords
  2. RAG - Retrieve relevant context for LLM prompts
  3. Recommendations - Find similar items/products
  4. Anomaly Detection - Find outliers in embeddings
  5. Image/Video Search - Store and query visual embeddings

Notes

  • Vector dimensions must match your embedding model
  • HNSW gives better recall and query speed but builds more slowly and uses more memory; IVFFlat builds faster and uses less memory, with lower query performance
  • Normalizing vectors lets you use the inner-product operator, which ranks identically to cosine distance on unit vectors and is cheaper to compute
  • pgvector's vector type supports up to 16,000 dimensions, but HNSW and IVFFlat indexes are limited to 2,000
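
The normalization note above can be sketched in Python; normalizing at insert time is what makes the `<#>` operator a drop-in for cosine ranking (a sketch for illustration, not part of the skill):

```python
import math

def normalize(v):
    """Scale a vector to unit length (leave zero vectors unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)

print(normalize([3.0, 4.0]))  # → [0.6, 0.8]
```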

Files

1 total
