{"skill":{"slug":"zotero-vectorize","displayName":"Zotero Vectorize","summary":"Build and maintain a cross-platform local Zotero semantic index using metadata embeddings and PDF full-text chunk embeddings. Use when the user asks to vecto...","description":"---\nname: zotero-vectorize\ndescription: Build and maintain a cross-platform local Zotero semantic index using metadata embeddings and PDF full-text chunk embeddings. Use when the user asks to vectorize a Zotero library, create or refresh metadata_vectors.json or fulltext_vectors.json, check for new Zotero items missing from the vector store, incrementally update a Zotero semantic/RAG index, verify vector store counts and sizes, or reproduce this workflow on Windows, macOS, or Linux.\n---\n\n# Zotero Vectorize\n\nBuild and maintain a **local-first, cross-platform Zotero vector store** for semantic search and RAG over bibliographic metadata and PDF full text.\n\nKeep `SKILL.md` focused on workflow. Read the reference files only when needed:\n\n- `references/config.md` — paths, environment variables, output layout\n- `references/data-format.md` — JSON schemas and file naming\n- `references/windows.md`, `references/macos.md`, `references/linux.md` — platform-specific path defaults and notes\n- `references/troubleshooting.md` — common failures and recovery\n\n## Core rules\n\n- Treat Zotero as **read-only input**. 
Never modify the user’s Zotero database or attachment storage.\n- Prefer creating a **database snapshot** before reading.\n- For incremental updates: **check first, report missing items, wait for user confirmation, then apply**.\n- Before any update that rewrites store files: **back up first, then write**.\n- Backup retention for this skill is fixed: keep only **the latest and the previous backup** of each file.\n- Default output filenames are:\n  - `metadata_vectors.json`\n  - `fulltext_vectors.json`\n  - `vector_store_metadata.json`\n\n## Workflow decision tree\n\n### 1) Detect or confirm paths\n\nIf the Zotero data directory, database path, or storage path is unknown:\n\n1. Read `references/config.md`\n2. Read the platform-specific reference (`windows.md`, `macos.md`, or `linux.md`)\n3. Run:\n\n```bash\npython scripts/detect_zotero_paths.py\n```\n\nIf the detected paths are wrong, ask the user to open Zotero and use **Show Data Directory**, then rerun with `--data-dir`, `--db`, or `--storage-dir` set explicitly.\n\n### 2) Create a database snapshot\n\nBefore full builds or incremental checks, snapshot the Zotero database:\n\n```bash\npython scripts/snapshot_zotero_db.py --output-dir <store-dir>\n```\n\nIf snapshotting fails because the SQLite database is locked, ask the user to close Zotero and retry.\n\n### 3) Build the metadata vector store\n\nUse this when the user asks to create or rebuild metadata embeddings for the Zotero library.\n\n```bash\npython scripts/build_metadata_vectors.py --output-dir <store-dir>\n```\n\nThis writes `metadata_vectors.json` and refreshes `vector_store_metadata.json` and `README.md`.\n\n### 4) Build the full-text vector store\n\nUse this when the user asks to create or rebuild PDF full-text embeddings.\n\n```bash\npython scripts/build_fulltext_vectors.py --output-dir <store-dir>\n```\n\nThis scans Zotero PDF attachments, extracts their text, chunks it, embeds each chunk, and writes `fulltext_vectors.json`.\n\n### 5) Check incremental updates\n\nUse this when the 
user asks whether Zotero contains new items not yet added to the vector store.\n\n```bash\npython scripts/check_incremental_updates.py --output-dir <store-dir>\n```\n\nReport:\n\n- total top-level Zotero items\n- total PDF-parent items (items with at least one PDF attachment)\n- current metadata/fulltext vector counts\n- missing metadata items\n- missing fulltext items\n\nDo **not** update the store yet.\n\n### 6) Apply incremental updates\n\nOnly run this after the user confirms the update.\n\n```bash\npython scripts/apply_incremental_updates.py --output-dir <store-dir>\n```\n\nThis script:\n\n1. snapshots the DB\n2. backs up store files\n3. appends missing metadata/fulltext entries\n4. keeps only the latest and previous backup per file\n5. updates store metadata and README\n\nIf the user wants only a partial update, use `--item-id` to limit it to specific items.\n\n### 7) Verify the finished store\n\nAfter any build or incremental update, verify counts and sizes:\n\n```bash\npython scripts/verify_vector_store.py --output-dir <store-dir>\n```\n\nAlways report:\n\n- metadata item count\n- fulltext item count\n- fulltext chunk count\n- metadata file size\n- fulltext file size\n\n## Scripts\n\n- `scripts/detect_zotero_paths.py` — resolve default/current Zotero paths\n- `scripts/snapshot_zotero_db.py` — create a safe SQLite snapshot\n- `scripts/build_metadata_vectors.py` — full rebuild of metadata vectors\n- `scripts/build_fulltext_vectors.py` — full rebuild of PDF full-text vectors\n- `scripts/check_incremental_updates.py` — compare Zotero against the current vector store\n- `scripts/apply_incremental_updates.py` — append missing items after user confirmation\n- `scripts/backup_with_retention.py` — back up store files and retain only the latest two states\n- `scripts/verify_vector_store.py` — report counts, sizes, and store metadata\n\n## Output expectations\n\nAfter a successful run, return a concise operational summary, for example:\n\n- detected paths\n- snapshot path used\n- number of items/chunks 
written\n- current file sizes\n- whether any items are missing\n- which itemIDs were appended during an incremental update\n\n## Escalation notes\n\nRead `references/troubleshooting.md` when:\n\n- the SQLite snapshot fails\n- the Hugging Face model download or local model loading fails\n- PDFs are missing or unreadable\n- full-text extraction is incomplete\n- file paths on the current OS differ from the defaults\n","tags":{"latest":"0.1.0"},"stats":{"comments":0,"downloads":610,"installsAllTime":1,"installsCurrent":1,"stars":0,"versions":1},"createdAt":1773056728769,"updatedAt":1778491788483},"latestVersion":{"version":"0.1.0","createdAt":1773056728769,"changelog":"- Initial release of zotero-vectorize.\n- Build and maintain a local, cross-platform Zotero semantic index using embeddings for metadata and PDF full-text chunks.\n- Strictly read-only input from Zotero; no modifications to user data.\n- Comprehensive, step-by-step workflow for builds, incremental updates, backups, and verification.\n- Detailed integration with reference files per platform and troubleshooting guidance.\n- Automated reporting of counts, file sizes, and operational summaries after actions.","license":"MIT-0"},"metadata":null,"owner":{"handle":"yckbz","userId":"s178n2x1xdzb94zjak22tm42hh885rrc","displayName":"yckbz","image":"https://avatars.githubusercontent.com/u/169670310?v=4"},"moderation":null}