Vector Store Shootout

Security checks across malware telemetry and agentic risk

Overview

This is a useful vector-store comparison skill, but its public metadata understates network and data-sharing behavior that can send document text to embedding services.

Install only if you are comfortable controlling where embeddings are generated. Leave OpenAI keys unset for local-only use, configure Ollama and backend URLs deliberately, avoid sensitive corpora unless you have verified data-flow behavior, and be careful with cleanup methods when passing custom database, table, or collection names.
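The caution about cleanup methods can be made concrete with an allow-list check on any custom database, table, or collection name before it reaches a drop or delete call. The following is a minimal sketch under stated assumptions, not code from the skill: `require_safe_name` and `_SAFE_NAME` are hypothetical names, and the 63-character cap mirrors PostgreSQL's identifier limit.

```python
import re

# Hypothetical allow-list: a letter or underscore first, then letters,
# digits, or underscores, at most 63 characters in total.
_SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")


def require_safe_name(name: str) -> str:
    """Return the name unchanged if it is a plain identifier, else raise."""
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"refusing unsafe identifier: {name!r}")
    return name
```

A caller would wrap every user-supplied name, for example `require_safe_name(table_name)`, before building a cleanup statement, so that a value such as `docs; DROP TABLE x` is rejected rather than interpolated.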

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (23)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 83%
Finding: The skill metadata explicitly states outbound network is false, yet the referenced implementations and analysis indicate some backends perform network communication and may call external embedding APIs. This creates a trust and sandboxing problem: operators may allow the skill under the assumption it is local-only, while it can unexpectedly transmit data to remote services.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 88%
Finding: The documented behavior does not match the actual capabilities: the skill claims 8 interchangeable local-style backends, but apparently includes only 7 distinct implementations and performs outbound requests to Ollama/OpenAI that are not disclosed. Hidden or under-documented external calls are dangerous because they can exfiltrate prompts, documents, embeddings, or metadata in environments that expect a local vector-store evaluation tool.

Context-Inappropriate Capability

Medium

Confidence: 89%
Finding: This LanceDB backend unexpectedly includes a fallback to a remote OpenAI embeddings API, which changes the trust and data-flow model from local/serverless storage to external transmission of document and query text. In a RAG or evaluation context, users may reasonably assume data stays local when using an embedded vector store, so this can cause unintended disclosure of sensitive content.

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: This backend sends full document and query text to embedding endpoints via HTTP in `_ollama_embed` and `_openai_embed`, which contradicts the expectation of a local vector-store comparison layer and can expose sensitive corpus/query data to another service. Even when Ollama is local, it is still a separate network service boundary; when OpenAI is used, the transmission leaves the host entirely.

Intent-Code Divergence

High

Confidence: 98%
Finding: The module docstring claims 'No external API calls in the indexing path,' but `add_documents()` calls `_embed()`, which may perform HTTP POST requests to Ollama or OpenAI. This misleading assurance can cause operators to index sensitive data under a false assumption that no network transmission occurs, increasing the chance of unintentional data disclosure.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The code sends arbitrary input texts to OpenAI's remote embeddings API, which can disclose sensitive document contents outside the local environment. In the context of a vector-store comparison skill, this is more dangerous because users may reasonably expect backend benchmarking, not silent transmission of corpus data to a third-party service.

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The module promises a fixed 768-dimensional pgvector store, but the TF-IDF fallback returns vectors whose length depends on the input vocabulary. Because the table schema requires vector(768), fallback requests can fail at insert/query time or cause inconsistent behavior, creating a reliability and denial-of-service risk when embedding providers are unavailable.
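One way a fallback could stay compatible with a `vector(768)` column is to hash tokens into a fixed number of buckets instead of sizing the vector by vocabulary. The following is a minimal sketch of that feature-hashing idea, not the skill's TF-IDF code; `hashed_embed` and `DIM` are hypothetical names, and the result is a normalised term-frequency vector without IDF weighting.

```python
import hashlib
import math
import re

DIM = 768  # matches the vector(768) schema described in the finding


def hashed_embed(text: str, dim: int = DIM) -> list[float]:
    """Map text to a fixed-length, L2-normalised term-frequency vector."""
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        # Use a stable hash so a token lands in the same bucket on every run.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Empty input yields an all-zero vector of the same length.
    return [v / norm for v in vec] if norm else vec
```

Because the output length depends only on `dim`, inserts and queries keep the same shape whatever corpus is indexed. Hash collisions cost some retrieval quality, which may be acceptable for a degraded-mode fallback.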

Missing User Warnings

Medium

Confidence: 94%
Finding: The code sends raw document/query text to OpenAI's embeddings endpoint without any explicit disclosure or consent mechanism in the implementation. If this store is used on proprietary, regulated, or secret material, the external transmission can violate user expectations, privacy requirements, or data-handling policies.

Missing User Warnings

Medium

Confidence: 95%
Finding: The call sites that issue embedding requests do not provide an explicit warning or consent mechanism before sending raw texts externally or to a local HTTP service. In a reusable skill/backend, this is risky because callers may pass proprietary documents or user queries without realizing they are being transmitted beyond process memory.

Missing User Warnings

Medium

Confidence: 95%
Finding: Both Ollama and OpenAI embedding paths transmit the provided texts to another service without any warning or consent mechanism in this file. Even if Ollama is often local, the URL is configurable and could point to a remote host, so the code can become an unintended data-exfiltration path for sensitive inputs.
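Because the Ollama URL is configurable, a caller who needs local-only behavior could check the configured host before any text is sent. The following is a minimal sketch using only the standard library; `is_loopback_url` is a hypothetical helper, not part of the skill, and the URLs in the usage note are illustrative.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_url(url: str) -> bool:
    """Return True only if the URL's host is localhost or a loopback IP."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname could resolve to a remote machine.
        return False
```

For example, `is_loopback_url("http://127.0.0.1:11434/api/embed")` is true, while a URL pointing at any other hostname is rejected. The check trusts the name `localhost`, which a modified hosts file can remap, so it narrows the risk without removing it.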

Missing User Warnings

Medium

Confidence: 89%
Finding: The store sends raw text inputs to embedding services at runtime, including a third-party OpenAI endpoint and a configurable local Ollama URL, without any explicit consent, warning, or data-classification guard at the transmission points. In a RAG backend, inputs may contain secrets, proprietary documents, or personal data, so silent transmission can lead to unintended data disclosure.

Missing User Warnings

Medium

Confidence: 95%
Finding: The OpenAI fallback sends raw document/query text to a third-party external service whenever an API key is configured and Ollama is unavailable. That creates a real data exposure risk because callers may assume this store is local-first, yet content can leave the environment without an explicit opt-in or warning at the point of use.
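An explicit opt-in at the point of provider selection is one way to avoid the silent fallback described here. The following is a minimal sketch, not the skill's logic: `choose_embedder` and the `ALLOW_REMOTE_EMBEDDINGS` flag are hypothetical, and `OPENAI_API_KEY` is assumed to be the variable that holds the key.

```python
import os


def choose_embedder(ollama_available: bool, env=None) -> str:
    """Pick an embedding path; remote use requires an explicit opt-in flag."""
    env = os.environ if env is None else env
    if ollama_available:
        return "ollama"
    # Assumed variable name for the key; the skill may read a different one.
    has_key = bool(env.get("OPENAI_API_KEY"))
    # Hypothetical flag: a configured key alone does not enable remote calls.
    opted_in = env.get("ALLOW_REMOTE_EMBEDDINGS") == "1"
    if has_key and opted_in:
        return "openai"
    return "local-fallback"
```

With this shape, a key left in the environment for another tool does not cause corpus text to leave the host. The operator has to set the second flag deliberately.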

External Transmission

Medium

Category: Data Exfiltration
Content: def _openai_embed(self, texts: list[str]) -> list[list[float]]: import requests resp = requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json={"input": texts, "model": "text-embedding-3-small"},
Confidence: 92%
Finding: requests.post( "https://

External Transmission

Medium

Category: Data Exfiltration
Content: def _openai_embed(self, texts: list[str]) -> list[list[float]]: import requests resp = requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json={"input": texts, "model": "text-embedding-3-small"},
Confidence: 92%
Finding: requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json=

External Transmission

Medium

Category: Data Exfiltration
Content: def _openai_embed(self, texts: list[str]) -> list[list[float]]: import requests resp = requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json={"input": texts, "model": "text-embedding-3-small"}, timeout=30,
Confidence: 90%
Finding: https://api.openai.com/

External Transmission

Medium

Category: Data Exfiltration
Content: def _openai_embed(self, texts: list[str]) -> list[list[float]]: import requests resp = requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json={"input": texts, "model": "text-embedding-3-small"},
Confidence: 97%
Finding: requests.post( "https://

External Transmission

Medium

Category: Data Exfiltration
Content: def _ollama_embed(self, texts: list[str]) -> list[list[float]]: import requests resp = requests.post( self._ollama_url, json={"model": _OLLAMA_EMBED_MODEL, "input": texts}, timeout=60,
Confidence: 84%
Finding: requests.post( self._ollama_url, json=

External Transmission

Medium

Category: Data Exfiltration
Content: def _openai_embed(self, texts: list[str]) -> list[list[float]]: import requests resp = requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json={"input": texts, "model": "text-embedding-3-small"},
Confidence: 97%
Finding: requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json=

External Transmission

Medium

Category: Data Exfiltration
Content: def _openai_embed(self, texts: list[str]) -> list[list[float]]: import requests resp = requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json={"input": texts, "model": "text-embedding-3-small"}, timeout=30,
Confidence: 96%
Finding: https://api.openai.com/

External Transmission

Medium

Category: Data Exfiltration
Content: def _openai_embed(self, texts: list[str]) -> list[list[float]]: import requests resp = requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json={"input": texts, "model": "text-embedding-3-small"},
Confidence: 94%
Finding: requests.post( "https://

External Transmission

Medium

Category: Data Exfiltration
Content: def _ollama_embed(self, texts: list[str]) -> list[list[float]]: import requests resp = requests.post( self._ollama_url, json={"model": _OLLAMA_EMBED_MODEL, "input": texts}, timeout=60,
Confidence: 80%
Finding: requests.post( self._ollama_url, json=

External Transmission

Medium

Category: Data Exfiltration
Content: def _openai_embed(self, texts: list[str]) -> list[list[float]]: import requests resp = requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json={"input": texts, "model": "text-embedding-3-small"},
Confidence: 94%
Finding: requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json=

External Transmission

Medium

Category: Data Exfiltration
Content: def _openai_embed(self, texts: list[str]) -> list[list[float]]: import requests resp = requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json={"input": texts, "model": "text-embedding-3-small"}, timeout=30,
Confidence: 93%
Finding: https://api.openai.com/

VirusTotal

63 of 63 vendors reported this skill as clean.