Vector Store Shootout

Security checks across malware telemetry and agentic risk

Overview

This is a useful vector-store comparison skill, but its public metadata understates network and data-sharing behavior that can send document text to embedding services.

Install only if you are comfortable controlling where embeddings are generated. Leave OpenAI keys unset for local-only use, configure Ollama and backend URLs deliberately, avoid sensitive corpora unless you have verified data-flow behavior, and be careful with cleanup methods when passing custom database, table, or collection names.
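A minimal pre-flight sketch of two of these precautions. It assumes the OpenAI path is enabled by an `OPENAI_API_KEY` environment variable (the conventional name; the report does not say which variable the skill reads) and that cleanup methods accept database, table, or collection names as plain strings. `check_local_only` and `safe_identifier` are hypothetical helpers, not part of the skill.

```python
import os
import re
from typing import Mapping, Optional

# Assumption: OPENAI_API_KEY is the variable that enables the OpenAI embedding path.
REMOTE_KEY_VARS = ("OPENAI_API_KEY",)

# Conservative allowlist for names passed to cleanup methods: a letter or
# underscore followed by up to 62 letters, digits, or underscores (63 total,
# PostgreSQL's default identifier length limit).
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")


def check_local_only(env: Optional[Mapping[str, str]] = None) -> None:
    """Raise if any credential that enables remote embeddings is set."""
    env = os.environ if env is None else env
    present = [name for name in REMOTE_KEY_VARS if env.get(name)]
    if present:
        raise RuntimeError(
            f"Remote embedding credentials are set ({', '.join(present)}); "
            "unset them for local-only use."
        )


def safe_identifier(name: str) -> str:
    """Return name unchanged if it is a plain identifier; otherwise raise ValueError."""
    if not IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"Refusing unsafe database/table/collection name: {name!r}")
    return name
```

Running `safe_identifier()` on every name before it reaches a DROP or DELETE statement rejects values such as `docs; DROP TABLE users`. An allowlist is deliberately stricter than quoting; for PostgreSQL, psycopg's `sql.Identifier` is the library-level alternative.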

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (23)

Lp3

Medium
Category
MCP Least Privilege
Confidence
83% confidence
Finding
The skill metadata explicitly declares outbound network access as false, yet the referenced implementations and analysis indicate that some backends communicate over the network and may call external embedding APIs. This creates a trust and sandboxing problem: operators may allow the skill on the assumption that it is local-only, while it can unexpectedly transmit data to remote services.

Tp4

High
Category
MCP Tool Poisoning
Confidence
88% confidence
Finding
The documented behavior does not match the actual capabilities: the skill claims 8 interchangeable local-style backends, but apparently includes only 7 distinct implementations and performs outbound requests to Ollama/OpenAI that are not disclosed. Hidden or under-documented external calls are dangerous because they can exfiltrate prompts, documents, embeddings, or metadata in environments that expect a local vector-store evaluation tool.

Context-Inappropriate Capability

Medium
Confidence
89% confidence
Finding
This LanceDB backend unexpectedly includes a fallback to a remote OpenAI embeddings API, which changes the trust and data-flow model from local/serverless storage to external transmission of document and query text. In a RAG or evaluation context, users may reasonably assume data stays local when using an embedded vector store, so this can cause unintended disclosure of sensitive content.

Description-Behavior Mismatch

Medium
Confidence
96% confidence
Finding
This backend sends full document and query text to embedding endpoints via HTTP in `_ollama_embed` and `_openai_embed`, which contradicts the expectation of a local vector-store comparison layer and can expose sensitive corpus/query data to another service. Even when Ollama is local, it is still a separate network service boundary; when OpenAI is used, the transmission leaves the host entirely.
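Because both `_ollama_embed` and `_openai_embed` ship full text over HTTP, a lightweight audit record can make that data flow visible without logging the content itself. This is a hypothetical sketch: `EmbeddingAudit` is not part of the skill, and it assumes the caller can invoke `record()` just before each `requests.post`.

```python
import logging
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlsplit

# Hypothetical audit helper: records metadata about each request, never the texts.
logger = logging.getLogger("embedding_audit")


@dataclass
class TransmissionRecord:
    host: str
    num_texts: int
    total_chars: int


@dataclass
class EmbeddingAudit:
    records: List[TransmissionRecord] = field(default_factory=list)

    def record(self, url: str, texts: List[str]) -> TransmissionRecord:
        """Log destination host, text count, and total length for one request."""
        rec = TransmissionRecord(
            host=urlsplit(url).hostname or "",
            num_texts=len(texts),
            total_chars=sum(len(t) for t in texts),
        )
        self.records.append(rec)
        logger.info("embedding request to %s: %d texts, %d chars",
                    rec.host, rec.num_texts, rec.total_chars)
        return rec
```

For example, calling `audit.record(self._ollama_url, texts)` before the POST in `_ollama_embed` would show which host received how much text, which is the information an operator needs to confirm whether data stayed local.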

Intent-Code Divergence

High
Confidence
98% confidence
Finding
The module docstring claims 'No external API calls in the indexing path,' but `add_documents()` calls `_embed()`, which may perform HTTP POST requests to Ollama or OpenAI. This misleading assurance can cause operators to index sensitive data under a false assumption that no network transmission occurs, increasing the chance of unintentional data disclosure.

Context-Inappropriate Capability

Medium
Confidence
92% confidence
Finding
The code sends arbitrary input texts to OpenAI's remote embeddings API, which can disclose sensitive document contents outside the local environment. In the context of a vector-store comparison skill, this is more dangerous because users may reasonably expect backend benchmarking, not silent transmission of corpus data to a third-party service.

Intent-Code Divergence

Medium
Confidence
93% confidence
Finding
The module promises a fixed 768-dimensional pgvector store, but the TF-IDF fallback returns vectors whose length depends on the input vocabulary. Because the table schema requires vector(768), fallback requests can fail at insert/query time or cause inconsistent behavior, creating a reliability and denial-of-service risk when embedding providers are unavailable.
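One way to keep the fallback compatible with the `vector(768)` column is to replace vocabulary-dependent TF-IDF vectors with feature hashing, which always yields exactly `EMBED_DIM` values. This is a sketch of a swapped-in technique, not the skill's code: it drops IDF weighting, and `hashed_tf_embedding` is a hypothetical name. Fallback vectors also live in a different space from Ollama or OpenAI embeddings, so they should not be mixed with them in one table.

```python
import hashlib
import math
import re
from typing import List

EMBED_DIM = 768  # must match the vector(768) column in the pgvector schema
TOKEN_RE = re.compile(r"\w+")


def hashed_tf_embedding(text: str, dim: int = EMBED_DIM) -> List[float]:
    """Fixed-length, L2-normalised term-frequency vector built by feature hashing."""
    vec = [0.0] * dim
    for token in TOKEN_RE.findall(text.lower()):
        # blake2b is stable across processes, unlike Python's salted hash().
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        bucket = int.from_bytes(digest[:4], "little") % dim
        sign = 1.0 if digest[4] & 1 else -1.0  # signed hashing offsets collisions
        vec[bucket] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # An empty (or fully cancelled) input stays a zero vector instead of dividing by zero.
    return [v / norm for v in vec] if norm else vec
```

Because the output length no longer depends on the input vocabulary, inserts and queries against the fixed-width column cannot fail on a dimension mismatch when the embedding providers are unavailable.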

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The code sends raw document/query text to OpenAI's embeddings endpoint without any explicit disclosure or consent mechanism in the implementation. If this store is used on proprietary, regulated, or secret material, the external transmission can violate user expectations, privacy requirements, or data-handling policies.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The call sites that issue embedding requests do not provide an explicit warning or consent mechanism before sending raw texts externally or to a local HTTP service. In a reusable skill/backend, this is risky because callers may pass proprietary documents or user queries without realizing they are being transmitted beyond process memory.
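A minimal sketch of the missing consent step, placed at each call site before an embedding request. `confirm_transmission`, `TransmissionDeclined`, and the callback signature are hypothetical; the design denies by default, so a caller that supplies no consent callback cannot transmit at all.

```python
import warnings
from typing import Callable, List, Optional

# Hypothetical callback: receives the destination and number of texts, returns approval.
ConsentCallback = Callable[[str, int], bool]


class TransmissionDeclined(RuntimeError):
    """Raised when texts may not be sent to an embedding service."""


def confirm_transmission(destination: str, texts: List[str],
                         consent: Optional[ConsentCallback] = None) -> None:
    """Warn, then require explicit approval before texts leave process memory."""
    warnings.warn(
        f"About to send {len(texts)} text(s) to embedding service at {destination}",
        UserWarning,
        stacklevel=2,
    )
    # Deny by default: no callback, or a callback returning False, blocks the request.
    if consent is None or not consent(destination, len(texts)):
        raise TransmissionDeclined(f"Transmission to {destination} not approved")
```

Raising an exception rather than only warning means a forgotten consent callback fails visibly instead of silently sending data.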

Missing User Warnings

Medium
Confidence
95% confidence
Finding
Both Ollama and OpenAI embedding paths transmit the provided texts to another service without any warning or consent mechanism in this file. Even if Ollama is often local, the URL is configurable and could point to a remote host, so the code can become an unintended data-exfiltration path for sensitive inputs.
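Since `self._ollama_url` is configurable, one mitigation is to accept it only when it names a loopback host. This sketch assumes the URL is validated once, before any texts are sent; `is_loopback_url` and `require_local_ollama` are hypothetical. Hostnames other than `localhost` are deliberately not resolved, so a name that happens to point at 127.0.0.1 is still rejected.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """True only if the URL's host is 'localhost' or a loopback IP literal."""
    host = urlsplit(url).hostname  # lowercased; IPv6 brackets stripped
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as potentially remote


def require_local_ollama(url: str) -> str:
    """Return url if it is loopback; otherwise refuse to use it."""
    if not is_loopback_url(url):
        raise ValueError(f"Ollama URL {url!r} is not a loopback address; refusing to send texts")
    return url
```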

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The store sends raw text inputs to embedding services at runtime, including a third-party OpenAI endpoint and a configurable local Ollama URL, without any explicit consent, warning, or data-classification guard at the transmission points. In a RAG backend, inputs may contain secrets, proprietary documents, or personal data, so silent transmission can lead to unintended data disclosure.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The OpenAI fallback sends raw document/query text to a third-party external service whenever an API key is configured and Ollama is unavailable. That creates a real data exposure risk because callers may assume this store is local-first, yet content can leave the environment without an explicit opt-in or warning at the point of use.
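The fallback condition described here (an API key is configured and Ollama is unavailable) can be tightened so that a key alone is never enough. In this hypothetical sketch, `resolve_provider` and its `allow_remote` flag are illustrative names: OpenAI is chosen only when the caller opts in explicitly, and `Provider.NONE` tells the caller to use a local fallback or fail loudly.

```python
from enum import Enum
from typing import Optional


class Provider(Enum):
    OLLAMA = "ollama"
    OPENAI = "openai"
    NONE = "none"


def resolve_provider(ollama_available: bool, openai_key: Optional[str],
                     allow_remote: bool = False) -> Provider:
    """Pick an embedding provider; the remote OpenAI path requires explicit opt-in."""
    if ollama_available:
        return Provider.OLLAMA
    if openai_key and allow_remote:
        return Provider.OPENAI
    # A key without allow_remote=True is ignored rather than silently used.
    return Provider.NONE
```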

External Transmission

Medium
Category
Data Exfiltration
Content
def _openai_embed(self, texts: list[str]) -> list[list[float]]:
    import requests
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {self._openai_key}"},
        json={"input": texts, "model": "text-embedding-3-small"},
Confidence
92% confidence
Finding
requests.post( "https://

External Transmission

Medium
Category
Data Exfiltration
Content
def _openai_embed(self, texts: list[str]) -> list[list[float]]:
    import requests
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {self._openai_key}"},
        json={"input": texts, "model": "text-embedding-3-small"},
Confidence
92% confidence
Finding
requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json=

External Transmission

Medium
Category
Data Exfiltration
Content
def _openai_embed(self, texts: list[str]) -> list[list[float]]:
    import requests
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {self._openai_key}"},
        json={"input": texts, "model": "text-embedding-3-small"},
        timeout=30,
Confidence
90% confidence
Finding
https://api.openai.com/

External Transmission

Medium
Category
Data Exfiltration
Content
def _openai_embed(self, texts: list[str]) -> list[list[float]]:
    import requests
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {self._openai_key}"},
        json={"input": texts, "model": "text-embedding-3-small"},
Confidence
97% confidence
Finding
requests.post( "https://

External Transmission

Medium
Category
Data Exfiltration
Content
def _ollama_embed(self, texts: list[str]) -> list[list[float]]:
    import requests
    resp = requests.post(
        self._ollama_url,
        json={"model": _OLLAMA_EMBED_MODEL, "input": texts},
        timeout=60,
Confidence
84% confidence
Finding
requests.post( self._ollama_url, json=

External Transmission

Medium
Category
Data Exfiltration
Content
def _openai_embed(self, texts: list[str]) -> list[list[float]]:
    import requests
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {self._openai_key}"},
        json={"input": texts, "model": "text-embedding-3-small"},
Confidence
97% confidence
Finding
requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json=

External Transmission

Medium
Category
Data Exfiltration
Content
def _openai_embed(self, texts: list[str]) -> list[list[float]]:
    import requests
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {self._openai_key}"},
        json={"input": texts, "model": "text-embedding-3-small"},
        timeout=30,
Confidence
96% confidence
Finding
https://api.openai.com/

External Transmission

Medium
Category
Data Exfiltration
Content
def _openai_embed(self, texts: list[str]) -> list[list[float]]:
    import requests
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {self._openai_key}"},
        json={"input": texts, "model": "text-embedding-3-small"},
Confidence
94% confidence
Finding
requests.post( "https://

External Transmission

Medium
Category
Data Exfiltration
Content
def _ollama_embed(self, texts: list[str]) -> list[list[float]]:
    import requests
    resp = requests.post(
        self._ollama_url,
        json={"model": _OLLAMA_EMBED_MODEL, "input": texts},
        timeout=60,
Confidence
80% confidence
Finding
requests.post( self._ollama_url, json=

External Transmission

Medium
Category
Data Exfiltration
Content
def _openai_embed(self, texts: list[str]) -> list[list[float]]:
    import requests
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {self._openai_key}"},
        json={"input": texts, "model": "text-embedding-3-small"},
Confidence
94% confidence
Finding
requests.post( "https://api.openai.com/v1/embeddings", headers={"Authorization": f"Bearer {self._openai_key}"}, json=

External Transmission

Medium
Category
Data Exfiltration
Content
def _openai_embed(self, texts: list[str]) -> list[list[float]]:
    import requests
    resp = requests.post(
        "https://api.openai.com/v1/embeddings",
        headers={"Authorization": f"Bearer {self._openai_key}"},
        json={"input": texts, "model": "text-embedding-3-small"},
        timeout=30,
Confidence
93% confidence
Finding
https://api.openai.com/

VirusTotal

63 of 63 vendors reported this skill as clean.
