RAG Retriever V3

Security checks across malware telemetry and agentic risk

Overview

The skill is a coherent RAG document retriever, but its default/auto embedding behavior can send document text and queries to OpenAI when an API key is present without a clear privacy warning.

Install only if you are comfortable with a RAG tool that may download model files and, when OPENAI_API_KEY is present or OpenAI is configured, may send document chunks and search queries to OpenAI. For private documents, explicitly configure the Xenova/local provider, avoid exporting OPENAI_API_KEY in the runtime environment, and restrict access to any code path that can drop collections.
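
The recommendation above amounts to making the remote provider opt-in rather than a side effect of an exported key. A minimal sketch of that selection rule follows; `resolveEmbeddingProvider`, `ProviderConfig`, and the provider names are hypothetical and not taken from the skill's source.

```typescript
// Hypothetical provider selection: names and config shape are illustrative,
// not taken from the skill's source.
type Provider = "local" | "openai";

interface ProviderConfig {
  // Explicit operator choice; undefined means "not configured".
  provider?: Provider;
}

function resolveEmbeddingProvider(
  config: ProviderConfig,
  env: Record<string, string | undefined>,
): Provider {
  if (config.provider === "openai") {
    // Remote embeddings only on explicit opt-in, and only with a key.
    if (!env["OPENAI_API_KEY"]) {
      throw new Error("provider 'openai' requested but OPENAI_API_KEY is not set");
    }
    return "openai";
  }
  // Default: stay local even when OPENAI_API_KEY happens to be exported.
  return "local";
}
```

Under this rule, the presence of OPENAI_API_KEY alone never switches the retriever to the remote provider; the operator has to ask for it in configuration.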

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (8)

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The module comment states the embedding model runs locally without an API key, but the code enables both local and remote model loading via env.allowRemoteModels = true. This mismatch can mislead operators into assuming there is no network access or third-party dependency fetch, which creates supply-chain, privacy, and policy-compliance risk when models are downloaded at runtime.

Intent-Code Divergence

Medium

Confidence: 87%
Finding: The file header states the cross-encoder runs locally, but the code enables both local and remote model loading via env.allowRemoteModels = true. This mismatch can cause operators to trust the component as offline/private when it may fetch model artifacts over the network, creating an unexpected supply-chain and data-handling risk.
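
Both divergence findings turn on env.allowRemoteModels being true while the comments promise local execution. One way to make the code match the comments is to disable remote loading and point the loader at a pre-populated model directory. The sketch below assumes the transformers.js `env` settings `allowRemoteModels`, `allowLocalModels`, and `localModelPath`; the `ModelEnvLike` interface, the `applyOfflinePolicy` helper, and the `/opt/models` path are hypothetical stand-ins so the example runs without the library.

```typescript
// Stand-in for the subset of the transformers.js `env` object used here.
// In real code this would be the `env` export of the library.
interface ModelEnvLike {
  allowRemoteModels: boolean;
  allowLocalModels: boolean;
  localModelPath: string;
}

// Hypothetical helper: force local-only loading from a vetted directory.
function applyOfflinePolicy(env: ModelEnvLike, modelDir: string): ModelEnvLike {
  env.allowRemoteModels = false; // no runtime downloads
  env.allowLocalModels = true; // resolve models from disk only
  env.localModelPath = modelDir; // directory populated and reviewed ahead of time
  return env;
}

// Usage with a plain object standing in for the library's `env`.
const offlineEnv = applyOfflinePolicy(
  { allowRemoteModels: true, allowLocalModels: true, localModelPath: "" },
  "/opt/models",
);
```

With remote loading off, a missing model should surface as a load error instead of a silent download, which is the behavior the "runs locally" comments imply.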

Missing User Warnings

Medium

Confidence: 90%
Finding: The README instructs users to add documents and notes support for OpenAI embeddings, but it does not warn that document contents may be transmitted to a third-party API or indexed for retrieval. In a RAG system, users may ingest sensitive internal documents, so missing disclosure and handling guidance creates a real privacy and compliance risk even though the issue is documentation-related rather than an exploit primitive.

Missing User Warnings

Medium

Confidence: 93%
Finding: The documentation explicitly supports OpenAI/cloud embeddings but does not warn users that document contents and queries may be transmitted to an external provider during embedding or retrieval operations. In a RAG skill, this can expose sensitive internal documents, prompts, or user queries to third-party services without informed consent, which is a real privacy and compliance risk.

Missing User Warnings

Medium

Confidence: 89%
Finding: The dropCollection method performs irreversible table deletion on any supplied name with no confirmation, allowlist, guardrail, or authorization check visible in this component. In an agent or skill context, if the method is exposed to untrusted inputs or invoked by mistake, it can cause data loss and service disruption, especially because it initializes the DB automatically and executes immediately.
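
A guardrail of the kind this finding asks for could be a precondition check that runs before any drop. The sketch below is an assumption about how such a check might look, not the skill's API: `assertDroppable`, the allowlist, and the "repeat the name to confirm" convention are all hypothetical.

```typescript
// Hypothetical precondition for a destructive drop. Throws unless the
// collection is allowlisted and the caller has repeated its name.
function assertDroppable(
  name: string,
  confirmName: string,
  droppable: ReadonlySet<string>,
): void {
  if (!droppable.has(name)) {
    throw new Error(`collection '${name}' is not on the drop allowlist`);
  }
  // Requiring the name twice keeps a stray or injected call from deleting data.
  if (confirmName !== name) {
    throw new Error("confirmation does not match collection name");
  }
}

// Example allowlist: only scratch collections may ever be dropped.
const droppableCollections: ReadonlySet<string> = new Set(["scratch", "tmp_index"]);
```

A caller would run `assertDroppable(name, confirmName, droppableCollections)` first and invoke the real drop only if it returns without throwing.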

Missing User Warnings

Medium

Confidence: 88%
Finding: The code sends arbitrary text inputs to the OpenAI embeddings API, which is a third-party remote service, without any built-in notice, consent flow, or guardrails for sensitive data. In contexts where callers may pass proprietary, personal, or regulated content, this can cause unintended data disclosure to an external provider.
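
One possible guardrail for this finding is an outbound check that runs before any text leaves the process. The following is a sketch under stated assumptions: `ExternalPolicy`, `checkOutbound`, and the blocked-pattern list are hypothetical, and a simple pattern match is only a coarse filter, not a substitute for data classification.

```typescript
// Hypothetical outbound policy; field names are illustrative.
interface ExternalPolicy {
  allowExternal: boolean; // explicit operator consent to use a remote API
  blockedPatterns: RegExp[]; // markers of content that must stay local
}

// Returns the texts unchanged if they may be sent, otherwise throws.
function checkOutbound(texts: string[], policy: ExternalPolicy): string[] {
  if (!policy.allowExternal) {
    throw new Error("external embedding provider not permitted by policy");
  }
  for (const text of texts) {
    for (const pattern of policy.blockedPatterns) {
      // String.prototype.search ignores lastIndex, so /g patterns are safe here.
      if (text.search(pattern) >= 0) {
        throw new Error(`text matches blocked pattern /${pattern.source}/`);
      }
    }
  }
  return texts;
}
```

The remote embedding call would then receive `checkOutbound(chunks, policy)` instead of the raw chunks, so a refusal happens before any network request is made.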

Missing User Warnings

Medium

Confidence: 95%
Finding: Remote model downloads are explicitly enabled, yet the class presents itself as a local embedding provider and provides no user-facing warning or consent flow. In a security-sensitive agent environment, silent network fetches can leak usage metadata, violate offline assumptions, and expose the system to unreviewed third-party model artifacts at runtime.

Missing User Warnings

Medium

Confidence: 91%
Finding: Remote model loading is explicitly enabled, and this reranker processes user queries and document text that may be sensitive. In skill context, this increases risk because users may assume reranking is local while model fetches can introduce external network dependency, possible unreviewed model code/artifacts, and privacy concerns if deployment behavior is not tightly controlled.

VirusTotal

66/66 vendors flagged this skill as clean.
