semantic-search

Security checks across malware telemetry and agentic risk

Overview

This skill does what it claims, but it handles live database data and sends schema/query context to model and reranking services with weak disclosure and unsafe example configuration.

Install only in a controlled environment with read-only database credentials, non-production or approved data, and approved private/model endpoints. Remove or rotate any credentials shown in the bundled docs, avoid insecure database transport in production, and review logging/retention before using it with sensitive datasets.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (13)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 92%
Finding: The skill declares required environment variables and documents Python execution, database connectivity, and local file access patterns, but it does not declare corresponding permissions or clearly constrain those capabilities. This creates a transparency and governance gap: operators may enable a skill that can access networked database resources and local files without explicit permission review.

Description-Behavior Mismatch

Medium

Confidence: 89%
Finding: The retriever automatically creates a backing vector table when it does not exist, which is a state-changing operation inconsistent with a search-only skill. In an agent environment, this expands the skill's authority from read-only retrieval to schema mutation, increasing the blast radius of accidental misuse or unauthorized invocation.
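
One way to narrow this is to make table creation opt-in rather than automatic. The sketch below is illustrative only: `ensure_vector_table`, `TableMissingError`, and the `allow_create` flag are hypothetical names, and `table_exists` / `create_table` stand in for whatever the retriever actually uses.

```python
class TableMissingError(RuntimeError):
    """Raised when the backing vector table is absent and creation is not allowed."""


def ensure_vector_table(table_exists, create_table, allow_create=False):
    """Return True if a table was created, False if it already existed.

    `table_exists` (bool) and `create_table` (callable) are hypothetical
    stand-ins for the retriever's own existence check and DDL call.
    """
    if table_exists:
        return False
    if not allow_create:
        # Fail closed: a search-only invocation never mutates the schema.
        raise TableMissingError(
            "vector table not found; rerun with allow_create=True after review"
        )
    create_table()
    return True
```

With this shape, an agent-driven search call fails loudly when the table is missing, and only an operator-controlled setup step passes `allow_create=True`.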

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The skill exposes an add_example method that writes arbitrary question/SQL pairs into the vector database, which goes beyond the declared semantic-search behavior. If reachable by an agent or untrusted input path, this can poison retrieval results, influence downstream Text-to-SQL generation, and persist attacker-controlled content for future sessions.
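
If `add_example` must remain reachable, a conservative pre-check on the SQL half of each pair can reduce what gets persisted. The following is a deliberately naive sketch, not a SQL parser: `looks_like_single_select` is a hypothetical helper that accepts only a single statement beginning with `SELECT` and rejects anything containing an inner semicolon, even inside a string literal.

```python
def looks_like_single_select(sql):
    """Heuristic gate for example SQL; errs on the side of rejecting input."""
    # Allow one trailing semicolon, then require that no others remain.
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False
    # Only plain SELECT is accepted; WITH is excluded because some databases
    # allow data-modifying statements inside common table expressions.
    first_word = statement.split(None, 1)[0].lower()
    return first_word == "select"
```

A check like this does not replace authorization on the write path; it only limits the kind of content an untrusted caller could store.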

Missing User Warnings

Medium

Confidence: 99%
Finding: The document exposes real-looking credentials, internal IPs, service URLs, database connection details, and message queue/Postgres/Nacos access information in plaintext. Even though this is 'just documentation', embedded secrets and internal topology materially lower the effort required for unauthorized access, lateral movement, and targeting of internal infrastructure.

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill advertises Text-to-SQL generation and shows returned SQL/results, but it does not warn that generated SQL may be executed against live databases and can expose sensitive records. In this context, natural-language-driven SQL generation materially increases the risk of overbroad queries, unintended data disclosure, and misuse against production systems.

Missing User Warnings

Medium

Confidence: 94%
Finding: The documentation includes database host, port, username, password, and a sample configuration with `insecure: true` without warning about secret handling or transport security. This can normalize unsafe deployment practices, encourage plaintext secret storage, and lead users to connect to databases over insecure channels.
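
A safer pattern than the documented sample is to read connection settings from the environment and refuse insecure transport unless an operator explicitly allows it. The sketch below assumes hypothetical variable names (`DB_HOST`, `DB_PORT`, `DB_USER`, `DB_PASSWORD`, `DB_INSECURE`); the skill's real configuration keys may differ.

```python
import os


def load_db_config(env=None, allow_insecure=False):
    """Build a connection config from environment variables (names are assumptions)."""
    env = os.environ if env is None else env
    required = ("DB_HOST", "DB_PORT", "DB_USER", "DB_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))

    # Treat insecure transport as an explicit, reviewed exception.
    insecure = env.get("DB_INSECURE", "false").strip().lower() == "true"
    if insecure and not allow_insecure:
        raise ValueError("insecure transport requested but not explicitly allowed")

    return {
        "host": env["DB_HOST"],
        "port": int(env["DB_PORT"]),
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
        "insecure": insecure,
    }
```

Keeping secrets out of the docs and out of checked-in config also means the credentials already published there should be rotated, as the overview recommends.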

Missing User Warnings

Medium

Confidence: 93%
Finding: The code sends the user query and candidate document content to external reranking services via call_qwen_rerank and call_rerank_sdk. In an enterprise semantic-search skill, those contents can contain sensitive business metadata, file names, or internal text, so transmitting them to third-party services without explicit consent, minimization, or policy enforcement creates a real data-exposure risk.
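
Policy enforcement for the reranking calls could start with an endpoint allowlist checked before any request is made. This is a minimal sketch: `APPROVED_HOSTS` and `is_approved_endpoint` are hypothetical, and the host name is a placeholder rather than a real service.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the operator's approved private endpoints.
APPROVED_HOSTS = frozenset({"rerank.internal.example"})


def is_approved_endpoint(url, approved_hosts=APPROVED_HOSTS):
    """Return True only for HTTPS URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    # `hostname` excludes userinfo and port, so "approved@evil" tricks fail.
    return parsed.scheme == "https" and parsed.hostname in approved_hosts
```

A caller would check the configured reranker URL once at startup and refuse to send queries or document content if the check fails.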

Missing User Warnings

Medium

Confidence: 86%
Finding: The code sends both user queries and table metadata to an external LLM via `self.llm.create` without any visible consent, disclosure, minimization, or sensitivity checks in this component. In an enterprise semantic-search/Text-to-SQL context, table metadata can reveal schema names, business concepts, or sensitive field labels, so forwarding it to a model can create a confidentiality and compliance risk even if the behavior is functionally intended.

Missing User Warnings

Medium

Confidence: 95%
Finding: The code logs the raw user query directly via `logger.info`, which can expose sensitive user-entered data in logs. In an enterprise semantic search and Text-to-SQL context, queries may contain internal dataset names, business terms, credentials pasted by mistake, or personal data, so retaining them without minimization or disclosure creates a real privacy and data-governance risk.

Missing User Warnings

Low

Confidence: 92%
Finding: The code logs the raw user query via `logger.info`, which can expose sensitive search terms, internal identifiers, or personal data to application logs. In an enterprise semantic search skill, queries may contain confidential business content, making indiscriminate logging a real privacy and data-governance risk even if it is not directly exploitable as code execution.
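
For the raw-query logging findings, one minimization option is to log a fingerprint of the query instead of its text. The sketch below uses only the standard library; `query_fingerprint`, `log_query`, and the logger name are hypothetical. Note that a plain hash of a short query can still be guessed by brute force, so a keyed HMAC may be preferable where that matters.

```python
import hashlib
import logging

logger = logging.getLogger("semantic_search")  # hypothetical logger name


def query_fingerprint(query):
    """Summarize a query by length and truncated SHA-256, never its content."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:12]
    return "len={} sha256={}".format(len(query), digest)


def log_query(query):
    # The fingerprint still lets operators correlate repeated queries.
    logger.info("received query %s", query_fingerprint(query))
```

The same fingerprint appears for identical queries, which keeps logs useful for debugging repeated failures without retaining the text itself.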

Missing User Warnings

Medium

Confidence: 91%
Finding: The agent sends a resource-derived identifier to an internal metadata service automatically, without any visible consent, minimization, or trust-boundary checks. Even if the identifier itself is not highly sensitive, this creates an implicit data flow from user/task context to a remote service and can expose dataset existence, schema linkage, or enable unintended access patterns if resource_id is attacker-controlled.
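
A basic trust-boundary check is to validate `resource_id` against a strict format before it leaves the process. The accepted pattern below (letters, digits, underscore, hyphen, at most 64 characters) is an assumption for illustration; the real identifier format is not documented in this report.

```python
import re

# Assumed identifier format; adjust to the metadata service's actual scheme.
RESOURCE_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_resource_id(resource_id):
    """Return the identifier unchanged if it matches the assumed format."""
    if not isinstance(resource_id, str) or not RESOURCE_ID_RE.fullmatch(resource_id):
        raise ValueError("resource_id has an unexpected format")
    return resource_id
```

Format validation does not prove the caller is entitled to the resource; an authorization check on the metadata service side is still needed.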

Missing User Warnings

High

Confidence: 98%
Finding: This code sends table schema, sample row data, retrieved context, prior errors, and the user's query to an external LLM. In an enterprise semantic-search/Text-to-SQL skill, those fields can contain sensitive business data, PII, secrets, or proprietary schema details, so transmitting them off-box without explicit disclosure, consent, redaction, or policy enforcement is a significant data exfiltration risk.
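
Redaction and minimization could be applied when the prompt payload is assembled, before anything is sent to the model. The following is a rough sketch under stated assumptions: `redact_text` and `build_llm_payload` are hypothetical helpers, the two regular expressions catch only email addresses and long digit runs, and sample rows are excluded unless explicitly requested.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS_RE = re.compile(r"\b\d{9,}\b")  # e.g. phone or account numbers


def redact_text(text):
    """Mask a small set of obviously sensitive patterns; not exhaustive."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return LONG_DIGITS_RE.sub("[NUMBER]", text)


def build_llm_payload(query, schema, sample_rows, include_samples=False):
    """Assemble the model payload, omitting sample rows by default."""
    payload = {"query": redact_text(query), "schema": schema}
    if include_samples:
        payload["sample_rows"] = [
            {col: redact_text(str(val)) for col, val in row.items()}
            for row in sample_rows
        ]
    return payload
```

Pattern-based redaction misses many kinds of sensitive data, so it complements, rather than replaces, routing requests to an approved private endpoint.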

Missing User Warnings

Medium

Confidence: 95%
Finding: The reflection step makes a second external LLM call containing the generated SQL, user question, and schema context, which duplicates and broadens disclosure beyond the primary generation step. SQL text often embeds business logic, table/column names, literals, and filters that can reveal sensitive internal structure or user-supplied data, increasing exposure and retention risk.

VirusTotal

66/66 vendors flagged this skill as clean.
