HeteroMind - Unified Knowledge QA

Security checks across malware telemetry and agentic risk

Overview

This appears to be a genuine multi-source QA skill, but it can run AI-generated code and database queries with only weakly enforced safeguards.

Install only if you are prepared to review and constrain it. Use isolated environments, read-only database credentials, explicit table paths, trusted SPARQL endpoints, and local or approved LLM providers for sensitive data. Treat generated Python/SQL/SPARQL as untrusted and require human approval or a real sandbox before live execution.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (27)

exec() call detected

High
Category
Dangerous Code Execution
Content
import pandas as pd
            scope = {"df": df, "pd": pd}
            
            exec(code, scope)
            result = scope.get("result")
            
            return TableQAStep(
Confidence
99% confidence
Finding
exec(code, scope)

Intent-Code Divergence

Medium
Confidence
93% confidence
Finding
The document claims outputs are sanitized, but later recommends logging full generated queries and execution details. In a system that generates SQL, SPARQL, and code from user input, those logs can contain secrets, sensitive query contents, schema details, or data-derived values, so the stated control is incomplete and misleading.

Intent-Code Divergence

Medium
Confidence
98% confidence
Finding
The sandboxing section presents a 'restricted scope' as protection, but calling exec on LLM-generated code is still dangerous because Python code can often recover powerful builtins or abuse exposed objects. Given this skill explicitly generates pandas analysis code from natural language, prompt- or data-driven code injection could lead to arbitrary code execution or data access.
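
To make the point about restricted scopes concrete, here is a minimal sketch of why a scope dictionary is not a sandbox. The `rows` entry and the `generated` string are hypothetical stand-ins for the skill's DataFrame scope and its LLM output; only the behaviour of Python's built-in `exec` is being illustrated.

```python
# Hypothetical stand-in for the skill's scope; the real one holds a DataFrame and pandas.
scope = {"rows": [1, 2, 3]}

# Stand-in for LLM-generated code: it imports a module the scope never exposed.
generated = "import os\nresult = os.path.basename('/tmp/report.csv')"
exec(generated, scope)

# exec() inserted a reference to the builtins module into the scope dict,
# which is why the import above succeeded.
leaked_builtins = "__builtins__" in scope
escaped_result = scope["result"]

# Emptying __builtins__ blocks the `import` statement, but plain attribute
# access can still walk from any object up to `object` and down to every
# class currently loaded in the interpreter.
stripped = {"__builtins__": {}}
exec("result = ().__class__.__base__.__subclasses__()", stripped)
reachable_classes = len(stripped["result"])
```

Both variants run without error, which suggests that isolation has to come from outside the interpreter (a separate process, container, or similar boundary) rather than from the dictionary passed to `exec`.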

Intent-Code Divergence

Medium
Confidence
96% confidence
Finding
This is a real security issue because the validator only appends warnings for dangerous statements like DROP, DELETE, and TRUNCATE, and for non-SELECT queries, but still leaves validation['valid'] as true. In an NL2SQL system that generates SQL from natural-language input, this can allow destructive or modifying SQL to proceed if downstream code treats 'valid' as authorization to execute the query.
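
One way to close this gap is a validator that fails closed, so that any finding flips `valid` to false. The sketch below is an assumption about how that could look, not the skill's actual validator: the function name `validate_sql`, the `errors` key, and the keyword list are all hypothetical.

```python
import re

# Keywords that change data or schema; an illustrative list, not exhaustive.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|TRUNCATE|ALTER|CREATE|REPLACE|ATTACH|PRAGMA)\b",
    re.IGNORECASE,
)


def validate_sql(sql: str) -> dict:
    """Hypothetical validator: any finding makes the query invalid."""
    errors = []
    # Ignore surrounding whitespace and a single trailing semicolon.
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        errors.append("multiple statements are not allowed")
    if not re.match(r"(?i)^(SELECT|WITH)\b", statement):
        errors.append("only SELECT queries are allowed")
    match = FORBIDDEN.search(statement)
    if match:
        errors.append(f"forbidden keyword: {match.group(1).upper()}")
    # `valid` is derived from the error list, so it cannot disagree with it.
    return {"valid": not errors, "errors": errors}
```

Keyword matching of this kind will also reject harmless queries whose string literals happen to contain a listed word. That is a deliberate fail-closed trade-off, and it complements read-only database credentials rather than replacing them.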

Intent-Code Divergence

High
Confidence
97% confidence
Finding
The documentation claims 'Safe execution' even though the implementation later performs unsandboxed exec of generated Python. This mismatch is dangerous because it can cause operators and downstream integrators to trust a component that actually enables arbitrary code execution.

Intent-Code Divergence

High
Confidence
97% confidence
Finding
The class-level documentation repeats that the pipeline includes 'Safe execution', but the actual code-execution stage is unsandboxed. This misleading safety claim increases the chance of insecure deployment and lowers operator caution around a highly dangerous feature.

Intent-Code Divergence

Medium
Confidence
93% confidence
Finding
The context manager claims to clear the API key from memory on exit, but it only sets `self.api_key = None` while retaining the same sensitive value in `self.original_key`. This creates a misleading security guarantee and can allow the secret to remain accessible through the object lifetime, increasing the risk of accidental disclosure via debugging, serialization, or memory inspection.
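
A minimal sketch of a context manager that avoids the second copy is shown below. The class name `ApiKeyScope` is hypothetical and this is not the skill's code; it only illustrates holding a single reference and dropping it on exit.

```python
class ApiKeyScope:
    """Hypothetical context manager that keeps exactly one reference to the key."""

    def __init__(self, api_key: str):
        self.api_key = api_key  # no second copy such as `original_key`

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Drops this object's reference. CPython gives no guarantee that the
        # string's bytes are wiped, so this narrows exposure; it does not erase.
        self.api_key = None
        return False  # never swallow exceptions raised inside the block

    def __repr__(self):
        # Keep the secret out of debug output and tracebacks.
        return "ApiKeyScope(api_key=<redacted>)"
```

Because Python strings are immutable and may be referenced elsewhere (for example by the caller or an HTTP client), documentation for such a helper should describe it as reducing exposure rather than clearing memory.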

Missing User Warnings

Medium
Confidence
85% confidence
Finding
The README promotes automatic routing, query generation, and execution across databases, SPARQL endpoints, table files, and external LLM providers, but does not prominently warn that user queries, schemas, or data may be transmitted to third-party services or that generated queries/code may execute against live systems. In a skill that explicitly performs autonomous execution, missing disclosure and consent boundaries can lead to unintended data exposure or side effects.

Missing User Warnings

Low
Confidence
81% confidence
Finding
The setup instructions tell users to print the API key to the terminal for verification. While this is not a software exploit by itself, it increases risk of accidental credential exposure via shoulder-surfing, terminal recording, shared sessions, screenshots, or shell logging practices.
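
Verification does not require echoing the secret. The sketch below confirms that a variable is set and reports only its length; the helper name `describe_secret` and the variable name `EXAMPLE_API_KEY` are assumptions for illustration.

```python
import os


def describe_secret(name: str) -> str:
    """Report whether an environment variable is set without revealing its value."""
    value = os.environ.get(name)
    if not value:
        return f"{name}: not set"
    return f"{name}: set ({len(value)} characters)"


# Hypothetical variable name, set here only so the example runs on its own.
os.environ["EXAMPLE_API_KEY"] = "abc123"
status = describe_secret("EXAMPLE_API_KEY")
```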

Vague Triggers

Medium
Confidence
85% confidence
Finding
The trigger language is very broad and includes common phrases like "how many" and "show," which can cause the skill to activate for ordinary conversation. In a skill with network, database, and file capabilities, unintended invocation can lead to unnecessary data access or transmission without clear user intent.

Vague Triggers

Medium
Confidence
83% confidence
Finding
The when-to-use table uses broad categories such as aggregations, filters, joins, and structured information requests without clear boundaries. That ambiguity raises the chance that the skill will be selected in contexts where the user did not intend database/KG/table access, increasing exposure of connected systems and data.

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The skill description does not warn that user queries and possibly schema/entity context may be sent to external LLM providers and remote knowledge endpoints. This is a real privacy and compliance concern because natural-language questions can contain sensitive business or personal data that would leave the local environment.

Missing User Warnings

Low
Confidence
88% confidence
Finding
The setup guidance tells users to copy, edit, and export secrets from a .env file but does not warn about secure handling of API keys and database connection strings. Poor secret-handling guidance can lead to accidental shell history leakage, process-list exposure, repository commits, or insecure file permissions.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The documentation repeatedly instructs users to send natural-language queries and likely database schema/ontology context to third-party cloud LLM endpoints, but it does not warn that sensitive prompts, schema details, or table contents may leave the local environment. In a heterogeneous QA system, those inputs can contain confidential business data, making the omission a real security/privacy issue even though it appears unintentional.

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The test suite initializes multiple LLM-backed engines with a remote DeepSeek endpoint and passes schema, ontology, table paths, and natural-language queries into those engines. In a heterogeneous QA skill, this can disclose internal database structure or table context to a third-party service without an explicit consent gate or clear data-handling warning, which is a real privacy/data-governance risk even in test code.

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The code interpolates user-derived entity text directly into a SPARQL query string. Even though the current implementation does not execute the query yet, this is a real vulnerability pattern: once wired to a live endpoint, crafted input can alter query structure or force unintended outbound lookups, and the user text is also transmitted to external SPARQL services without any disclosure or privacy control. In this skill context, the risk is more credible because the component is explicitly designed to route natural-language queries to heterogeneous backends automatically.
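
As an illustration of the injection concern, the sketch below escapes user text before placing it in a double-quoted SPARQL string literal. The helper names and the `rdfs:label` lookup are hypothetical and not taken from the skill; the escaping covers the characters that cannot appear raw in such a literal (backslash, double quote, line feed, carriage return), plus tab.

```python
# Characters that must be escaped inside a double-quoted SPARQL literal.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}


def sparql_string_literal(text: str) -> str:
    """Return `text` as a quoted SPARQL string literal with special characters escaped."""
    return '"' + "".join(_ESCAPES.get(ch, ch) for ch in text) + '"'


def build_label_lookup(entity_text: str) -> str:
    """Hypothetical query builder: user text appears only as an escaped literal."""
    literal = sparql_string_literal(entity_text)
    return (
        "SELECT ?entity WHERE { "
        "?entity <http://www.w3.org/2000/01/rdf-schema#label> "
        + literal
        + " } LIMIT 10"
    )
```

Escaping addresses query-structure tampering only. The user text is still transmitted to whichever endpoint runs the query, so the disclosure concern in this finding remains.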

Missing User Warnings

Medium
Confidence
91% confidence
Finding
The code sends the raw user query to an external LLM provider via `client.chat.completions.create(...)` without any visible consent, minimization, or disclosure at the point of transmission. In a QA system, queries can easily contain sensitive business data, personal data, or confidential file contents, so this creates a real privacy and data-governance risk even if it is not a code-execution flaw.

Missing User Warnings

Medium
Confidence
96% confidence
Finding
The orchestrator logs the full user query at info level, which can expose sensitive natural-language input such as personal data, credentials, proprietary business questions, or regulated data to application logs. In a knowledge QA system, user queries are a primary input channel and are likely to contain exactly the kinds of sensitive information that should be minimized or redacted before logging.

Missing User Warnings

Medium
Confidence
92% confidence
Finding
The code logs the full natural-language query verbatim with `logger.info(f"Decomposing query: {query}")`. In a heterogeneous QA system, user queries can easily contain sensitive business data, PII, credentials, or confidential investigative prompts, and application logs are often broadly retained, replicated, and accessible to operators or downstream tooling.
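
One possible mitigation is to log a fingerprint of the query instead of its text. The sketch below is an assumption about how that might be done; the logger name and function names are hypothetical.

```python
import hashlib
import logging

logger = logging.getLogger("heteromind.example")  # hypothetical logger name


def query_fingerprint(query: str) -> str:
    """Summarize a query by length and a truncated SHA-256 digest."""
    digest = hashlib.sha256(query.encode("utf-8")).hexdigest()[:12]
    return f"len={len(query)} sha256={digest}"


def log_decomposition(query: str) -> None:
    # The raw query never reaches the log record; only the fingerprint does.
    logger.info("Decomposing query: %s", query_fingerprint(query))
```

A fingerprint still lets operators correlate log lines for the same query. Short or predictable queries can be guessed from an unsalted hash, so a keyed hash (for example `hmac`) may be preferable where that matters.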

Missing User Warnings

Medium
Confidence
90% confidence
Finding
The engine sends the user's natural-language query and ontology/context to an external LLM provider during entity linking without any consent gate, disclosure, or data-classification check in this code path. In a QA system, queries can contain sensitive business data, personal data, or internal identifiers, so silent transmission to a third party can create privacy, compliance, and confidentiality risks.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
During SPARQL generation, the code sends the user query plus linked entity URIs and ontology details to an external LLM service. That expands the disclosure surface beyond the raw query to include structured internal knowledge-graph metadata, which may reveal schema, identifiers, and sensitive relationships to a third party without explicit notice or approval.

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The engine performs outbound HTTP requests to a configured SPARQL endpoint using generated queries, with no evident allowlist, network policy restriction, or user-facing disclosure in this file. In this skill context, LLM-generated SPARQL can cause unintended data exfiltration to remote endpoints, query sensitive graph data, or be used for SSRF-like access to internal services if the endpoint is attacker-controlled or insufficiently constrained.
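
An endpoint allowlist of the kind this finding notes is missing could be sketched as follows. The host names in `ALLOWED_ENDPOINT_HOSTS` are examples only, and the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Example hosts only; a deployment would list the endpoints it actually trusts.
ALLOWED_ENDPOINT_HOSTS = {"query.wikidata.org", "dbpedia.org"}


def is_allowed_endpoint(url: str) -> bool:
    """Accept only HTTPS URLs whose host is explicitly allowlisted."""
    parts = urlsplit(url)
    # `hostname` excludes any userinfo and port, so a URL such as
    # https://query.wikidata.org@evil.example/ resolves to evil.example here.
    return parts.scheme == "https" and parts.hostname in ALLOWED_ENDPOINT_HOSTS
```

Checking the parsed host rather than searching the URL string avoids the common bypass of embedding a trusted name in the userinfo or path. It does not defend against DNS-level tricks, which need network-layer controls.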

Missing User Warnings

Medium
Confidence
82% confidence
Finding
The engine sends the user query and formatted database schema to an external LLM provider during schema linking. In an NL2SQL system, schema metadata can reveal sensitive internal structure, table names, business entities, or regulated data domains, and this file shows no consent gate, minimization, or provider-scope restriction before transmission.

Missing User Warnings

Medium
Confidence
95% confidence
Finding
The engine executes LLM-generated SQL directly against a live database connection with sqlite3 after only lightweight validation of candidate strings. Because the model can generate destructive or overly broad statements, this creates a real risk of unauthorized reads, data modification, schema damage, or operational disruption, especially if the connection points to production data.

Missing User Warnings

High
Confidence
95% confidence
Finding
The system executes LLM-generated Python without any user-facing warning or disclosure, which hides a major trust boundary from operators and end users. In this TableQA context, natural-language questions can indirectly drive code generation, making silent execution especially risky because users may think they are performing passive analytics when they are actually triggering arbitrary code paths.

VirusTotal

All 67 of 67 vendors reported this skill as clean.
