tabular data processing and analysis

Security checks across malware telemetry and agentic risk

Overview

This is a disclosed table-processing skill whose LLM and file-output risks are real but aligned with its stated purpose.

Install only for datasets you are allowed to process with local files and, when LLM features are enabled, an OpenAI-compatible provider. For confidential or regulated spreadsheets, use --no-merge-header and --no-abstract, avoid generated reports that load CDN scripts, review output files for raw samples, and use a trusted API base URL with a revocable key.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (27)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 86%
Finding: The skill declares required binaries and environment variables, but there is no explicit permissions declaration despite capabilities that can read secrets from the environment and write files. This weakens the trust boundary for agents and reviewers because the skill can handle credentials and produce artifacts without a clear, machine-readable permission contract.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 97%
Finding: The skill documentation admits that LLM-related functions may send table headers, samples, schema, or other content to a remote OpenAI-compatible endpoint, creating a real data exfiltration path. At the same time, the advertised end-to-end analysis/reporting capability is only partially implemented and delegated to agent-authored dynamic Python, which can mislead operators about what code runs and what data leaves the workspace.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The skill is presented as a local CSV cleaning utility, but when header merging is enabled it silently obtains an LLM service and sends table data into `process_complex_header_table`. This creates a data-flow boundary crossing not disclosed by the interface, so users may expose sensitive spreadsheet contents to an external model provider unexpectedly.

Description-Behavior Mismatch

Medium

Confidence: 86%
Finding: The skill conditionally sends table contents to an external LLM service for summary generation, which expands the data-processing boundary beyond local, controlled table preprocessing. If the CSV contains sensitive or regulated data, this can cause unintended data exfiltration to a third-party model provider, especially because abstract generation is enabled by default when an LLM is configured.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The documentation frames the skill as local CSV cleaning, but the described implementation can send table content to an external LLM for header detection. That mismatch is security-relevant because users may supply sensitive spreadsheet data under the assumption that processing remains local, leading to unanticipated data disclosure.

Intent-Code Divergence

Medium

Confidence: 91%
Finding: The documentation claims only .csv files are supported, but the shown run() code does not enforce an extension or content-type check before loading the file. This inconsistency can cause the skill to process unexpected file types, increasing the risk of mis-parsing, denial of service from malformed inputs, or accidental handling of files outside the intended trust boundary.

Description-Behavior Mismatch

Medium

Confidence: 93%
Finding: The documentation claims the skill only supports .csv files, but the shown run() logic checks only file existence before loading. This mismatch can let arbitrary existing files reach the parser, causing unintended file processing, confusing security assumptions, and potentially exposing non-CSV content through downstream description and output generation.
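The missing guard described in the two findings above could look roughly like the sketch below. It is not the skill's code: `require_csv` and the 4 KB probe size are hypothetical, and the NUL-byte probe is only a cheap heuristic for rejecting binary files (for example an .xlsx renamed to .csv) that pass the extension check.

```python
from pathlib import Path


def require_csv(path: str, sniff_bytes: int = 4096) -> Path:
    """Hypothetical pre-load guard enforcing the documented .csv-only contract."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"input not found: {p}")
    # Enforce the extension the documentation promises, case-insensitively.
    if p.suffix.lower() != ".csv":
        raise ValueError(f"unsupported file type {p.suffix!r}; only .csv is accepted")
    # Extensions are easy to fake: plain-text CSV should not contain NUL bytes,
    # while zip-based formats such as .xlsx and most images do near the start.
    with p.open("rb") as fh:
        head = fh.read(sniff_bytes)
    if b"\x00" in head:
        raise ValueError(f"{p.name} looks binary, not CSV")
    return p
```

One side effect of the NUL probe is that UTF-16 encoded CSVs are also rejected, which an operator may or may not want.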

Description-Behavior Mismatch

Medium

Confidence: 97%
Finding: The skill description says deep analysis occurs only under controlled conditions, yet the documented default behavior sends table-derived information to an external LLM whenever configured. For a table-processing skill, this creates an external data egress path without any visible gating, approval, redaction, or policy enforcement.
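One way to add the missing gate is to treat a configured key as a capability rather than as consent: LLM calls run only when the operator opts in for that run, and only toward an approved endpoint. In this sketch `llm_egress_allowed` and `TRUSTED_HOSTS` are hypothetical; reading `OPENAI_BASE_URL` follows the convention of the official OpenAI Python client and is an assumption about how this skill is configured.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Assumption: replace with the endpoints your organization has approved.
TRUSTED_HOSTS = {"api.openai.com"}


def llm_egress_allowed(opt_in: bool, env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical gate: table-derived data leaves the host only on explicit opt-in."""
    env = os.environ if env is None else env
    if not opt_in:
        # Default-deny: an API key in the environment is not consent to send data.
        return False
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("LLM features requested but OPENAI_API_KEY is not set")
    base_url = env.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
    parsed = urlparse(base_url)
    # Require TLS and an allowlisted host before any table content is sent.
    if parsed.scheme != "https" or parsed.hostname not in TRUSTED_HOSTS:
        raise PermissionError(f"untrusted LLM endpoint: {base_url}")
    return True
```

A caller would check this once, before building any prompt, so that leaving the feature off also prevents headers and sample rows from being assembled into a request at all.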

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: Using OPENAI_API_KEY and calling an external LLM introduces network-based data disclosure capability into a feature presented as table description generation. Even if only schema is sent, column names, categories, and examples often contain sensitive business or personal data, so this materially expands the trust boundary.

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The error-handling section documents a file-format rejection branch that is absent from the shown implementation. This discrepancy is security-relevant because operators may rely on the documented restriction, while the actual code appears to accept broader input and process it anyway.

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The documentation presents the skill as a local CSV header-merging tool, but elsewhere describes sending the first 10 rows of table content to an LLM service for header detection. That is a real security and privacy issue because users may supply sensitive spreadsheet contents under the assumption that processing is local only, resulting in unintended external data disclosure.
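Where header detection must stay on the host, a crude local heuristic can stand in for the LLM call. This is a swapped-in technique, not what the skill does: the hypothetical `guess_header_rows` assumes header rows contain only labels while data rows contain at least one numeric cell, and it inspects the same first 10 rows without sending them anywhere.

```python
import csv
import io
import math
from itertools import islice


def _is_number(cell: str) -> bool:
    """True if the cell parses as a finite number, allowing thousands separators."""
    try:
        value = float(cell.replace(",", ""))
    except ValueError:
        return False
    # Reject words such as "nan" or "Infinity" that float() also accepts.
    return math.isfinite(value)


def guess_header_rows(text: str, max_rows: int = 10) -> int:
    """Hypothetical local heuristic: count leading rows with no numeric cells."""
    count = 0
    for row in islice(csv.reader(io.StringIO(text)), max_rows):
        cells = [c.strip() for c in row if c.strip()]
        # Stop at the first blank or number-bearing row: that is where data begins.
        if not cells or any(_is_number(c) for c in cells):
            break
        count += 1
    return count
```

Tables whose data rows are entirely text defeat this rule, so it is better suited as a first-pass filter or a no-network fallback than as a drop-in replacement for the LLM step.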

Context-Inappropriate Capability

Medium

Confidence: 91%
Finding: The skill includes fallback logic to use an OpenAI-backed LLM service with environment-based credentials, which expands the trust boundary beyond local file processing. This creates a genuine risk of unauthorized data egress to third-party services and may violate deployment expectations, compliance requirements, or data handling policies.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: The template instructs loading Tailwind, React, ReactDOM, and Babel from external CDNs, introducing network-dependent execution and a third-party supply-chain trust boundary into a local reporting workflow. If those resources are blocked or unavailable, the generated report may fail to render; if they are tampered with, it may execute untrusted JavaScript in the user's browser.
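Vendoring the four libraries next to the report removes the network dependency entirely. If the CDN is kept, Subresource Integrity at least makes the browser refuse a tampered file. A minimal sketch, assuming the exact script bytes were downloaded and reviewed once; `sri_hash` and `pinned_script_tag` are hypothetical helpers, and the URL in the usage note is a placeholder.

```python
import base64
import hashlib
import html


def sri_hash(data: bytes, algo: str = "sha384") -> str:
    """Return a Subresource Integrity value of the form '<algo>-<base64 digest>'."""
    if algo not in {"sha256", "sha384", "sha512"}:
        raise ValueError("SRI supports only sha256, sha384 and sha512")
    digest = hashlib.new(algo, data).digest()
    return f"{algo}-{base64.b64encode(digest).decode('ascii')}"


def pinned_script_tag(src: str, data: bytes) -> str:
    """Build a <script> tag the browser will execute only if the bytes match."""
    # crossorigin="anonymous" is required for integrity checks on cross-origin scripts.
    return (
        f'<script src="{html.escape(src)}" integrity="{sri_hash(data)}" '
        f'crossorigin="anonymous"></script>'
    )
```

For example, `pinned_script_tag("https://cdn.example/lib.min.js", reviewed_bytes)` yields a tag whose `integrity` attribute starts with `sha384-`; any later change to the hosted file makes the browser block it instead of running it.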

Description-Behavior Mismatch

Medium

Confidence: 94%
Finding: The cleaning routine can transmit table contents to an external LLM service to infer header rows, which creates a real data-exfiltration and privacy-boundary risk. In a module described as table cleaning/preprocessing, users may reasonably expect local-only transformations, so sending even the first 10 rows off-box can expose sensitive business or personal data without clear consent.

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The code can send table schema-derived content to an external LLM service via `llm_service.chat_async`, which creates a real data exfiltration path for potentially sensitive table contents. In a table-processing skill, this is risky because users may expect analysis to remain local, while column names, categories, examples, and inferred summaries can still reveal confidential business or personal data.

Context-Inappropriate Capability

Medium

Confidence: 88%
Finding: The LLM-calling capability is not necessary for basic table description generation, yet it introduces outbound data flow from local tables to a remote service. Because this feature sits inside a description helper, it may be invoked implicitly by callers without appreciating that local data is leaving the execution boundary.

Missing User Warnings

Medium

Confidence: 97%
Finding: The code may transmit CSV content to an external LLM service without explicit notice, consent, or data classification checks. If users process confidential business, personal, or regulated data, this can cause unauthorized disclosure and compliance issues even though the feature appears to be simple preprocessing.

Missing User Warnings

Medium

Confidence: 84%
Finding: The skill writes files to disk, auto-creates directories, and derives output names from the input basename without documenting overwrite/collision risks. In agent or automated environments, this can unexpectedly overwrite existing CSVs in the target directory or place files in sensitive locations chosen by the caller, causing data loss or unintended file modification.
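A sketch of collision-safe output handling, under the assumption that the skill writes text artifacts named after the input file; `write_new_file` and its `allowed_root` parameter are hypothetical, and `Path.is_relative_to` needs Python 3.9 or later. Opening with mode `"x"` makes creation fail when a name is taken instead of silently overwriting, and the root check keeps a caller-chosen directory from landing outside an approved workspace.

```python
from pathlib import Path
from typing import Optional


def write_new_file(out_dir: str, stem: str, suffix: str, data: str,
                   allowed_root: Optional[str] = None) -> Path:
    """Hypothetical helper: write data to a fresh file, never overwriting one."""
    base = Path(out_dir).resolve()
    if allowed_root is not None and not base.is_relative_to(Path(allowed_root).resolve()):
        raise PermissionError(f"{base} is outside the allowed output root")
    base.mkdir(parents=True, exist_ok=True)
    stem = Path(stem).name  # drop any directory parts smuggled into the name
    for n in range(1000):
        name = f"{stem}{suffix}" if n == 0 else f"{stem}_{n}{suffix}"
        try:
            # Mode "x" creates the file atomically and raises if it already exists.
            with (base / name).open("x", encoding="utf-8") as fh:
                fh.write(data)
            return base / name
        except FileExistsError:
            continue
    raise FileExistsError(f"no free name for {stem}{suffix} in {base}")
```

Because the existence check and the creation happen in one `open` call, two concurrent runs cannot both claim the same name.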

Missing User Warnings

Medium

Confidence: 97%
Finding: The documentation indicates that table content may be sent to an LLM for complex header handling, but it does not warn users about data egress, privacy implications, or external processing. In a table-cleaning context, inputs often contain business, personal, or regulated data, so silent transmission to a remote model materially raises confidentiality risk.

Missing User Warnings

Medium

Confidence: 96%
Finding: The documented default sends table schema/content-derived context to an LLM without an explicit warning that user data may leave the local environment. In a data-processing skill, omission of that warning undermines informed consent and increases the chance of accidental disclosure of regulated or confidential information.

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill writes detailed row and column samples from the input table into a JSON file, but the documentation does not warn that potentially sensitive source data will be persisted to disk. This can create local data leakage through logs, backups, shared directories, or accidental artifact distribution.
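If the JSON artifact exists to describe structure rather than content, raw samples can be made opt-in. A minimal sketch, not the skill's actual output format: the hypothetical `profile_columns` records only per-column value kinds and empty-cell counts unless `include_samples=True` is passed.

```python
import json
from typing import List


def _kind(value: str) -> str:
    """Classify a raw cell as 'empty', 'number' or 'text'."""
    if not value.strip():
        return "empty"
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def profile_columns(header: List[str], rows: List[List[str]],
                    include_samples: bool = False, max_samples: int = 3) -> str:
    """Hypothetical redaction-first profile: structure by default, raw values on request."""
    profile = []
    for i, name in enumerate(header):
        # Short rows are padded with empty cells rather than raising IndexError.
        values = [row[i] if i < len(row) else "" for row in rows]
        entry = {
            "column": name,
            "kinds": sorted({_kind(v) for v in values}),
            "empty": sum(1 for v in values if not v.strip()),
            "rows": len(values),
        }
        if include_samples:
            # Raw values are persisted only when the caller explicitly asks.
            entry["samples"] = [v for v in values if v.strip()][:max_samples]
        profile.append(entry)
    return json.dumps(profile, ensure_ascii=False, indent=2)
```

Column names can themselves be sensitive, so a stricter variant might replace them with positional indexes as well.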

Missing User Warnings

Medium

Confidence: 95%
Finding: The markdown describes constructing prompts from table rows and sending them to an LLM/OpenAI service without any explicit warning about data transmission, privacy, or retention implications. In a table-processing skill, this context makes the issue more serious because spreadsheet headers and top rows often contain personal, financial, or operationally sensitive information.

Missing User Warnings

Medium

Confidence: 80%
Finding: The function sends system and user prompts to an external LLM provider, but this file contains no consent, disclosure, or data-classification guardrails before transmission. In a table-processing skill, prompts may contain sensitive table contents, so silent external transmission can expose proprietary or personal data to a third party.

Missing User Warnings

Medium

Confidence: 90%
Finding: The code sends table schema information to an external LLM without any visible warning, privacy notice, or docstring disclosure that local data may be transmitted off-host. Lack of transparency increases the chance of accidental exposure of sensitive metadata and sample values, especially when this utility is reused by higher-level automation.

Ssd 3

Medium

Confidence: 97%
Finding: The documented behavior collects file metadata, raw samples, tail rows, per-column examples, and may also transmit schema-derived content to an LLM. For CSVs containing personal, financial, or proprietary data, this creates multiple disclosure channels: on-disk artifacts, returned JSON strings, and outbound model requests.

VirusTotal

64/64 vendors reported this skill as clean.
