Security audit

Multi Source Data Cleaner

Security checks across malware telemetry and agentic risk

Overview

This is a real data-cleaning skill, but its docs understate several ways sensitive data and tokens can be sent to external services.

Install only if you are comfortable managing external data flows yourself. For sensitive datasets, use local-only operation: do not set ai_model, avoid classify=True with an API key, avoid bitable_output unless you intend to upload cleaned records to Feishu, and disable report generation or Feishu document modules if reports should remain local. Treat DATA_CLEANER_API_KEY and Feishu tokens as sensitive, and do not rely on the README's local-only claim.
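
The local-only guidance above can be expressed as a pre-run check. This is a minimal sketch, not the skill's own code: the names ai_model, classify, bitable_output, and DATA_CLEANER_API_KEY come from this audit, but passing them as a plain dict and the function name find_external_flows are assumptions made for illustration.

```python
from typing import Any, List, Mapping


def find_external_flows(options: Mapping[str, Any], env: Mapping[str, str]) -> List[str]:
    """Return reasons why a run with these settings might not stay local.

    `options` is a hypothetical dict of cleaner settings; `env` is an
    environment mapping such as os.environ.
    """
    reasons = []
    if options.get("ai_model"):
        reasons.append("ai_model is set: sample values may be sent to an external AI provider")
    if options.get("classify") and env.get("DATA_CLEANER_API_KEY"):
        reasons.append("classify=True with an API key: rows may be sent to an external API")
    if options.get("bitable_output"):
        reasons.append("bitable_output is set: cleaned records may be uploaded to Feishu")
    return reasons
```

An empty result only means that none of the switches named in this audit are on; it does not prove that the run makes no network calls.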

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (14)

Tainted flow: 'req' from os.environ.get (line 224, credential/environment) → urllib.request.urlopen (network output)

Critical

Category: Data Flow
Content: }, data=b"{}", ) with urllib.request.urlopen(req, timeout=10) as resp: data = json.loads(resp.read().decode("utf-8")) if data.get("valid", False): result = {"valid": True, "tier": _prefix_to_tier(api_key)}
Confidence: 95%
Finding: with urllib.request.urlopen(req, timeout=10) as resp:
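
To locate the two ends of a flow like this one in a module, a reviewer can list where the source and sink calls occur. The sketch below is a simple co-occurrence check built on Python's ast module; it is not real taint tracking and is not how SkillSpector works. The helper names dotted_name and find_calls are hypothetical.

```python
import ast
from typing import Dict, List, Optional, Set


def dotted_name(node: ast.AST) -> Optional[str]:
    """Turn an attribute chain such as os.environ.get into 'os.environ.get'."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None  # e.g. a call on a subscript or on another call's result


def find_calls(source: str, targets: Set[str]) -> Dict[str, List[int]]:
    """Map each target call name to the line numbers where it is called."""
    hits: Dict[str, List[int]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in targets:
                hits.setdefault(name, []).append(node.lineno)
    return hits
```

Calling find_calls(code, {"os.environ.get", "urllib.request.urlopen"}) shows whether both calls appear in the same file. Confirming that the environment value actually reaches the request still requires reading the code between them, and aliased imports such as from urllib.request import urlopen are not matched.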

Intent-Code Divergence

High

Confidence: 96%
Finding: The README asserts that all data processing stays local and is never uploaded to third-party servers, yet the same document advertises AI features that require MiniMax or DeepSeek API keys. That creates a materially misleading privacy claim: users may submit sensitive business or personal data believing it remains local when AI-assisted field recognition or classification could transmit contents to external model providers.

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The module description presents the tool as a local data cleaning utility, but the implementation also supports exporting cleaned data and reports to Feishu services. This mismatch can mislead users and integrators into processing sensitive datasets under the assumption that no external transmission occurs, increasing the risk of unintended data disclosure.

Context-Inappropriate Capability

Medium

Confidence: 93%
Finding: This code can upload cleaned data to Feishu Bitable, which is an external publication/export channel not essential to basic cleansing. If the input contains personal, regulated, or confidential information, enabling this path can exfiltrate data to a third-party platform, especially since there is no explicit confirmation at the point of use.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: The report generation path creates a Feishu document from report content, causing derived data and metadata about the dataset to be uploaded externally. Even if the report is summarized, it may still reveal schema, data quality issues, counts, or sensitive field names, which can expose confidential information outside the local environment.

Missing User Warnings

High

Confidence: 94%
Finding: The documentation describes AI capabilities and third-party API key configuration without adequately warning that using those features may transmit data to external model services. In a data-cleaning skill likely handling customer, CRM, finance, or operations datasets, this omission can lead to unintentional disclosure of sensitive or regulated information to third parties.

Missing User Warnings

High

Confidence: 93%
Finding: The skill explicitly supports sending user data to Feishu and to external AI providers via API key, but the documentation does not clearly warn that uploaded or tabular data may be transmitted to third-party services. In a data-cleaning context, inputs commonly contain PII such as phone numbers, emails, addresses, IDs, and financial records, so the lack of explicit disclosure and consent creates a real data exfiltration and privacy risk.

Missing User Warnings

Medium

Confidence: 97%
Finding: When AI tagging is enabled, the code sends up to 20 rows of dataframe contents to an external API, which may include sensitive personal, financial, or business data. There is no explicit consent, redaction, classification gate, or user-facing disclosure, so confidential data can leave the local trust boundary unexpectedly.
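
One mitigation for this finding is to redact obvious identifiers before any rows leave the machine. The sketch below is illustrative only and assumes rows are available as a list of dicts; redact_value and prepare_sample are hypothetical names, and the 20-row cap mirrors the limit described in the finding.

```python
import re
from typing import Any, Dict, List

# Deliberately simple patterns: email-like strings and runs of 7+ digits
# (phone numbers, ID numbers, account numbers).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
LONG_DIGITS = re.compile(r"\d{7,}")


def redact_value(value: Any) -> Any:
    """Mask emails and long digit runs; return the value unchanged otherwise."""
    text = value if isinstance(value, str) else str(value)
    redacted = LONG_DIGITS.sub("[NUMBER]", EMAIL.sub("[EMAIL]", text))
    return value if redacted == text else redacted


def prepare_sample(rows: List[Dict[str, Any]], max_rows: int = 20) -> List[Dict[str, Any]]:
    """Take at most max_rows rows and redact every cell before sending."""
    return [{key: redact_value(val) for key, val in row.items()} for row in rows[:max_rows]]
```

Pattern-based masking like this misses names, addresses, and free-text fields, so it reduces exposure rather than removing it.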

Missing User Warnings

Medium

Confidence: 95%
Finding: When ai_model is enabled, the code packages sample values from uncertain dataframe columns and sends them to third-party AI endpoints. Because supported fields include highly sensitive data such as names, phone numbers, ID cards, bank accounts, addresses, and emails, this can leak regulated or confidential data off-system without explicit consent, minimization, redaction, or clear disclosure.
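
For field recognition, data minimization can go further than redaction: an external model often needs only the format of a value, not the value itself. The following sketch, under that assumption, replaces each character with its class before anything is sent. value_shape and column_shapes are hypothetical helpers, not part of the skill.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def value_shape(value: str) -> str:
    """Replace digits with '9' and letters with 'A'; keep punctuation as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def column_shapes(values: Iterable[str], limit: int = 5) -> List[Tuple[str, int]]:
    """Return the most common shapes in a column with their counts."""
    return Counter(value_shape(v) for v in values).most_common(limit)
```

Shapes still reveal value lengths and formats, so this lowers the sensitivity of what is transmitted without making it harmless.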

Missing User Warnings

Medium

Confidence: 90%
Finding: The Feishu Bitable export call transmits data to an external service without an explicit warning or consent flow at the call site. In security-sensitive contexts, silent network egress of user data is dangerous because operators may not realize that supplying Feishu identifiers results in remote upload.
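
A consent step at the call site could look like the sketch below. It assumes the upload is available as a callable and that rows are a list of dicts; export_with_consent, upload, and confirm are hypothetical names, and no real Feishu API call is shown.

```python
from typing import Any, Callable, Dict, List


def export_with_consent(
    rows: List[Dict[str, Any]],
    upload: Callable[[List[Dict[str, Any]]], None],
    confirm: Callable[[str], str] = input,
    destination: str = "Feishu Bitable",
) -> bool:
    """Describe what will leave the machine and upload only after a 'yes'."""
    columns = sorted({key for row in rows for key in row})
    prompt = (
        f"Upload {len(rows)} rows with columns [{', '.join(columns)}] "
        f"to {destination}? [y/N] "
    )
    if confirm(prompt).strip().lower() not in {"y", "yes"}:
        return False  # default is to keep the data local
    upload(rows)
    return True
```

Passing confirm as a parameter keeps the gate testable and lets a non-interactive deployment substitute a policy check for the prompt. The same gate could wrap the Feishu document creation path.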

Missing User Warnings

Medium

Confidence: 90%
Finding: The Feishu document creation path uploads report markdown without an explicit warning at the point where the transmission occurs. Because reports may contain sensitive operational or data-quality information, this can create unintentional disclosure through a convenience feature that appears unrelated to external sharing.

Missing User Warnings

Medium

Confidence: 88%
Finding: The Bitable export path sends DataFrame contents to Feishu, an external service, but this module provides no explicit consent, warning, or data-classification checks before transmitting potentially sensitive cleaned data. In a data-cleaning/export skill, users may reasonably assume local export behavior unless remote upload is clearly disclosed, so this can cause unintended exfiltration of PII or business data.

Missing User Warnings

Medium

Confidence: 91%
Finding: The Feishu document export transmits the supplied report markdown to Feishu without any user-facing disclosure in this code, and reports may include sensitive summaries, samples, or identifiers derived from the dataset. Because this is an outbound transfer to a third party, the absence of transparency and consent controls creates a real privacy and data-governance risk even if the feature is intentionally designed.

Missing User Warnings

Medium

Confidence: 91%
Finding: The module silently sends the API key to a remote verification service without any visible disclosure, warning, or consent mechanism in this code path. In an agent or skill context, this is more dangerous because users or deployers may assume environment-provided secrets remain local, while the skill exports them over the network during routine tier resolution.
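
One way to address this is to make remote key verification opt-in. The sketch below is an assumption-laden illustration: the environment variable DATA_CLEANER_ALLOW_REMOTE_VERIFY, the function resolve_tier, and the "unverified" result are all hypothetical, and verify_remote stands in for whatever network call the skill actually makes.

```python
from typing import Any, Callable, Dict, Mapping, Optional

TRUTHY = {"1", "true", "yes"}


def resolve_tier(
    api_key: Optional[str],
    verify_remote: Callable[[str], Dict[str, Any]],
    env: Mapping[str, str],
) -> Dict[str, Any]:
    """Send the key over the network only when the operator has opted in."""
    if not api_key:
        return {"valid": False, "checked": "none"}
    opted_in = env.get("DATA_CLEANER_ALLOW_REMOTE_VERIFY", "").lower() in TRUTHY
    if not opted_in:
        # The key stays local; the caller must treat the tier as unknown.
        return {"valid": None, "tier": "unverified", "checked": "skipped"}
    return verify_remote(api_key)
```

With this shape, routine tier resolution makes no network call unless the deployer has set the flag, which makes the transmission of the secret a visible decision.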

VirusTotal

VirusTotal findings are pending for this skill version.

Static analysis

No suspicious patterns detected.