Multi-Source Cleaner

Security checks across malware telemetry and agentic risk

Overview

The skill appears to be a real data-cleaning tool, but it under-discloses external transfers of dataset samples, reports, and credential-like values while claiming data stays local.

Review carefully before installing. Use it only with data you are comfortable sending to the configured AI providers and to Feishu. Do not place third-party AI secrets in DATA_CLEANER_API_KEY until license and AI credentials are separated, and prefer local-only operation unless the publisher adds explicit disclosure, consent, redaction, and endpoint controls.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (19)

Tainted flow: 'req' from os.environ.get (line 367, credential/environment) → urllib.request.urlopen (network output)

Critical

Category: Data Flow
Content:
    },
    method="POST",
)
with urllib.request.urlopen(req, timeout=20) as resp:
    raw = json.loads(resp.read().decode("utf-8"))
    content = raw["choices"][0]["message"]["content"]
Confidence: 97%
Finding: with urllib.request.urlopen(req, timeout=20) as resp:

Tainted flow: 'req' from os.environ.get (line 440, credential/environment) → urllib.request.urlopen (network output)

Critical

Category: Data Flow
Content:
    },
    method="POST",
)
with urllib.request.urlopen(req, timeout=15) as resp:
    raw = json.loads(resp.read().decode("utf-8"))
    content = raw["choices"][0]["message"]["content"]
Confidence: 97%
Finding: with urllib.request.urlopen(req, timeout=15) as resp:
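
Both flows end in an unconditional urllib.request.urlopen call that carries an environment-derived credential. One possible endpoint control, sketched here under the assumption that the request URL comes from configuration or the environment, is to require HTTPS and an allowlisted host before any request is built. The helper check_endpoint and the host api.example-ai.com are hypothetical placeholders, not part of the skill.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from reviewed configuration.
ALLOWED_AI_HOSTS = frozenset({"api.example-ai.com"})


def check_endpoint(url: str, allowed_hosts: frozenset = ALLOWED_AI_HOSTS) -> str:
    """Return url unchanged if it uses HTTPS and an allowlisted host; otherwise raise."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"refusing non-HTTPS endpoint: {url!r}")
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    if host not in allowed_hosts:
        raise ValueError(f"endpoint host {host!r} is not allowlisted")
    return url
```

A caller would pass the configured URL through check_endpoint(url) before constructing urllib.request.Request, so a tampered environment variable fails loudly instead of redirecting the API key and dataset samples to an arbitrary host.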

Intent-Code Divergence

High

Confidence: 97%
Finding: The README states that all data processing stays local and is never uploaded to third-party servers, yet elsewhere it requires external AI API keys for AI-powered features. This creates a misleading privacy guarantee: users may enable AI classification/field identification without realizing their data could be sent to external model providers, risking unintended disclosure of sensitive business or personal data.
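
A minimal sketch of an explicit opt-in gate, assuming a hypothetical environment variable DATA_CLEANER_ALLOW_REMOTE_AI that the reviewed skill does not define. AI features stay off unless the operator sets it, which would make a local-only claim true by default rather than only when AI keys happen to be absent.

```python
import os
from typing import Mapping, Optional

# Hypothetical opt-in variable; the reviewed skill does not define it.
CONSENT_VAR = "DATA_CLEANER_ALLOW_REMOTE_AI"


def require_remote_consent(provider: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Raise PermissionError unless the operator has explicitly opted in to remote AI."""
    env = os.environ if env is None else env
    if env.get(CONSENT_VAR, "").strip().lower() not in {"1", "true", "yes"}:
        raise PermissionError(
            f"AI features send dataset samples to {provider}. "
            f"Set {CONSENT_VAR}=1 to allow this, or keep AI features disabled."
        )
```

Each code path that builds a remote AI request would call require_remote_consent with the provider name first, so the error message itself doubles as the missing disclosure.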

Context-Inappropriate Capability

High

Confidence: 99%
Finding: The AI tagging path serializes the first 20 rows of the dataset into CSV and sends them to a third-party API. In a data-classification skill, those rows may contain PII, financial values, emails, or other sensitive business data, so undisclosed external transmission is a real confidentiality risk.
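
One way to reduce this exposure is to redact sensitive columns before the sample is serialized. The sketch below uses only the standard library and assumes rows arrive as a list of dicts; the keyword list and the names is_sensitive and sample_csv are hypothetical. A keyword match on column names will miss sensitive data stored under neutral headers, so this complements rather than replaces a consent step.

```python
import csv
import io
from typing import Mapping, Sequence

# Hypothetical, deliberately broad keywords: over-redaction is the safer failure.
SENSITIVE_KEYWORDS = ("phone", "mobile", "email", "id_card", "bank", "account", "address", "name")


def is_sensitive(column: str) -> bool:
    """Flag a column as sensitive if its name contains any keyword."""
    lowered = column.lower()
    return any(keyword in lowered for keyword in SENSITIVE_KEYWORDS)


def sample_csv(rows: Sequence[Mapping[str, object]], columns: Sequence[str], limit: int = 20) -> str:
    """Serialize up to `limit` rows as CSV, replacing sensitive columns with a placeholder."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for row in rows[:limit]:
        writer.writerow(["[REDACTED]" if is_sensitive(c) else row.get(c, "") for c in columns])
    return buf.getvalue()
```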

Intent-Code Divergence

Medium

Confidence: 89%
Finding: The documentation frames AI as an aid for ambiguous fields, but the implementation sends raw sample values from uncertain columns to external APIs. Because the module is specifically designed to infer sensitive personal and financial field types, this mismatch can cause operators to unknowingly expose regulated or confidential data.
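
For field identification, a shape-preserving mask may keep enough signal for the model while withholding the actual characters. This is a sketch of one such technique (character-class masking), not the skill's method; shape_mask and mask_samples are hypothetical helpers.

```python
from typing import Iterable, List


def shape_mask(value: object) -> str:
    """Replace digits with '9' and letters with 'a', keeping length and punctuation."""
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("a")  # also covers CJK characters, which are alphabetic in Unicode
        else:
            out.append(ch)
    return "".join(out)


def mask_samples(values: Iterable[object]) -> List[str]:
    """Mask every sample value before it is placed in a field-identification request."""
    return [shape_mask(v) for v in values]
```

The trade-off is deliberate: a phone number, a date, and an email address remain distinguishable by length and separators, but none of the original values can be read back from the mask.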

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The function is documented as a local data-cleaning pipeline, but its behavior also includes optional uploads to Feishu Bitable and Feishu Docs. This mismatch is security-relevant because callers may trust the docstring and pass sensitive datasets without realizing the function can transmit data to external services when related flags are enabled.

Intent-Code Divergence

Medium

Confidence: 98%
Finding: The code comments and docstring imply that failed verification should downgrade the user to FREE, but get_user_tier() falls back to the DATA_CLEANER_TIER environment variable after verification failure. Anyone who can influence the process environment can set DATA_CLEANER_TIER=pro and bypass license enforcement, enabling restricted features without a valid token.
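
A fail-closed alternative, sketched with a hypothetical resolve_tier helper and an injected verify callback standing in for the skill's token check (its real signature is not shown in this report). Every failure path returns FREE, and DATA_CLEANER_TIER is never read.

```python
from enum import Enum
from typing import Callable, Optional


class Tier(Enum):
    FREE = "free"
    PRO = "pro"


def resolve_tier(token: Optional[str], verify: Callable[[str], Optional[Tier]]) -> Tier:
    """Derive the tier from a verified token only; every failure path yields FREE.

    DATA_CLEANER_TIER is intentionally never read, so the environment cannot
    raise the tier after verification fails.
    """
    if not token:
        return Tier.FREE
    try:
        tier = verify(token)
    except Exception:
        # Network errors, expired or forged tokens, etc. all fail closed.
        return Tier.FREE
    return tier if isinstance(tier, Tier) else Tier.FREE
```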

Missing User Warnings

High

Confidence: 96%
Finding: The README instructs users to configure third-party AI API keys for advanced features but does not adequately warn that enabling those features may send user data to external model services. In a data-cleansing workflow, uploaded content may contain customer records, financial data, or contact details, so lack of clear disclosure materially increases privacy and compliance risk.

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill explicitly supports writing cleaned data to Feishu Bitable and Feishu cloud documents, but the documentation does not clearly warn that user data will be transmitted to external services. Because this skill is designed to process potentially sensitive business and personal data such as phone numbers, addresses, IDs, and banking/order records, undisclosed outbound transfer creates a real privacy and compliance risk.

Missing User Warnings

Medium

Confidence: 96%
Finding: The documented AI features include automatic field recognition, semantic completion, and AI tagging, but there is no clear disclosure that records may be sent to third-party AI providers such as MiniMax or DeepSeek. In this context, the data being cleaned may contain PII and financial/customer data, so silent transmission to external AI services materially increases confidentiality, privacy, and regulatory exposure.

Missing User Warnings

High

Confidence: 98%
Finding: The implementation sends sample dataset rows to an external API without any in-code warning, confirmation step, or policy check. That makes accidental data leakage likely, especially because the surrounding functionality suggests local dataframe processing rather than remote processing.

Missing User Warnings

Medium

Confidence: 95%
Finding: The AI fallback sends sample cell values and column names to external APIs without any explicit warning, consent flow, or disclosure. In a field-identification utility, those samples are likely to contain personally identifiable information and financial identifiers, making silent transmission materially risky.

Missing User Warnings

Medium

Confidence: 87%
Finding: This code generates a report and then attempts to create a Feishu document from that report, which can send potentially sensitive dataset-derived content to an external service. In a data-cleaning skill, users may reasonably expect local processing, so silent or poorly disclosed outbound transfer increases data leakage risk.

Missing User Warnings

Medium

Confidence: 89%
Finding: AI classification is performed on cleaned dataset contents when enabled, and the selected model may involve transmitting data to an external provider. Without a clear warning or consent step, sensitive records may be exposed to third-party AI services in ways users do not expect from a data-cleaning workflow.

Missing User Warnings

Medium

Confidence: 88%
Finding: The code exports DataFrame contents to Feishu Bitable, which transmits potentially sensitive records to an external cloud service, but there is no explicit consent gate, warning, classification check, or redaction step before upload. In a data-cleaning/export skill, this is dangerous because the DataFrame may contain PII such as phone numbers, ID numbers, bank accounts, and addresses, so users could unintentionally exfiltrate regulated or confidential data off the local system.
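
A pre-export guard could scan record values for common PII patterns and refuse to upload unless the caller explicitly allows it. The patterns below (email, mainland China mobile number, 18-character resident ID) are illustrative assumptions, not complete PII detection, and find_pii and export_guard are hypothetical names.

```python
import re
from typing import Dict, Iterable, List, Mapping, Set

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "cn_mobile": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    "cn_id_card": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def find_pii(records: Iterable[Mapping[str, object]]) -> Dict[str, List[str]]:
    """Map each PII kind to the sorted names of columns where it was detected."""
    hits: Dict[str, Set[str]] = {}
    for record in records:
        for column, value in record.items():
            text = str(value)
            for kind, pattern in PII_PATTERNS.items():
                if pattern.search(text):
                    hits.setdefault(kind, set()).add(column)
    return {kind: sorted(columns) for kind, columns in hits.items()}


def export_guard(records: Iterable[Mapping[str, object]], allow_pii: bool = False) -> None:
    """Raise before an external export if PII is detected and not explicitly allowed."""
    hits = find_pii(records)
    if hits and not allow_pii:
        raise PermissionError(f"export blocked; possible PII in columns: {hits}")
```

Reporting column names rather than matched values keeps the guard's own error message from leaking the data it is meant to protect.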

Missing User Warnings

Medium

Confidence: 90%
Finding: The function creates Feishu cloud documents from report content and returns a public-facing document URL, but it does not warn that report data is being uploaded to a third-party service. Quality reports often summarize source data and may include sensitive findings, identifiers, or excerpts, so silent remote document creation can cause unintended disclosure.

Missing User Warnings

Medium

Confidence: 95%
Finding: The reporter includes raw sample field values in generated Markdown and dict outputs, which can expose personal, confidential, or otherwise sensitive data when reports are shared, logged, or returned via APIs. In a data-quality tool, this is especially risky because the sampled values are taken directly from arbitrary dataset columns without sensitivity filtering, masking, or user consent.
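
Reports can keep sample values useful to a reviewer while limiting exposure by masking all but the last few characters. A minimal sketch with hypothetical helper names; the default of two visible characters and three samples per field are arbitrary choices.

```python
from typing import List, Sequence


def mask_for_report(value: object, keep: int = 2) -> str:
    """Show at most `keep` trailing characters; short values are masked entirely."""
    text = str(value)
    if len(text) <= keep:
        return "*" * len(text)
    return "*" * (len(text) - keep) + text[-keep:]


def report_samples(values: Sequence[object], limit: int = 3) -> List[str]:
    """Return up to `limit` masked samples for a Markdown or dict report."""
    return [mask_for_report(v) for v in values[:limit]]
```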

Missing User Warnings

Medium

Confidence: 84%
Finding: The module transmits the API key from the environment to an external validation service, but there is no visible user-facing disclosure or consent mechanism in this file. In agent/skill contexts, environment variables often contain sensitive credentials, so sending them off-box can violate expectations and create a credential-handling risk even if the endpoint is legitimate.
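
Separating the license key from AI provider credentials is meant to ensure that only the license key ever reaches the validation service. The variable names DATA_CLEANER_LICENSE_KEY and DATA_CLEANER_AI_API_KEY below are hypothetical; the overview suggests DATA_CLEANER_API_KEY currently mixes both roles.

```python
import os
from typing import Mapping, Optional, Tuple

# Hypothetical names; per the overview, DATA_CLEANER_API_KEY appears to mix both roles today.
LICENSE_VAR = "DATA_CLEANER_LICENSE_KEY"
AI_KEY_VAR = "DATA_CLEANER_AI_API_KEY"


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[Optional[str], Optional[str]]:
    """Return (license_key, ai_key) read from separate variables.

    Only license_key should ever be sent to the license validation service;
    reusing one secret for both roles is rejected outright.
    """
    env = os.environ if env is None else env
    license_key = env.get(LICENSE_VAR) or None
    ai_key = env.get(AI_KEY_VAR) or None
    if license_key is not None and license_key == ai_key:
        raise ValueError("license key and AI provider key must be different secrets")
    return license_key, ai_key
```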

Ssd 3

Medium

Confidence: 98%
Finding: The prompt includes raw sample rows from the user dataset, which are then transmitted to the external model provider. Prompt contents often end up in provider logs, telemetry, or retention systems, so embedding raw records directly creates a clear data-exposure channel.
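
Where the model only needs column structure, the prompt can describe columns instead of embedding rows. This sketch computes a coarse kind and fill rate locally and sends only those; build_schema_prompt and infer_kind are hypothetical, and the model necessarily loses whatever signal the raw values carried.

```python
import json
from typing import Mapping, Sequence


def infer_kind(values: Sequence[object]) -> str:
    """Very coarse type label computed locally, so raw values never enter the prompt."""
    non_empty = [v for v in values if v not in (None, "")]
    if not non_empty:
        return "empty"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_empty):
        return "numeric"
    return "text"


def build_schema_prompt(columns: Mapping[str, Sequence[object]]) -> str:
    """Describe columns by name, coarse kind, and fill rate instead of embedding rows."""
    schema = [
        {
            "column": name,
            "kind": infer_kind(values),
            "non_empty": sum(v not in (None, "") for v in values),
            "total": len(values),
        }
        for name, values in columns.items()
    ]
    return "Classify these columns:\n" + json.dumps(schema, ensure_ascii=False)
```

Column names can still be sensitive in some datasets, so this narrows the exposure channel rather than closing it.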

VirusTotal

No VirusTotal findings
