Security audit

data-synthesis

Security checks across malware telemetry and agentic risk

Overview

This skill performs the stated CSV-to-QA data synthesis workflow, with live LLM calls only when the user explicitly enables API mode.

Use dry-run first. Before setting DATA_SYNTHESIS_USE_API=1, confirm the CSV is approved for the configured LLM endpoint, especially if it contains personal, regulated, confidential, or proprietary text. Store the JSONL output securely because it may include original text chunks and non-empty source columns.
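The dry-run-first advice above can be illustrated with a minimal sketch. It assumes that only the exact value "1" for DATA_SYNTHESIS_USE_API enables live calls; the function names api_mode_enabled and run, and the call_llm parameter, are hypothetical and are not taken from the skill's code.

```python
import os


def api_mode_enabled(env=None):
    """Return True only when DATA_SYNTHESIS_USE_API is exactly "1" (assumed rule)."""
    env = os.environ if env is None else env
    return env.get("DATA_SYNTHESIS_USE_API", "") == "1"


def run(chunks, call_llm, env=None):
    """Hypothetical driver: stays in dry-run unless API mode is explicitly enabled."""
    if not api_mode_enabled(env):
        # Dry-run: describe what would be sent, but transmit nothing.
        return [
            {"chunk_index": i, "chars": len(chunk), "sent": False}
            for i, chunk in enumerate(chunks)
        ]
    # Live mode: each chunk leaves the local environment via call_llm.
    return [
        {"chunk_index": i, "response": call_llm(chunk), "sent": True}
        for i, chunk in enumerate(chunks)
    ]
```

Defaulting to dry-run means an unset or mistyped variable fails closed: no CSV content is transmitted unless the user opts in deliberately.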

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep

Findings (3)

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill instructs users to send corpus chunks to an API-compatible LLM service but does not warn that source content may leave the local environment and be processed by another service. In a data-synthesis workflow, that content could contain proprietary, regulated, or sensitive text, making silent transmission a meaningful privacy and compliance risk.

Missing User Warnings

Low

Confidence: 87%
Finding: The skill writes generated QA pairs to JSONL and states that output includes source-derived content, but it does not clearly warn users that the resulting file may reproduce or expose portions of the original corpus. This can cause accidental persistence, sharing, or downstream ingestion of sensitive material extracted from the input data.
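One way to reduce the accidental-exposure risk described in this finding is to create the JSONL output with owner-only permissions. The following is a minimal sketch under stated assumptions, not the skill's actual writer: the function name write_jsonl_private is hypothetical, and the permission bits only take effect on POSIX systems.

```python
import json
import os


def write_jsonl_private(path, records):
    """Write records as JSONL to a file readable and writable by the owner only."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # Mode 0o600 applies when the file is created (subject to the umask).
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        # If the file already existed, tighten its mode too (POSIX only).
        if hasattr(os, "fchmod"):
            os.fchmod(fh.fileno(), 0o600)
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Restrictive permissions limit who can read the file locally; they do not stop it from being copied, shared, or ingested downstream, so the output still needs the same handling as the source CSV.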

Ssd 3

Medium

Confidence: 93%
Finding: The pipeline sends raw CSV chunk contents to an LLM and also writes chunk text plus non-text source fields into the output JSONL. If the CSV contains personal, proprietary, regulated, or secret data, enabling live API mode will transmit that data to an external service and persist it in derived artifacts, creating a real data-exposure path.

VirusTotal

65/65 vendors reported this skill as clean.


Static analysis

No suspicious patterns detected.