
Security audit

data-synthesis

Security checks across malware telemetry and agentic risk

Overview

This skill performs its stated CSV-to-QA data-synthesis workflow and makes live LLM calls only when the user explicitly enables API mode.

Run in dry-run mode first. Before setting DATA_SYNTHESIS_USE_API=1, confirm that the CSV is approved for the configured LLM endpoint, especially if it contains personal, regulated, confidential, or proprietary text. Store the JSONL output securely, because it may include original text chunks and non-empty source columns.
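A minimal sketch of the opt-in gate described above, assuming the pipeline reads DATA_SYNTHESIS_USE_API from the environment and treats only the exact value "1" as enabling live calls. The helper name `live_api_enabled` and the warning wording are hypothetical; the skill's real gating code is not shown in this audit.

```python
import os
import sys
from typing import Mapping, Optional

# Hypothetical helper; the actual skill may gate API mode differently.
API_FLAG = "DATA_SYNTHESIS_USE_API"


def live_api_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the flag is set to exactly "1".

    Any other value (unset, "0", "true", empty) keeps the pipeline in dry-run mode.
    """
    env = os.environ if environ is None else environ
    enabled = env.get(API_FLAG) == "1"
    if enabled:
        # Warn before any CSV chunk leaves the local machine.
        print(
            f"{API_FLAG}=1: CSV chunks will be sent to the configured LLM endpoint. "
            "Confirm the data is approved for that service.",
            file=sys.stderr,
        )
    else:
        print("Dry-run mode: no data will be sent to an LLM.", file=sys.stderr)
    return enabled
```

Defaulting to dry-run and requiring an exact value keeps an accidental or mistyped setting from silently enabling transmission.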

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
  • Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Findings (3)

Missing User Warnings

Severity: Medium
Confidence: 94%
Finding
The skill instructs users to send corpus chunks to an API-compatible LLM service but does not warn that source content may leave the local environment and be processed by another service. In a data-synthesis workflow, that content could contain proprietary, regulated, or sensitive text, making silent transmission a meaningful privacy and compliance risk.

Missing User Warnings

Severity: Low
Confidence: 87%
Finding
The skill writes generated QA pairs to JSONL and states that output includes source-derived content, but it does not clearly warn users that the resulting file may reproduce or expose portions of the original corpus. This can cause accidental persistence, sharing, or downstream ingestion of sensitive material extracted from the input data.
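One way to reduce the accidental-exposure risk in this finding is to create the JSONL file with owner-only permissions. The sketch below assumes a POSIX system and a hypothetical helper `write_jsonl_private`; it illustrates the mitigation and is not the skill's own writer.

```python
import json
import os
from typing import Any, Iterable, Mapping


def write_jsonl_private(path: str, records: Iterable[Mapping[str, Any]]) -> int:
    """Write records as JSON Lines to a new file readable only by its owner.

    Hypothetical helper. The 0o600 mode is honored on POSIX systems (the umask
    can only remove bits); Windows largely ignores it.
    """
    # O_EXCL refuses to reuse an existing file that may have looser permissions.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    count = 0
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

Restricting permissions does not remove sensitive text from the file; it only limits who on the same machine can read it.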

Ssd 3

Severity: Medium
Confidence: 93%
Finding
The pipeline sends raw CSV chunk contents to an LLM and also writes chunk text plus non-text source fields into the output JSONL. If the CSV contains personal, proprietary, regulated, or secret data, enabling live API mode will transmit that data to an external service and persist it in derived artifacts, creating a real data-exposure path.
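To narrow the persistence half of this exposure path, the output record can copy only allowlisted source columns and omit the raw chunk unless the user asks for it. In the sketch below, the field names (`question`, `answer`, `chunk`, `source`), the allowlist contents, and `build_output_record` are all hypothetical; the skill's actual JSONL schema is not given in this audit.

```python
from typing import Any, Dict, FrozenSet, Mapping

# Hypothetical allowlist: pick the columns that are safe to persist for your data.
SAFE_SOURCE_COLUMNS: FrozenSet[str] = frozenset({"doc_id", "category"})


def build_output_record(
    question: str,
    answer: str,
    chunk: str,
    row: Mapping[str, Any],
    include_chunk: bool = False,
    allowed: FrozenSet[str] = SAFE_SOURCE_COLUMNS,
) -> Dict[str, Any]:
    """Build one QA record, keeping only allowlisted, non-empty source fields."""
    record: Dict[str, Any] = {"question": question, "answer": answer}
    if include_chunk:
        # The raw chunk is only persisted when explicitly requested.
        record["chunk"] = chunk
    record["source"] = {
        key: value
        for key, value in row.items()
        if key in allowed and value not in (None, "")
    }
    return record
```

An allowlist fails closed: a new column added to the CSV later is dropped by default instead of being copied into every derived artifact.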

VirusTotal

65/65 vendors reported this skill as clean.


Static analysis

No suspicious patterns detected.