Bug Data Generator

Security checks across malware telemetry and agentic risk

Overview

This skill is a local training-data helper whose documentation is ambiguous in places, but its file writes and JSONL generation are disclosed and purpose-aligned, and nothing indicates hidden behavior or data exfiltration.

Install only if you intend to generate smart-home bug-fix training data. Review generated instruction lists before producing JSONL, verify any companion data-generator skill is trusted, choose output paths deliberately, and use synthetic or anonymized device and scene names unless real household data is explicitly approved for training use.

SkillSpector

By NVIDIA

Vulnerability Patterns

Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (6)

Tp4

High

Category: MCP Tool Poisoning
Confidence: 95%
Finding: The skill metadata says it only produces an intermediate artifact for user review, but the body also instructs the agent to invoke data-generator and describes producing full JSONL training records. That mismatch is dangerous because downstream systems or users may grant it broader trust than intended, leading to unreviewed generation, unexpected data handling, and possible persistence of sensitive context.

Intent-Code Divergence

Medium

Confidence: 90%
Finding: The documentation first frames the skill as an intermediate planning step, then later directs automatic invocation of data-generator. This inconsistency can cause agents to bypass user review and execute the next stage automatically, increasing the chance of unintended processing of user instructions and contextual data.

Description-Behavior Mismatch

High

Confidence: 97%
Finding: The skill metadata says this component should generate tool names and candidate user instructions for user review before any full training data generation occurs, but the code directly synthesizes complete JSONL conversation records and writes them to disk. This bypasses the promised approval gate and can produce unreviewed or policy-violating training data, which is especially risky because the output encodes tool calls and assistant behavior that may later be used for model training.
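As an illustration of the approval gate this finding says is missing, here is a minimal sketch, not the skill's actual code. It assumes each candidate is a dict with hypothetical "id" and "instruction" keys and that the user's review yields a set of approved ids; the function names are invented for this example.

```python
import json
from pathlib import Path


def approved_jsonl_lines(candidates, approved_ids):
    """Serialize only the candidates whose id the user approved.

    The "id" key and the approved_ids set are assumptions for this sketch.
    """
    return [
        json.dumps(candidate, ensure_ascii=False)
        for candidate in candidates
        if candidate["id"] in approved_ids
    ]


def write_approved(candidates, approved_ids, output_path):
    """Write approved candidates as JSONL and return how many were written."""
    lines = approved_jsonl_lines(candidates, approved_ids)
    if not lines:
        # Nothing was approved, so no file is created at all.
        return 0
    with Path(output_path).open("w", encoding="utf-8") as handle:
        for line in lines:
            handle.write(line + "\n")
    return len(lines)
```

With a gate of this shape, an empty approval set leaves the disk untouched, so JSONL can only appear after an explicit review step.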

Vague Triggers

Medium

Confidence: 76%
Finding: The trigger scenarios are broad and loosely phrased, which can cause the skill to activate in ambiguous contexts and perform data-generation-related actions when the user did not clearly intend them. In agent systems, overbroad activation increases the risk of accidental tool use and unintended exposure of context to subsequent workflow steps.

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill says it may automatically call data-generator and include user environment context such as device lists, scene lists, timestamps, and instructions, but it does not provide a clear privacy notice or data-impact warning. This is dangerous because sensitive household or environment metadata may be transformed into training artifacts or logs without informed consent or minimization.

Ssd 3

Medium

Confidence: 96%
Finding: The skill explicitly instructs inclusion of full user context and device/scenario data in generated outputs, including local device names, time, scene lists, and device lists. This creates a significant data exposure risk because the generated training samples may contain sensitive environmental metadata that could be stored, reviewed, or reused beyond the user's immediate request.
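One way to reduce this exposure is to replace real device and scene names with placeholders and drop the timestamp before anything is written. The sketch below is illustrative only: the context keys "devices", "scenes", and "timestamp" are assumptions, not the skill's documented schema, and the function names are hypothetical.

```python
def pseudonymize(names, prefix):
    """Map each distinct name to a stable placeholder such as 'device_1'."""
    mapping = {}
    for name in names:
        if name not in mapping:
            mapping[name] = f"{prefix}_{len(mapping) + 1}"
    return mapping


def minimize_context(context):
    """Return a reduced copy of the context with names replaced.

    The timestamp and any other keys are deliberately left out of the result.
    """
    devices = context.get("devices", [])
    scenes = context.get("scenes", [])
    device_map = pseudonymize(devices, "device")
    scene_map = pseudonymize(scenes, "scene")
    return {
        "devices": [device_map[name] for name in devices],
        "scenes": [scene_map[name] for name in scenes],
    }
```

This only covers the structured lists. Free-text instructions that mention a real device name would need the same mapping applied separately.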

VirusTotal

57/57 vendors flagged this skill as clean.
