yolo-vision-tools

Security checks across malware telemetry and agentic risk

Overview

This is a coherent YOLO computer-vision toolkit, with expected local setup, model download, diagnostics, training, and output-file behavior disclosed in the artifacts.

Install this in a virtual environment, expect model/package downloads unless using preloaded local models, and avoid using private images, videos, webcam feeds, or datasets unless you are comfortable with local copies and generated outputs being saved. Treat environment reports as sensitive and redact paths, hardware details, and package information before sharing. Only run cache cleanup or dataset upload options when explicitly intended.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (17)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    def run_command(cmd):
        """Run command and return output"""
        try:
            result = subprocess.run(cmd, shell=True, capture_output=True, text=True, check=True)
            return result.stdout.strip()
        except subprocess.CalledProcessError as e:
            return f"Error: {e.stderr.strip()}"
Confidence: 96% confidence
Finding: result = subprocess.run(cmd, shell=True, capture_output=True, text=True, check=True)
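One common way to reduce the risk flagged here is to pass the command as an argument list so that no shell ever interprets it. The following is a minimal sketch, not the skill's actual code: it keeps the `run_command` name from the snippet above, while the string-splitting behavior, the empty-command guard, and the 30-second timeout are assumptions added for illustration.

```python
import shlex
import subprocess


def run_command(cmd):
    """Run a command without a shell and return its stdout or an error string."""
    # Accept a list of arguments, or a string split with POSIX quoting rules.
    # Note: shlex.split treats backslashes as escapes, so pass a list for
    # Windows-style paths.
    args = shlex.split(cmd) if isinstance(cmd, str) else list(cmd)
    if not args:
        return "Error: empty command"
    try:
        result = subprocess.run(
            args,
            shell=False,          # arguments are never parsed by a shell
            capture_output=True,
            text=True,
            check=True,
            timeout=30,           # assumed limit; the original has none
        )
        return result.stdout.strip()
    except subprocess.CalledProcessError as e:
        return f"Error: {e.stderr.strip()}"
    except (OSError, subprocess.TimeoutExpired) as e:
        # Missing executable, permission problem, or timeout.
        return f"Error: {e}"
```

With this shape, an argument such as `"a; echo b"` reaches the child process as a single literal string instead of being split into two shell commands.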

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    try:
        # Check if Python exists and can run
        cmd = f'"{path}" -c "import sys; print(sys.version.split()[0])"'
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=2)
        if result.returncode == 0:
            version = result.stdout.strip()
Confidence: 91% confidence
Finding: result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=2)

subprocess module call

Medium

Category: Dangerous Code Execution
Content:
    version = result.stdout.strip()
    # Try to import ultralytics in this environment
    ultralytics_check = subprocess.run(
        f'"{path}" -c "import ultralytics; print(ultralytics.__version__)"',
        shell=True, capture_output=True, text=True, timeout=2
    )
Confidence: 91% confidence
Finding: ultralytics_check = subprocess.run( f'"{path}" -c "import ultralytics; print(ultralytics.__version__)"', shell=True, capture_output=True, text=T
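The two interpreter probes flagged above interpolate `path` into a shell string, so a path containing quotes or shell metacharacters could change what gets executed. A hedged sketch of the same checks with an argument list follows. The helper name `probe_interpreter` and its return shape are hypothetical; only the two `-c` snippets and the 2-second timeout come from the flagged code.

```python
import subprocess


def probe_interpreter(path, timeout=2):
    """Return (python_version, ultralytics_version) for the interpreter at path.

    Either element is None when the corresponding check fails or times out.
    """

    def query(code):
        # The interpreter path and the code are separate argv entries,
        # so nothing in `path` is ever parsed by a shell.
        try:
            result = subprocess.run(
                [path, "-c", code],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired):
            return None
        return result.stdout.strip() if result.returncode == 0 else None

    version = query("import sys; print(sys.version.split()[0])")
    if version is None:
        return None, None
    ultralytics_version = query("import ultralytics; print(ultralytics.__version__)")
    return version, ultralytics_version
```

A caller might run `probe_interpreter("/usr/bin/python3")` and treat a `None` in the second position as "ultralytics not importable within the timeout".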

Lp3

Medium

Category: MCP Least Privilege
Confidence: 93% confidence
Finding: The skill advertises and documents capabilities that imply shell execution, network access, and file read/write behavior, but no explicit permissions are declared. This creates a transparency and control gap: an agent or reviewer may assume the skill is lower-risk than it really is, while the documented workflows include package installation, remote model fetching, and filesystem output handling.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 91% confidence
Finding: The stated purpose is image/video vision analysis, but the skill documentation substantially expands into environment discovery, package setup, dataset conversion, training, model export, cache management, and other operational tasks. That scope expansion increases attack surface and can lead to unintended execution paths such as system inspection, broad file manipulation, or network activity that users did not reasonably expect from a vision-analysis skill.

Description-Behavior Mismatch

Medium

Confidence: 84% confidence
Finding: Training custom YOLO models is materially different from simple vision inference: it can consume large datasets, perform extensive file operations, and trigger prolonged compute usage. When a skill framed as analysis also includes training guidance and helpers, users may unintentionally authorize broader data handling and resource-intensive actions than expected.

Description-Behavior Mismatch

Medium

Confidence: 94% confidence
Finding: This module includes functionality to enumerate and delete files from the user's Ultralytics cache directory, which exceeds the stated purpose of computer vision/model utility support. In an agent skill context, this creates filesystem side effects and potential loss of user data or downloaded models if invoked without clear user intent or scope restriction.

Context-Inappropriate Capability

Medium

Confidence: 93% confidence
Finding: The clear_model_cache function performs destructive file deletion that is not required for model selection, loading, or inference. In a skill advertised for YOLO vision tasks, hidden maintenance capabilities increase risk because an agent may invoke them unexpectedly and delete cached assets that users rely on.

Vague Triggers

Medium

Confidence: 80% confidence
Finding: Broad activation phrases like asking what is in a picture can cause the skill to trigger on ordinary conversational image questions, even when the user did not intend to invoke a tool with file, network, or shell side effects. Over-triggering matters more here because the skill also documents saving outputs, copying inputs, and setup workflows beyond pure interpretation.

Vague Triggers

Medium

Confidence: 82% confidence
Finding: Generic phrases such as 'AI image analysis' or 'computer vision analysis' are too vague and may capture unrelated requests. In a skill with side-effecting capabilities and persistent output behavior, ambiguous activation can lead to unnecessary processing, file retention, or external downloads without clear user intent.

Missing User Warnings

Medium

Confidence: 95% confidence
Finding: The documentation states that input files may be copied and outputs/logs/reports saved automatically, but it does not clearly warn users about retention, storage location, duration, or sensitivity of uploaded media. This creates a real privacy and data-governance risk, especially for images and videos that may contain faces, locations, documents, or other sensitive content.

Missing User Warnings

Low

Confidence: 93% confidence
Finding: The guide recommends commands that fetch a remote image/model from the internet but does not explicitly warn users that these actions cause outbound network access and remote content download. In a documentation context this is not inherently malicious, but it can surprise users in restricted or privacy-sensitive environments and may violate no-network expectations.

Missing User Warnings

Medium

Confidence: 95% confidence
Finding: The environment report guidance prints detailed system metadata including Python version, platform, package versions, CUDA/GPU details, and potentially host characteristics, then encourages sharing it for troubleshooting. While useful for support, this can leak sensitive fingerprinting data or internal environment details if shared externally without a privacy warning.
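As an illustration of the redaction this finding calls for, the sketch below masks a few common identifiers in a report string before it is shared. It is an assumption-laden example rather than part of the skill: the function name `redact_report`, the placeholder tokens, and the GPU UUID pattern are all hypothetical, and it does not cover package versions or other hardware details.

```python
import getpass
import re
import socket
from pathlib import Path


def redact_report(text, home=None, user=None, host=None):
    """Replace home directory, hostname, and username in text with placeholders."""
    home = str(Path.home()) if home is None else home
    user = getpass.getuser() if user is None else user
    host = socket.gethostname() if host is None else host
    # Handle the home path first because it usually contains the username.
    for value, placeholder in ((home, "<HOME>"), (host, "<HOST>"), (user, "<USER>")):
        if value:
            text = text.replace(value, placeholder)
    # Mask identifiers that look like GPU UUIDs (assumed "GPU-<hex and dashes>" form).
    text = re.sub(r"GPU-[0-9a-fA-F-]{8,}", "GPU-<REDACTED>", text)
    return text
```

Plain substring replacement can over-match when a username is very short, so the redacted output is still worth reading before it leaves the machine.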

Missing User Warnings

Low

Confidence: 94% confidence
Finding: The connectivity test section instructs users to contact external endpoints with curl but does not disclose that doing so transmits network metadata such as IP address, headers, and timing information. This is a documentation-level privacy issue rather than code execution risk, but it still matters in controlled or monitored environments.

Missing User Warnings

Medium

Confidence: 92% confidence
Finding: The documentation exposes a parameter (`upload_dataset`) that can transmit datasets to a third-party service without any accompanying privacy, consent, or data-handling warning. In a computer vision skill, datasets commonly contain sensitive images, personally identifiable information, or proprietary data, so omission of a warning increases the chance of unintentional data exfiltration by users.

Missing User Warnings

Low

Confidence: 87% confidence
Finding: The document explicitly advertises an optional `download` field that can trigger automatic retrieval of external dataset content, but it provides no warning that using it will initiate network access and import untrusted data into the workflow. In an agent skill context, undocumented network behavior and external data ingestion can surprise users, weaken trust boundaries, and increase supply-chain or data provenance risk even if the content is educational rather than overtly malicious.
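One way a caller could make the network-triggering options named in these findings explicit is to check for them before starting a run. The sketch below is hypothetical: it assumes the `download` and `upload_dataset` options arrive in a single already-parsed dictionary, which the scanned documentation does not confirm, and the function names and the `allow_network` flag are invented for illustration.

```python
# Option names taken from the findings; treating them as keys of one
# dictionary is an assumption made for this sketch.
NETWORK_FIELDS = ("download", "upload_dataset")


def flag_network_options(config):
    """Return the names of enabled options that would cause network transfer."""
    return [name for name in NETWORK_FIELDS if config.get(name)]


def require_network_consent(config, allow_network=False):
    """Raise PermissionError if network options are set without explicit consent."""
    flagged = flag_network_options(config)
    if flagged and not allow_network:
        raise PermissionError(
            "Network options enabled without consent: " + ", ".join(flagged)
        )
    return flagged
```

Calling `require_network_consent(config)` before training or dataset loading turns a silent download or upload into an explicit decision by the user.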

Missing User Warnings

Medium

Confidence: 96% confidence
Finding: The deletion routine removes .pt files based solely on age without any confirmation prompt, safety interlock, or warning to the caller. That makes accidental invocation capable of silently deleting locally cached models, disrupting workflows and potentially forcing re-downloads or breaking offline use.
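A safety interlock of the kind this finding describes could look like the sketch below, which defaults to a dry run and deletes only when the caller passes `confirm=True`. The `clear_model_cache` name appears in the findings, but the signature, the 30-day default, and the returned dictionary are assumptions made for illustration.

```python
import time
from pathlib import Path


def find_stale_models(cache_dir, max_age_days):
    """List .pt files in cache_dir last modified more than max_age_days ago."""
    cutoff = time.time() - max_age_days * 86400
    return sorted(
        p for p in Path(cache_dir).glob("*.pt")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def clear_model_cache(cache_dir, max_age_days=30, confirm=False):
    """Delete stale .pt files only when confirm=True; otherwise report them."""
    stale = find_stale_models(cache_dir, max_age_days)
    if not confirm:
        # Dry run: nothing is removed, the caller sees what would be.
        return {"deleted": [], "would_delete": stale}
    for path in stale:
        path.unlink()
    return {"deleted": stale, "would_delete": []}
```

An agent could show the `would_delete` list to the user and call the function again with `confirm=True` only after approval.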

VirusTotal

53/53 vendors flagged this skill as clean.
