UGC Manual

Security checks across malware telemetry and agentic risk

Overview

This skill performs the advertised lip-sync video workflow, but users should understand that selected face images and audio are sent to ComfyDeploy.

Install only if you are comfortable sending the chosen image and audio to ComfyDeploy. Use a limited ComfyDeploy API key if possible, avoid highly sensitive or non-consensual face and voice media, install ffmpeg from a trusted source, and prefer local trusted files or trusted URLs because remote audio is downloaded and decoded locally.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (7)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 85%
Finding: The skill documentation exposes capabilities that imply shell execution, network access, and possible environment usage, but it declares no permissions or equivalent user-visible disclosure. That mismatch weakens trust boundaries and can cause an agent or user to invoke the skill without understanding that local commands and remote calls may occur.

Intent-Code Divergence

Medium

Confidence: 88%
Finding: The manifest says the skill is for a user's own audio recording, but later sections broaden use to externally generated TTS and even arbitrary audio. This inconsistency can bypass routing or policy decisions that rely on the manifest, causing the skill to be used on inputs with different privacy, consent, or copyright risk than initially disclosed.

Description-Behavior Mismatch

Medium

Confidence: 84%
Finding: The skill metadata frames inputs as user-provided files, but the code also accepts arbitrary remote image and audio URLs. That mismatch expands the trust boundary and enables server-side fetching of attacker-chosen resources, which can be abused for SSRF-like access, unexpected data retrieval, or processing untrusted remote media without clear user consent.
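One way to narrow this gap is to treat local files and remote URLs as separate input classes and to require an explicit opt-in for the latter. The following is a minimal sketch under stated assumptions, not the skill's actual code: `classify_input`, the `allow_remote` flag, and the `TRUSTED_HOSTS` allowlist are hypothetical names introduced for illustration.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hosts the user has explicitly approved for remote media.
TRUSTED_HOSTS = {"media.example.com"}


def classify_input(value: str, allow_remote: bool = False) -> str:
    """Return "local" or "remote"; raise ValueError for inputs the policy rejects."""
    # Anything without a URL scheme separator is treated as a local path.
    if "://" not in value:
        return "local"

    parsed = urlparse(value)
    if parsed.scheme != "https":
        raise ValueError(f"unsupported scheme: {parsed.scheme!r}")
    if not allow_remote:
        raise ValueError("remote URLs require explicit opt-in")
    if parsed.hostname not in TRUSTED_HOSTS:
        raise ValueError(f"host not on allowlist: {parsed.hostname!r}")
    return "remote"
```

With this shape, a caller that only passes file paths never triggers a network fetch, and a remote URL is accepted only when the user has both opted in and approved the host.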

Context-Inappropriate Capability

Medium

Confidence: 90%
Finding: The script downloads arbitrary remote audio URLs and then processes them locally with ffmpeg, adding a network-fetch capability beyond the stated purpose of using user-provided media. This increases exposure to SSRF, oversized download abuse, and malicious media parsing risks because attacker-controlled content is fetched and handed to a complex decoder.
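The SSRF and oversized-download risks named in this finding can be reduced with two checks before the audio reaches ffmpeg: refuse URLs whose host resolves to a non-public address, and cap the number of bytes read. The sketch below is illustrative only and assumes nothing about the skill's real script; `is_public_url`, `read_capped`, `resolve_host`, and the 25 MiB `MAX_AUDIO_BYTES` limit are hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlparse

MAX_AUDIO_BYTES = 25 * 1024 * 1024  # hypothetical cap, not taken from the skill


def resolve_host(host):
    """Resolve a hostname to a list of IP address strings."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_public_url(url, resolver=resolve_host):
    """True only for https URLs whose host resolves solely to global addresses."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    try:
        addresses = resolver(parsed.hostname)
        # Loopback, private, and link-local addresses are all non-global.
        return bool(addresses) and all(
            ipaddress.ip_address(addr).is_global for addr in addresses
        )
    except (OSError, ValueError):
        return False


def read_capped(stream, max_bytes=MAX_AUDIO_BYTES, chunk_size=64 * 1024):
    """Read a file-like object in chunks, aborting once max_bytes is exceeded."""
    chunks, total = [], 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise ValueError("download exceeds size cap")
        chunks.append(chunk)
    return b"".join(chunks)
```

Note that checking the resolved address and then fetching in a separate step leaves a window for DNS rebinding, so a production version would likely need to connect to the address it validated. Neither check protects against malformed media that exploits the decoder itself.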

Missing User Warnings

Medium

Confidence: 94%
Finding: The skill instructs users to provide images and audio that are sent to a third-party API, but it gives no user-facing warning that sensitive biometric-like face data and voice recordings leave the local environment. In this context, the omission is significant because the inputs are personally identifying media and may contain private or regulated content.

Missing User Warnings

Medium

Confidence: 93%
Finding: The skill uploads user image and audio content to a third-party service, but the provided skill description does not clearly warn about external transmission or data handling implications. Because the inputs may contain biometric voice data and personal imagery, undisclosed transfer to an external processor creates privacy and compliance risk even if the transmission is functionally necessary.
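A user-facing warning of the kind these two findings call for could be as small as a confirmation step before any upload. This is a sketch only: `upload_notice` and `confirm_upload` are hypothetical helpers, and the wording of the notice is an assumption rather than text from the skill.

```python
def upload_notice(files, destination):
    """Build a plain-text warning listing what will leave the local machine."""
    listed = "\n".join(f"  - {name}" for name in files)
    return (
        f"The following files will be uploaded to {destination}:\n"
        f"{listed}\n"
        "Face images and voice recordings can identify a person. "
        "Only continue if you have consent to share them."
    )


def confirm_upload(files, destination, ask=input):
    """Return True only if the user explicitly answers yes; default is no."""
    answer = ask(upload_notice(files, destination) + "\nProceed? [y/N] ")
    return answer.strip().lower() in ("y", "yes")
```

Passing the prompt function as `ask` keeps the check testable and lets an agent route the question to whatever confirmation channel it already uses.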

External Transmission

Medium

Category: Data Exfiltration
Content: ## API Details **Endpoint:** `https://api.comfydeploy.com/api/run/deployment/queue` **Deployment ID:** `075ce7d3-81a6-4e3e-ab0e-7a25edf601b5` ## Required Inputs
Confidence: 80%
Finding: The skill references an external endpoint at https://api.comfydeploy.com/.

VirusTotal

66/66 vendors flagged this skill as clean.
