aim-digital-human-video

Security checks across malware telemetry and agentic risk

Overview

The skill does what it claims, but it handles sensitive face/voice media, public video links, and an API secret in ways users should review before installing.

Review before installing. Use only images, audio, and voice samples you have rights and consent to upload, and treat returned video URLs as public or shareable. Prefer configuring the aim-secret-key through a secure secrets mechanism or out-of-band setup instead of pasting it into chat, use a scoped/revocable key, and delete .env and .task-history.jsonl when no longer needed.
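
The cleanup step recommended above can be scripted. The following is a minimal sketch, assuming both files sit directly in the skill's working directory; the function name and the `workspace` argument are illustrative and not part of the skill.

```python
from pathlib import Path

# File names are taken from the recommendation above; where they live is an assumption.
SENSITIVE_FILES = (".env", ".task-history.jsonl")


def remove_sensitive_files(workspace: Path) -> list[str]:
    """Delete the skill's local secret and history files; return the names removed."""
    removed = []
    for name in SENSITIVE_FILES:
        target = workspace / name
        if target.is_file():
            target.unlink()  # permanent delete, not a move to trash
            removed.append(name)
    return removed
```

Calling `remove_sensitive_files(Path("."))` from the skill directory removes whichever of the two files exist and leaves everything else untouched. Deleting the local copy does not invalidate the key, so revoking it with the provider is still needed.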

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (6)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 84%
Finding: The skill declares no permissions even though its instructions clearly require file read/write, network access, shell execution, and secret handling. This mismatch prevents meaningful user review and consent, and increases the chance the agent will perform sensitive actions without transparent authorization boundaries.

Missing User Warnings

Medium

Confidence: 92%
Finding: The README explicitly states that the output is a publicly accessible TOS video URL and even encourages direct access or sharing, but it does not warn users that generated content may be exposed to anyone with the link. In this skill's context, users are uploading face images and voice/audio to create talking-head videos, so undocumented public exposure creates meaningful privacy and consent risk for sensitive biometric-like media.

Vague Triggers

Medium

Confidence: 70%
Finding: The trigger phrase is broad enough to match ordinary user requests about making a digital-human video, which can cause the skill to activate without the user understanding that local files will be read and media will be sent to an external service. Overbroad triggering is risky here because the workflow includes uploading user content and generating a public result URL.

Missing User Warnings

High

Confidence: 94%
Finding: The skill describes uploading image/audio data to an external AEP service and returning a public TOS URL, but does not present this as a clear user-facing warning or require explicit consent. This creates a substantial privacy risk because sensitive media may be transmitted off-platform and exposed via a publicly accessible link.
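
One way to address this finding is an explicit consent gate in front of any upload. The following is a minimal sketch, not the skill's actual behavior; `confirm_upload` and its `ask` parameter are hypothetical names, and the wording of the notice is an assumption.

```python
from typing import Callable, Iterable


def confirm_upload(paths: Iterable[str], ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly types 'yes'."""
    notice = (
        "The following files will be sent to an external service, and the "
        "resulting video URL may be viewable by anyone who has the link:\n"
        + "\n".join(f"  - {p}" for p in paths)
        + "\nType 'yes' to continue: "
    )
    # Anything other than an explicit 'yes' is treated as a refusal.
    return ask(notice).strip().lower() == "yes"
```

The upload would proceed only when the function returns True. Defaulting to refusal means an empty or ambiguous reply never sends media off the machine.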

Missing User Warnings

High

Confidence: 97%
Finding: The skill instructs the agent to solicit the user's API secret in chat and persist it into a local .env file without a strong safety warning or secure secret-entry mechanism. Asking users to paste secrets into conversation creates exposure in logs, transcripts, and agent memory, and persisting them locally further expands the attack surface.

Ssd 3

High

Confidence: 99%
Finding: The skill explicitly tells the agent to collect a secret from the user via chat and write it into a local .env file. This is dangerous because secrets pasted into chat may be retained in logs or transcripts, and storing them in a workspace file can expose them to other tools, future runs, accidental sharing, or misconfigured access controls.
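
A safer pattern is to read the secret from the process environment, or from a hidden terminal prompt, and never write it to disk. The following is a minimal sketch under stated assumptions: the variable name `AIM_SECRET_KEY` is hypothetical (the report only names the secret as aim-secret-key), and `load_secret` is an illustrative helper, not part of the skill.

```python
import getpass
import os

ENV_VAR = "AIM_SECRET_KEY"  # hypothetical name; the skill's real variable may differ


def load_secret(env=None, prompt=getpass.getpass) -> str:
    """Return the secret from the environment, else from a hidden terminal prompt."""
    env = os.environ if env is None else env
    secret = env.get(ENV_VAR, "").strip()
    if not secret:
        # getpass reads from the terminal without echo, so the value never
        # passes through the chat transcript or agent memory.
        secret = prompt(f"{ENV_VAR}: ").strip()
    if not secret:
        raise ValueError(f"{ENV_VAR} is not set and no value was entered")
    return secret
```

The secret stays in memory for the lifetime of the process only. Combined with a scoped, revocable key, this limits what a leaked transcript or workspace file could expose.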

VirusTotal

66/66 vendors flagged this skill as clean.
