video-subtitle-skill

Security checks across malware telemetry and agentic risk

Overview

This subtitle skill is mostly aligned with its purpose, but it needs review because it can expose the API key and send user media to SenseAudio without a clear consent step.

Review before installing. Do not use this on confidential, regulated, or third-party media unless you are allowed to send the audio to SenseAudio. Replace the API-key check with a redacted presence check, use a dedicated output folder for transcripts and subtitled videos, and require explicit confirmation before any remote upload.
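A minimal sketch of the first two recommendations, assuming the key is read from an environment variable. The variable name `SENSEAUDIO_API_KEY`, the folder name `subtitle_output`, and both helper functions are hypothetical and not taken from the skill's code.

```python
import os
from pathlib import Path


def api_key_status(env_var: str = "SENSEAUDIO_API_KEY") -> str:
    """Report whether the key is set without ever printing its value."""
    # env_var is an assumed name; use whatever variable the skill actually reads.
    value = os.environ.get(env_var, "")
    if not value:
        return f"{env_var}: not set"
    # Reveal presence only, never any characters of the secret.
    return f"{env_var}: set (value hidden)"


def prepare_output_dir(base: Path, name: str = "subtitle_output") -> Path:
    """Create (if needed) and return a dedicated folder for generated files."""
    out_dir = (base / name).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

Writing transcripts and subtitled videos under the folder returned by `prepare_output_dir` keeps generated artifacts out of the source media directory, which makes them easier to review and delete.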

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
Behavioral AST: exec() Call, eval() Call, Dynamic Import
Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration

Findings (10)

subprocess module call

Medium

Category: Dangerous Code Execution
Content: ] print(f" 烧入字幕: {os.path.basename(video_path)} -> {os.path.basename(output_path)}") result = subprocess.run(cmd, capture_output=True, text=True) if result.returncode != 0: raise RuntimeError(f"ffmpeg 字幕烧入失败:\n{result.stderr}") return output_path
Confidence: 81%
Finding: result = subprocess.run(cmd, capture_output=True, text=True)
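The flagged call passes an argument list rather than a shell string, so the main residual risk is what ends up inside `cmd`. The sketch below shows one way to validate paths before invoking ffmpeg. The function names and the character blocklist are hypothetical, the paths are assumed to be POSIX-style, and rejecting special characters is a conservative stand-in for full ffmpeg filter escaping.

```python
import subprocess
from pathlib import Path

# Characters with special meaning in ffmpeg filter arguments. Rejecting them
# outright is a simplification, not complete filter escaping.
_UNSAFE_FILTER_CHARS = set(":,;[]'\\")


def build_burn_command(video: str, srt: str, output: str) -> list[str]:
    """Build an ffmpeg argument list for burning subtitles, without a shell."""
    video_p, srt_p, out_p = (Path(p).resolve() for p in (video, srt, output))
    if not video_p.is_file() or not srt_p.is_file():
        raise FileNotFoundError("video or subtitle file does not exist")
    if _UNSAFE_FILTER_CHARS & set(str(srt_p)):
        raise ValueError(f"unsupported character in subtitle path: {srt_p}")
    # Resolved absolute paths cannot start with '-', so ffmpeg never
    # mistakes them for options.
    return ["ffmpeg", "-y", "-i", str(video_p),
            "-vf", f"subtitles={srt_p}", str(out_p)]


def burn_subtitles(video: str, srt: str, output: str) -> str:
    """Run ffmpeg with the validated argument list and return the output path."""
    cmd = build_burn_command(video, srt, output)
    # A list argument with the default shell=False avoids shell interpretation.
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"ffmpeg subtitle burn-in failed:\n{result.stderr}")
    return output
```

Separating command construction from execution makes the validation testable without ffmpeg installed.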

Tainted flow: 'files' from open (line 105, file read) → requests.post (network output)

High

Category: Data Flow
Content: for retry in range(3): try: resp = requests.post(ASR_API_URL, headers=headers, files=files, data=data_tuples, timeout=120) if resp.status_code == 429: wait = 10 * (retry + 1)
Confidence: 97%
Finding: resp = requests.post(ASR_API_URL, headers=headers, files=files, data=data_tuples, timeout=120)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 90%
Finding: The skill documentation clearly indicates access to environment variables, shell execution, network calls to an external ASR service, and local file writes, but no permissions are declared. This creates a transparency and consent problem: users or hosts may not realize the skill uploads media externally and writes outputs locally, increasing the chance of unintended data exposure.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 87%
Finding: The skill claims subtitle generation features, but its documented behavior includes uploading audio/video-derived content to an external cloud API, producing additional transcript/JSON artifacts, and supporting sentiment analysis. These undeclared behaviors materially change the privacy and data-processing risk, especially because transcripts, speaker data, and sentiment can reveal sensitive personal information beyond simple subtitle generation.

Description-Behavior Mismatch

Medium

Confidence: 79%
Finding: Sentiment analysis is a materially different form of inference from transcription and subtitles, and it can produce sensitive judgments about speakers that users may not expect. When such analysis is undocumented in the manifest, users cannot give informed consent, especially if the media contains private conversations or workplace meetings.

Vague Triggers

Medium

Confidence: 76%
Finding: Broad trigger phrases such as requests to summarize a video can cause the skill to activate unexpectedly on general media tasks. In this skill's context, accidental activation is more dangerous because it may lead to shell execution, local file generation, and upload of audio content to an external ASR provider without the user clearly intending to invoke this specific workflow.

Vague Triggers

Medium

Confidence: 84%
Finding: The usage guide tells users they can invoke the skill with broad, everyday natural-language requests such as asking Claude to add subtitles or summarize a video. In an agent environment, overly generic trigger phrasing can cause unintended activation on unrelated media tasks, increasing the chance of accidental processing of sensitive local files or external uploads.

Vague Triggers

Medium

Confidence: 80%
Finding: The example phrase around this location is framed like ordinary conversation rather than a tightly bounded command, which raises the risk that the assistant routes common requests into this skill without deliberate user intent. Because the skill can process local video/audio files and transcripts, accidental invocation could expose private content or incur external API usage.

Missing User Warnings

Medium

Confidence: 94%
Finding: The guide references obtaining a SenseAudio API key and states Claude will automatically run the workflow, but it does not clearly warn that user media and derived transcripts may be transmitted to a third-party ASR provider. This omission is significant because users may process meetings, interviews, or news videos containing sensitive personal, corporate, or copyrighted content without informed consent.

Missing User Warnings

Medium

Confidence: 98%
Finding: The tool's core workflow sends user-provided audio/video-derived content to an external ASR service, yet the script does not present a clear warning or consent gate before transmission. In a subtitle-generation context this is especially important because users may assume processing is local while highly sensitive spoken content is actually uploaded off-host.
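A consent gate of the kind this finding calls for could look roughly like the following. This is a sketch only: `confirm_upload`, `upload_if_confirmed`, and the injected `upload` callable are hypothetical names, and the actual request to the ASR service is left to the caller.

```python
from typing import Callable


def confirm_upload(path: str, service: str = "SenseAudio",
                   ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly types 'yes'."""
    prompt = (f"'{path}' will be uploaded to {service} for transcription. "
              "Type 'yes' to continue: ")
    return ask(prompt).strip().lower() == "yes"


def upload_if_confirmed(path: str, upload: Callable[[str], object],
                        ask: Callable[[str], str] = input):
    """Call upload(path) only after explicit consent; otherwise raise."""
    if not confirm_upload(path, ask=ask):
        raise PermissionError("Upload cancelled: no explicit confirmation.")
    return upload(path)
```

Anything other than an explicit "yes", including an empty reply, is treated as a refusal, so the default outcome is that nothing leaves the host.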

VirusTotal

65/65 vendors flagged this skill as clean.
