video-subtitle-skill

Security checks across malware telemetry and agentic risk

Overview

This subtitle skill is mostly aligned with its purpose, but it needs review because it can expose the API key and send user media to SenseAudio without a clear consent step.

Review before installing. Do not use this on confidential, regulated, or third-party media unless you are allowed to send the audio to SenseAudio. Replace the API-key check with a redacted presence check, use a dedicated output folder for transcripts and subtitled videos, and require explicit confirmation before any remote upload.
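The first two recommendations can be sketched in Python as below. This is a minimal illustration, not the skill's code. The environment variable name `SENSEAUDIO_API_KEY` and the helper names `api_key_status` and `safe_output_path` are hypothetical. The key check reports only whether a value is set and never prints any part of it. The output helper refuses file names that would resolve outside the chosen folder.

```python
import os
from pathlib import Path
from typing import Mapping

# Hypothetical variable name; the skill's actual name is not shown in this report.
API_KEY_ENV = "SENSEAUDIO_API_KEY"


def api_key_status(env: Mapping[str, str] = os.environ) -> str:
    """Report whether the API key is present without echoing any of its characters."""
    present = bool(env.get(API_KEY_ENV, "").strip())
    return f"{API_KEY_ENV}: {'set' if present else 'not set'}"


def safe_output_path(out_dir: str, filename: str) -> Path:
    """Return out_dir/filename, rejecting names that escape out_dir (e.g. '../x' or '/etc/x')."""
    base = Path(out_dir).resolve()
    target = (base / filename).resolve()
    # The resolved target must sit somewhere below the dedicated output folder.
    if base not in target.parents:
        raise ValueError(f"{filename!r} resolves outside {base}")
    return target
```

The sketch leaves creating the folder to the caller, for example with `Path.mkdir(parents=True, exist_ok=True)`.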

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • Taint Tracking: Direct Taint Flow, Variable-Mediated Taint Flow, Credential Exfiltration Chain
  • MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
Findings (10)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
    ]  # end of the ffmpeg argument list `cmd` (its start is not shown in the excerpt)

    print(f"  Burning subtitles: {os.path.basename(video_path)} -> {os.path.basename(output_path)}")
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"ffmpeg subtitle burn-in failed:\n{result.stderr}")
    return output_path
Confidence
81% confidence
Finding
result = subprocess.run(cmd, capture_output=True, text=True)
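The excerpt already passes `cmd` to `subprocess.run` as a list without `shell=True`, so no shell interprets the file names. A hardened wrapper could also reject a single shell-style string, resolve the ffmpeg binary explicitly, and bound the run time. The sketch below assumes the ffmpeg arguments are built elsewhere. `run_ffmpeg` and its 600-second default timeout are illustrative choices, not part of the skill.

```python
import shutil
import subprocess
from typing import Sequence


def run_ffmpeg(args: Sequence[str], timeout: float = 600.0) -> subprocess.CompletedProcess:
    """Run ffmpeg with an argument list (never a shell string) and a time limit."""
    # Reject a single string or non-string items so nothing is silently converted.
    if isinstance(args, str) or not all(isinstance(a, str) for a in args):
        raise TypeError("args must be a sequence of str, not a shell command string")
    # Resolve the binary explicitly instead of relying on a shell lookup.
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        raise FileNotFoundError("ffmpeg was not found on PATH")
    # List form with the default shell=False; timeout raises TimeoutExpired and kills ffmpeg.
    result = subprocess.run([ffmpeg, *args], capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(f"ffmpeg failed:\n{result.stderr}")
    return result
```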

Tainted flow: 'files' from open (line 105, file read) → requests.post (network output)

High
Category
Data Flow
Content
    for retry in range(3):
        try:
            resp = requests.post(ASR_API_URL, headers=headers, files=files,
                                 data=data_tuples, timeout=120)
            if resp.status_code == 429:
                wait = 10 * (retry + 1)
Confidence
97% confidence
Finding
resp = requests.post(ASR_API_URL, headers=headers, files=files, data=data_tuples, timeout=120)
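The excerpt retries the upload up to three times and computes a linearly growing wait (10 s, 20 s, ...) on HTTP 429, so content read from the file at line 105 can reach the network more than once. The sketch below mirrors that visible loop and adds a consent check before the first request. `upload_with_retry`, the injected `post` and `sleep` callables, and the final error are assumptions for illustration. The part of the original loop after `wait = ...` is not shown.

```python
import time
from typing import Any, Callable


def upload_with_retry(
    post: Callable[[], Any],
    confirmed: bool,
    sleep: Callable[[float], None] = time.sleep,
    attempts: int = 3,
) -> Any:
    """Call post() until it is not rate-limited, but only after explicit user consent."""
    # Gate the taint flow: no request is made unless the user confirmed the upload.
    if not confirmed:
        raise PermissionError("upload to the external ASR service was not confirmed")
    for retry in range(attempts):
        resp = post()
        if getattr(resp, "status_code", None) != 429:
            return resp
        # Linear backoff as in the excerpt: 10 s, then 20 s, ...; no wait after the last try.
        if retry < attempts - 1:
            sleep(10 * (retry + 1))
    raise RuntimeError(f"still rate-limited (HTTP 429) after {attempts} attempts")
```

In the skill, `post` could wrap the excerpt's own call, for example `lambda: requests.post(ASR_API_URL, headers=headers, files=files, data=data_tuples, timeout=120)`, with `confirmed` coming from an explicit user prompt.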

Lp3

Medium
Category
MCP Least Privilege
Confidence
90% confidence
Finding
The skill documentation clearly indicates access to environment variables, shell execution, network calls to an external ASR service, and local file writes, but no permissions are declared. This creates a transparency and consent problem: users or hosts may not realize the skill uploads media externally and writes outputs locally, increasing the chance of unintended data exposure.
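An underdeclared-capability check of this kind is a set difference between what the skill is observed to do and what its manifest declares. The sketch below assumes a manifest with a `permissions` list. That key, the capability labels, and `audit_permissions` are hypothetical, because this report does not show the skill's manifest format. The four observed capabilities are the ones named in the finding.

```python
from typing import Iterable

# Capabilities named in the finding; the label strings are hypothetical.
OBSERVED = frozenset({"env:read", "shell:exec", "network:asr-upload", "fs:write"})


def audit_permissions(manifest: dict, observed: Iterable[str] = OBSERVED) -> dict:
    """Compare observed capabilities with a manifest's declared 'permissions' list."""
    declared = set(manifest.get("permissions", []))
    return {
        # Anything used but not declared is an underdeclared capability.
        "missing": sorted(set(observed) - declared),
        # A '*' entry is reported separately and does not count as declaring anything.
        "wildcard": "*" in declared,
    }
```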

Tp4

High
Category
MCP Tool Poisoning
Confidence
87% confidence
Finding
The skill claims subtitle generation features, but its documented behavior includes uploading audio/video-derived content to an external cloud API, producing additional transcript/JSON artifacts, and supporting sentiment analysis. These undeclared behaviors materially change the privacy and data-processing risk, especially because transcripts, speaker data, and sentiment can reveal sensitive personal information beyond simple subtitle generation.

Description-Behavior Mismatch

Medium
Confidence
79% confidence
Finding
Sentiment analysis is a materially different form of inference from transcription and subtitles, and it can produce sensitive judgments about speakers that users may not expect. When such analysis is undocumented in the manifest, users cannot give informed consent, especially if the media contains private conversations or workplace meetings.
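One way to keep sentiment analysis from running without informed consent is to make it an explicit, default-off option. The sketch below is hypothetical. `AsrOptions`, `to_form_fields`, and the form field name `enable_sentiment` are illustrative and are not taken from the SenseAudio API, whose actual parameters this report does not show.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AsrOptions:
    # Sentiment analysis stays off unless the user explicitly turns it on.
    sentiment: bool = False

    def to_form_fields(self) -> Dict[str, str]:
        """Build extra request fields; the sentiment field is sent only when opted in."""
        fields: Dict[str, str] = {}
        if self.sentiment:
            fields["enable_sentiment"] = "true"  # hypothetical field name
        return fields
```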

Vague Triggers

Medium
Confidence
76% confidence
Finding
Broad trigger phrases such as requests to summarize a video can cause the skill to activate unexpectedly on general media tasks. In this skill's context, accidental activation is more dangerous because it may lead to shell execution, local file generation, and upload of audio content to an external ASR provider without the user clearly intending to invoke this specific workflow.

Vague Triggers

Medium
Confidence
84% confidence
Finding
The usage guide tells users they can invoke the skill with broad, everyday natural-language requests such as asking Claude to add subtitles or summarize a video. In an agent environment, overly generic trigger phrasing can cause unintended activation on unrelated media tasks, increasing the chance of accidental processing of sensitive local files or external uploads.

Vague Triggers

Medium
Confidence
80% confidence
Finding
The example phrase around this location is framed like ordinary conversation rather than a tightly bounded command, which raises the risk that the assistant routes common requests into this skill without deliberate user intent. Because the skill can process local video/audio files and transcripts, accidental invocation could expose private content or incur external API usage.
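A narrower trigger could require both an explicit reference to the skill and a concrete media file before the skill activates, so that generic requests such as summarizing a video do not route here. The invocation forms `/subtitle` and `video-subtitle-skill`, the file extensions, and `should_activate` are assumptions for illustration. Real agent hosts match triggers in their own way.

```python
import re

# Hypothetical explicit invocation forms; the skill's real trigger list is not shown.
EXPLICIT = re.compile(r"(?:^|\s)(?:/subtitle|video-subtitle-skill)\b", re.IGNORECASE)
# A concrete media file name must also be present (extension list is illustrative).
MEDIA = re.compile(r"\S+\.(?:mp4|mkv|mov|mp3|wav|m4a)\b", re.IGNORECASE)


def should_activate(request: str) -> bool:
    """Activate only on an explicit skill reference plus a named media file."""
    return bool(EXPLICIT.search(request)) and bool(MEDIA.search(request))
```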

Missing User Warnings

Medium
Confidence
94% confidence
Finding
The guide references obtaining a SenseAudio API key and states Claude will automatically run the workflow, but it does not clearly warn that user media and derived transcripts may be transmitted to a third-party ASR provider. This omission is significant because users may process meetings, interviews, or news videos containing sensitive personal, corporate, or copyrighted content without informed consent.

Missing User Warnings

Medium
Confidence
98% confidence
Finding
The tool's core workflow sends user-provided audio/video-derived content to an external ASR service, yet the script does not present a clear warning or consent gate before transmission. In a subtitle-generation context this is especially important because users may assume processing is local while highly sensitive spoken content is actually uploaded off-host.
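The missing consent gate could look like the following sketch. It names the destination, warns that spoken content leaves the host, and treats anything other than an explicit `yes` as a refusal, including end-of-input in a non-interactive session. `ask_upload_consent` and its wording are hypothetical, and an agent host may provide its own confirmation mechanism instead.

```python
from typing import Callable


def ask_upload_consent(
    file_name: str,
    provider: str = "SenseAudio",
    input_fn: Callable[[str], str] = input,
) -> bool:
    """Warn about the off-host upload and return True only on an explicit 'yes'."""
    prompt = (
        f"'{file_name}' will be sent to {provider}, a third-party speech recognition "
        "service, and the transcript may contain sensitive spoken content.\n"
        "Type 'yes' to upload, or anything else to cancel: "
    )
    try:
        answer = input_fn(prompt)
    except EOFError:
        # Non-interactive session: treat missing input as a refusal.
        return False
    return answer.strip().lower() == "yes"
```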

VirusTotal

65/65 vendors reported this skill as clean.
