Tomoviee Video Background Music

Security checks across malware telemetry and agentic risk

Overview

This skill is a straightforward Tomoviee video-soundtrack API helper, with normal external media-processing and credential-handling risks but no evidence of hidden, destructive, or deceptive behavior.

Install only if you are comfortable sending video URLs and prompts to Tomoviee/Wondershare for processing. Use public or non-sensitive media where possible, avoid exposing generated Basic auth tokens in logs or shared shells, and treat the broad reference docs as background rather than permission for unrelated APIs.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection

Findings (9)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 90%
Finding: The skill documentation describes use of an external API client, video URLs, and polling workflow, which clearly implies network access, yet no permissions are declared. This creates a transparency and governance gap: users or hosting platforms may not realize the skill sends data to third-party services, including user-provided video URLs and prompts.

Description-Behavior Mismatch

Medium

Confidence: 95%
Finding: The reference file documents multiple capabilities—text-to-music, sound effects, and text-to-speech—despite the skill being declared as a video-scoring skill. This scope expansion increases the chance the agent may invoke broader audio-generation features than users or reviewers expect, weakening least-privilege boundaries and enabling misuse beyond the advertised purpose.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: Documenting text-to-speech in a video-scoring skill introduces an unrelated content-generation pathway that could be used to synthesize spoken content without the user's expectation or explicit consent. In an agentic setting, mismatched capability docs can lead to unauthorized feature use, data exfiltration to external TTS services, or policy bypass through undeclared modalities.

Context-Inappropriate Capability

Medium

Confidence: 92%
Finding: Sound-effect generation is unrelated to the declared video-soundtrack purpose and broadens the operational surface of the skill. This can cause the agent to perform undeclared actions or process user inputs in ways not covered by the skill's stated scope, undermining trust and permission boundaries.

Context-Inappropriate Capability

Low

Confidence: 86%
Finding: General text-to-music generation exceeds the narrow declared purpose of generating soundtracks tailored to video content. While less severe than TTS, it still signals capability drift and may allow the skill to generate standalone music outside the user-approved workflow.

Missing User Warnings

Medium

Confidence: 90%
Finding: The documentation instructs use of callback URLs and publicly accessible video URLs but provides no warning that user media, metadata, and task results are sent to an external service and potentially to third-party callback endpoints. This creates privacy and data-handling risk, especially if users provide sensitive or non-public video content.

Missing User Warnings

Medium

Confidence: 93%
Finding: The video soundtrack feature explicitly analyzes video scenes, pacing, and mood, which means user-provided media content is processed by an external system without any accompanying privacy or data-use notice. For a media-analysis workflow, omission of disclosure is risky because videos may contain personal, confidential, or copyrighted content.

Missing User Warnings

Medium

Confidence: 95%
Finding: The script prints a credential-derived Basic auth token directly to stdout, which can expose secrets through terminal scrollback, shell session recording, CI logs, or copied command output. In this context the token is just base64-encoded app_key:app_secret, so disclosure effectively reveals the underlying credentials to anyone who can access the output.
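To make the point about reversibility concrete, here is a minimal Python sketch. It assumes, as the finding states, that the token is simply the base64 encoding of app_key:app_secret; the body of generate_access_token below is a stand-in written for illustration, not the skill's actual source, and the credential values are placeholders.

```python
import base64


def generate_access_token(app_key: str, app_secret: str) -> str:
    # Assumed behavior (per the finding): base64 of "app_key:app_secret".
    raw = f"{app_key}:{app_secret}".encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def recover_credentials(token: str) -> tuple[str, str]:
    # Base64 is an encoding, not encryption: anyone holding the token
    # can decode it and split on the first colon.
    decoded = base64.b64decode(token).decode("utf-8")
    app_key, _, app_secret = decoded.partition(":")
    return app_key, app_secret


# Placeholder credentials, used only for this demonstration.
token = generate_access_token("demo_key", "demo_secret")
recovered = recover_credentials(token)
```

Under that assumption, a token copied from terminal scrollback or a CI log is equivalent to the key and secret themselves.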

Credential Access

High

Category: Privilege Escalation
Content:
    app_secret = sys.argv[2]
    token = generate_access_token(app_key, app_secret)
    print(f"Access Token: {token}")
    print(f"\nUse in Authorization header as: Basic {token}")
Confidence: 90%
Finding: Access Token
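One possible way to reduce this exposure, sketched in Python under stated assumptions: read the credentials from environment variables instead of command-line arguments, build the Authorization header in memory, and log only a masked form. The variable names TOMOVIEE_APP_KEY and TOMOVIEE_APP_SECRET and the helpers below are hypothetical, chosen for illustration; they are not part of the skill or of any Tomoviee client.

```python
import base64
import os
from typing import Mapping


def load_credentials(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    # Hypothetical variable names; raises KeyError if either is missing.
    return env["TOMOVIEE_APP_KEY"], env["TOMOVIEE_APP_SECRET"]


def build_auth_header(app_key: str, app_secret: str) -> dict[str, str]:
    # Keep the token in memory and hand it straight to the HTTP client.
    raw = f"{app_key}:{app_secret}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def mask(value: str, visible: int = 4) -> str:
    # Show at most the first `visible` characters; hide the rest.
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)


# Demonstration with a fake environment, not real credentials.
fake_env = {"TOMOVIEE_APP_KEY": "k", "TOMOVIEE_APP_SECRET": "s"}
headers = build_auth_header(*load_credentials(fake_env))
safe_to_log = mask(headers["Authorization"].split(" ", 1)[1])
```

Passing secrets through the environment also keeps them out of shell history and process listings, which command-line arguments such as sys.argv[2] do not.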

VirusTotal

64/64 vendors flagged this skill as clean.
