Security audit

toncall-videourl-text

Security checks across malware telemetry and agentic risk

Overview

This skill does what it claims: it turns a user-provided video URL into text using local ffmpeg processing and Volcengine cloud services, with ordinary privacy cautions.

Install only if you are comfortable sending extracted audio from submitted videos to Volcengine services and keeping transcript text on disk. Use a dedicated least-privilege TOS bucket and API keys, keep config.ini private, install ffmpeg from a trusted source, and avoid private, regulated, copyrighted, or very large videos unless those data flows are acceptable.
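As a minimal sketch of what "keep config.ini private" can mean on a POSIX host, the snippet below checks that a file grants no permissions to group or other users and tightens it to owner read/write if needed. The helper names `is_private` and `make_private` are hypothetical and not part of the skill. The path passed in is whatever location your installation uses for config.ini.

```python
import os
import stat


def is_private(path: str) -> bool:
    """Return True if the file has no group/other permission bits set (POSIX)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def make_private(path: str) -> None:
    """Restrict the file to owner read/write only (equivalent to chmod 600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

A reasonable use is to call `is_private` on the config file before running the skill and refuse to proceed, or call `make_private`, when it returns False.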

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Least Privilege: Underdeclared Capability, Wildcard Permission, Missing Permission Declaration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access

Findings (5)

Lp3

Medium

Category: MCP Least Privilege
Confidence: 91%
Finding: The skill documentation describes capabilities that include network access, local file writes, and shell execution via ffmpeg, but it does not declare permissions explicitly. This creates a transparency and governance gap: operators and users cannot accurately assess the trust boundary or execution risk before enabling the skill. In this context, the risk is increased because the skill fetches arbitrary remote URLs, invokes external tooling, and writes outputs locally.

Tp4

High

Category: MCP Tool Poisoning
Confidence: 88%
Finding: The documented behavior does not fully match the observed capabilities: the skill saves transcript files locally, may parse Douyin share pages to resolve actual media URLs, and may auto-process URLs without the user confirmation promised in the description. Behavior mismatches are security-relevant because they undermine informed consent, hide data flows, and can cause the agent to take broader actions than the user expects. The context makes this more dangerous because the skill handles untrusted URLs and uploads derived audio to a third-party cloud service.

Description-Behavior Mismatch

Low

Confidence: 89%
Finding: The script permanently stores transcription text in the local `texts` directory while emphasizing cleanup of temporary local and cloud files. In a skill that handles arbitrary remote video URLs and user content, retaining transcripts can expose sensitive spoken content to later users, operators, or other processes on the host.
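One way an operator might limit this retention risk is to purge old transcripts on a schedule. The sketch below is an illustration only: the function name `purge_old_transcripts`, the `.txt` extension, and the age threshold are assumptions, since the report states only that transcripts are kept in the local `texts` directory.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_old_transcripts(
    texts_dir: str, max_age_days: float, now: Optional[float] = None
) -> List[str]:
    """Delete *.txt files in texts_dir older than max_age_days.

    Returns the sorted names of the files that were removed.
    The .txt extension is an assumption about how transcripts are stored.
    """
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86400  # seconds per day
    removed = []
    for path in Path(texts_dir).glob("*.txt"):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Running this from a periodic job with a short `max_age_days` keeps transcripts available briefly while bounding how long spoken content stays on disk.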

Missing User Warnings

Medium

Confidence: 96%
Finding: The skill lacks a clear user-facing warning that downloaded video content and extracted audio are uploaded to Volcengine TOS and then processed by a third-party ASR service. This is a real privacy and data-handling issue because users may submit sensitive or copyrighted media without understanding that content leaves the local environment. In this skill's context, third-party upload is central to operation, so omission of that warning materially increases the risk of unintended disclosure.

Missing User Warnings

Medium

Confidence: 97%
Finding: The skill uploads extracted audio from user-supplied video content to third-party cloud storage and a speech-recognition API without any explicit consent, warning, or opportunity to opt out. Because audio may contain personal, confidential, or regulated data, silent transfer to external services materially increases privacy and compliance risk in this skill context.
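A consent gate of the kind this finding calls for could look roughly like the sketch below. It is not the skill's code: `confirm_upload` and its `ask` parameter are hypothetical names, and the prompt wording is only an example. The idea is that nothing is uploaded unless the user explicitly answers yes, with "no" as the default.

```python
from typing import Callable


def confirm_upload(video_url: str, ask: Callable[[str], str] = input) -> bool:
    """Warn about the third-party upload and return True only on explicit consent."""
    prompt = (
        f"Audio extracted from {video_url} will be uploaded to Volcengine TOS "
        "and processed by a third-party speech-recognition service. "
        "Continue? [y/N] "
    )
    # Anything other than an explicit "y" or "yes" is treated as a refusal.
    return ask(prompt).strip().lower() in {"y", "yes"}
```

Passing `ask` as a parameter keeps the gate testable and lets an agent route the question through its own confirmation channel instead of standard input.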

VirusTotal

64/64 vendors reported this skill as clean.

Static analysis

No suspicious patterns detected.