Security audit

TTS Router

Security checks across malware telemetry and agentic risk

Overview

This is a coherent local text-to-speech skill, but its voice-cloning workflow can pull voices from arbitrary online media without consent, privacy, or misuse safeguards.

Install only if you trust the external tts-router package and models, keep the server local, and use voice cloning only with explicit permission from the speaker. Avoid cloning voices from public videos, podcasts, or links unless you have rights to that audio, and clearly disclose generated speech as synthetic.
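As a rough illustration of the "keep the server local" advice, the sketch below checks whether a configured bind host is loopback-only. The function name `is_loopback_host` is hypothetical and is not part of tts-router; how the package actually configures its listen address is not described in this audit.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True if `host` is 'localhost' or a loopback IP literal.

    Hypothetical helper for illustration only; not a tts-router API.
    """
    if host.strip().lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host.strip()).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat it as non-local.
        return False


if __name__ == "__main__":
    for candidate in ("127.0.0.1", "::1", "0.0.0.0", "192.168.1.5"):
        print(candidate, is_loopback_host(candidate))
```

A bind address such as `0.0.0.0` fails this check because it listens on every interface, which is the situation the recommendation is meant to avoid.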

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands

Findings (6)

Description-Behavior Mismatch

Medium

Confidence: 92%
Finding: The documentation explicitly encourages fetching reference audio from arbitrary external streaming sites and processing it for voice cloning. In the context of a local TTS tool, this broadens the capability into remote content acquisition and impersonation workflows, which materially increases abuse potential, privacy exposure, and legal/compliance risk.

Context-Inappropriate Capability

Medium

Confidence: 95%
Finding: Bundling yt-dlp gives the skill a large-scale media extraction capability far beyond narrowly scoped local speech synthesis. That makes the router more dangerous because it streamlines downloading third-party audio for cloning, enabling impersonation and unauthorized use of voices at scale.

Vague Triggers

Medium

Confidence: 87%
Finding: The manifest description contains broad trigger phrases such as 'generate speech' and 'text to speech' that could cause the skill to activate for a wide range of ordinary user requests. Because this skill includes higher-risk capabilities like voice cloning and pulling models from external sources, overly broad routing increases the chance it is invoked without sufficient user intent or safety review.

Missing User Warnings

High

Confidence: 96%
Finding: The skill advertises voice cloning and 'speak in someone's voice' with no warning about consent, impersonation, privacy, or legality. In context, this is dangerous because the skill normalizes potentially abusive impersonation workflows and may lead an agent to assist with misuse involving third-party voices and personal audio.

Missing User Warnings

Medium

Confidence: 93%
Finding: The documented upload and URL-fetch endpoints for reference audio omit warnings about privacy, copyright, and data-transfer implications. This is risky because users may provide third-party audio or remote URLs without understanding that personal voice data could be ingested, stored, or processed locally and that remote fetching may touch external systems.
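One way to picture the missing safeguard is a small gate that runs before any reference audio is accepted. The sketch below is an assumption-laden illustration, not the skill's actual API: `ReferenceAudioRequest`, `speaker_consent`, and `review_reference_audio` are hypothetical names, and the real upload and URL-fetch endpoints are not specified in this audit.

```python
from dataclasses import dataclass
from typing import Tuple
from urllib.parse import urlparse


@dataclass
class ReferenceAudioRequest:
    """Hypothetical request shape; not taken from tts-router."""
    source: str                    # local path or remote URL
    speaker_consent: bool = False  # caller attests the speaker gave permission


def review_reference_audio(req: ReferenceAudioRequest) -> Tuple[bool, str]:
    """Return (allowed, message) for a reference-audio request."""
    scheme = urlparse(req.source).scheme.lower()
    is_remote = scheme in {"http", "https"}

    if not req.speaker_consent:
        # Refuse by default: cloning requires explicit permission.
        return False, "refused: explicit speaker consent is required"
    if is_remote:
        # Allowed, but surface the data-transfer implication to the user.
        return True, "allowed: remote fetch will contact an external host"
    return True, "allowed: local file, processed on this machine"


if __name__ == "__main__":
    print(review_reference_audio(ReferenceAudioRequest("https://example.com/a.wav")))
    print(review_reference_audio(ReferenceAudioRequest("voice.wav", speaker_consent=True)))
```

The gate refuses by default and returns a message either way, so an agent has something to show the user about consent and remote fetching before any audio is ingested.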

Missing User Warnings

Medium

Confidence: 98%
Finding: The guide provides end-to-end voice cloning instructions without any warning about consent, privacy, impersonation, or unauthorized use of third-party audio. For a cloning feature, omission of these safeguards materially increases the chance of harmful misuse and normalizes unsafe behavior.

VirusTotal

64/64 vendors reported this skill as clean.
