
Security audit

Tts Router

Security checks across malware telemetry and agentic risk

Overview

This is a coherent local text-to-speech skill, but its voice-cloning workflow can pull voices from arbitrary online media without consent, privacy, or misuse safeguards.

Install only if you trust the external tts-router package and models, keep the server local, and use voice cloning only with explicit permission from the speaker. Avoid cloning voices from public videos, podcasts, or links unless you have rights to that audio, and clearly disclose generated speech as synthetic.
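"Keep the server local" in practice means binding it only to a loopback address. The audit does not say how tts-router configures its listen address, so the sketch below is a hypothetical pre-start check (`is_loopback_host` is an illustrative name, not part of tts-router). It passes only if every address the configured host resolves to is loopback.

```python
import ipaddress
import socket


def is_loopback_host(host: str) -> bool:
    """Return True only if every address `host` resolves to is a loopback address.

    Hypothetical helper: tts-router's real configuration mechanism is not
    described in the audit.
    """
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        # Unresolvable hosts are treated as unsafe rather than assumed local.
        return False
    # sockaddr is the 5th tuple element; its first field is the address string.
    # Strip any IPv6 scope suffix such as "%lo0" before parsing.
    addrs = {info[4][0].split("%")[0] for info in infos}
    return bool(addrs) and all(ipaddress.ip_address(a).is_loopback for a in addrs)
```

For example, `is_loopback_host("127.0.0.1")` passes, while `"0.0.0.0"` (all interfaces) or a LAN address fails. Either of those would expose the voice-cloning endpoints to other machines.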

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (6)

Description-Behavior Mismatch

Severity: Medium
Confidence: 92%
Finding
The documentation explicitly encourages fetching reference audio from arbitrary external streaming sites and processing it for voice cloning. In the context of a local TTS tool, this broadens the capability into remote content acquisition and impersonation workflows, which materially increases abuse potential, privacy exposure, and legal/compliance risk.

Context-Inappropriate Capability

Severity: Medium
Confidence: 95%
Finding
Bundling yt-dlp gives the skill a large-scale media extraction capability far beyond narrowly scoped local speech synthesis. That makes the router more dangerous because it streamlines downloading third-party audio for cloning, enabling impersonation and unauthorized use of voices at scale.

Vague Triggers

Severity: Medium
Confidence: 87%
Finding
The manifest description contains broad trigger phrases such as 'generate speech' and 'text to speech' that could cause the skill to activate for a wide range of ordinary user requests. Because this skill includes higher-risk capabilities like voice cloning and pulling models from external sources, overly broad routing increases the chance it is invoked without sufficient user intent or safety review.
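One way to reduce this risk is to keep broad phrases routed to plain synthesis only, and send requests down the higher-risk cloning path only when they explicitly mention cloning. The sketch below is a hypothetical classifier. The audit does not show the skill's manifest format, and the phrase lists and regex are illustrative assumptions.

```python
import re

# Illustrative phrase lists; not taken from the actual tts-router manifest.
BROAD_PHRASES = ("generate speech", "text to speech")
CLONE_PATTERN = re.compile(
    r"\b(clone|cloning|in (?:his|her|their|someone's) voice)\b", re.IGNORECASE
)


def classify_request(text: str) -> str:
    """Return 'clone', 'tts', or 'none' for a user request (hypothetical router)."""
    if CLONE_PATTERN.search(text):
        # Higher-risk path: the caller should still require explicit confirmation.
        return "clone"
    lowered = text.lower()
    if any(phrase in lowered for phrase in BROAD_PHRASES):
        # Broad phrases only ever reach ordinary synthesis, never cloning.
        return "tts"
    return "none"
```

With this design, a generic request such as "generate speech for this paragraph" cannot by itself unlock voice cloning.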

Missing User Warnings

Severity: High
Confidence: 96%
Finding
The skill advertises voice cloning and 'speak in someone's voice' with no warning about consent, impersonation, privacy, or legality. In context, this is dangerous because the skill normalizes potentially abusive impersonation workflows and may lead an agent to assist with misuse involving third-party voices and personal audio.

Missing User Warnings

Severity: Medium
Confidence: 93%
Finding
The documented endpoints for uploading reference audio and fetching it from a URL omit warnings about privacy, copyright, and data-transfer implications. This is risky because users may supply third-party audio or remote URLs without understanding that personal voice data could be ingested, stored, or processed locally, and that remote fetching contacts external servers.
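A URL-fetch endpoint could validate the link and surface a warning before downloading anything. The sketch below assumes a hypothetical host allowlist and function name (`check_reference_url`, `ALLOWED_HOSTS`). The audit does not describe how tts-router's endpoint actually validates URLs.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the real set of permitted hosts is a deployment decision.
ALLOWED_HOSTS = {"media.example.org"}


def check_reference_url(url: str) -> tuple[bool, str]:
    """Return (allowed, message) for a reference-audio URL before any fetch happens."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False, "only https URLs are accepted"
    host = (parsed.hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        return False, f"host {host!r} is not on the reference-audio allowlist"
    # Allowed, but the user still sees the privacy and rights warning.
    return True, (
        "Fetching this URL contacts an external server; confirm you have the "
        "rights to this audio and the speaker's consent."
    )
```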

Missing User Warnings

Severity: Medium
Confidence: 98%
Finding
The guide provides end-to-end voice cloning instructions without any warning about consent, privacy, impersonation, or unauthorized use of third-party audio. For a cloning feature, omission of these safeguards materially increases the chance of harmful misuse and normalizes unsafe behavior.
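The safeguards recommended in the overview (explicit permission from the speaker and disclosure of synthetic speech) could be enforced as a gate before any cloning call. This is a minimal sketch. `CloneRequest`, `ConsentError`, and `authorize_clone` are hypothetical names, not part of tts-router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneRequest:
    reference_audio: str      # path or URL of the reference clip
    speaker_consent: bool     # user affirms the speaker gave explicit permission
    disclose_synthetic: bool  # output will be labelled as synthetic speech


class ConsentError(Exception):
    """Raised when a cloning request lacks the required safeguards."""


def authorize_clone(req: CloneRequest) -> None:
    """Raise ConsentError unless both safeguards are affirmed; return None otherwise."""
    if not req.speaker_consent:
        raise ConsentError("voice cloning requires explicit permission from the speaker")
    if not req.disclose_synthetic:
        raise ConsentError("cloned speech must be disclosed as synthetic")
```

An agent would call `authorize_clone` before invoking any cloning endpoint and stop if it raises.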

VirusTotal

64/64 vendors reported this skill as clean.
