tts

Security checks across malware telemetry and agentic risk

Overview

This text-to-speech skill appears purpose-aligned, but it may send user text to an online Microsoft Edge TTS service without clear notice or consent and may alter the text before synthesis.

Review this before installing if you may synthesize private, confidential, regulated, or credential-like text. Treat input text as potentially sent to an external online TTS service, and verify that the skill asks for clear intent before conversion and does not silently rewrite the text you want spoken.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
  • Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Findings (6)

Description-Behavior Mismatch

Medium
Confidence: 95%
Finding
The code silently modifies user-provided input by removing words such as 'tts' and 'text-to-speech' before synthesis. This breaks the integrity of user content and can produce output the user did not request. In a TTS skill, fidelity to the supplied text is core behavior, so undocumented content rewriting is a real security and trust issue even if it does not directly enable code execution.
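
To illustrate why this matters, the following is a minimal sketch, in Python for illustration only, of how keyword stripping can change what gets spoken. The skill's actual code is not shown in this report, so `strip_trigger_words` is a hypothetical reconstruction of the reported behavior, and `preserve_text` is an assumed fidelity-preserving alternative rather than the author's fix.

```python
import re

# Hypothetical reconstruction of the reported behavior: the words named in the
# finding ('tts', 'text-to-speech') are removed wherever they occur.
TRIGGER_WORDS = re.compile(r"\b(?:tts|text-to-speech)\b", re.IGNORECASE)


def strip_trigger_words(text: str) -> str:
    """Remove trigger words anywhere in the text, then collapse whitespace."""
    return " ".join(TRIGGER_WORDS.sub("", text).split())


def preserve_text(text: str) -> str:
    """Assumed alternative: trim surrounding whitespace and change nothing else."""
    return text.strip()


sample = "Our TTS vendor charges per character."
print(strip_trigger_words(sample))  # Our vendor charges per character.
print(preserve_text(sample))        # Our TTS vendor charges per character.
```

In the first output the sentence no longer says which vendor is meant, which is the kind of silent change the finding describes.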

Missing User Warnings

Medium
Confidence: 95%
Finding
The documentation explicitly states that an internet connection is required for the TTS service, but it does not warn users that text submitted for synthesis may be transmitted to a third-party service. In a TTS skill, users may paste sensitive prompts, personal data, or confidential content, so omitting this disclosure can lead to unintended data exposure and compliance/privacy issues.

Vague Triggers

Medium
Confidence: 81%
Finding
The skill advertises activation on broad cues such as user requests for audio/voice output and the generic "tts" trigger, which can cause the skill to engage in cases where the user did not explicitly intend speech generation. In an agent environment, ambiguous routing can lead to unintended transmission of sensitive content to an external online TTS service, creating confidentiality and consent risks.

Vague Triggers

Medium
Confidence: 86%
Finding
The trigger detection guidance is vague and relies on generic TTS intent recognition without hard boundaries, increasing the chance of false activation. Because this skill uses Microsoft's online TTS service, accidental invocation could expose private conversation text or generated summaries to a third party when the user only meant to discuss TTS conceptually rather than synthesize speech.
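
One way to give triggers hard boundaries is to activate only on an explicit command form at the start of a message. The sketch below assumes this approach; the command phrases and the `extract_speech_request` helper are hypothetical, since the skill's real trigger phrases are not listed in this report.

```python
import re
from typing import Optional

# Hypothetical explicit command forms. A message must START with one of them;
# merely mentioning "tts" somewhere in the message does not activate anything.
EXPLICIT_TRIGGER = re.compile(
    r"^\s*(?:/tts\b"
    r"|read (?:this|the following) aloud\b"
    r"|convert (?:this|the following) to speech\b)"
    r"[\s:]*(?P<text>.*)$",
    re.IGNORECASE | re.DOTALL,
)


def extract_speech_request(message: str) -> Optional[str]:
    """Return the text to synthesize, or None if the message is not an explicit request."""
    match = EXPLICIT_TRIGGER.match(message)
    if match is None:
        return None
    text = match.group("text").strip()
    return text or None  # an explicit command with no text is not a request
```

With this rule, a question such as "How does TTS work in browsers?" returns `None`, while "/tts Hello world" returns "Hello world".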

Missing User Warnings

Medium
Confidence: 91%
Finding
The guide explicitly states the library uses Microsoft Edge's online TTS service and shows sending arbitrary user text to that service, but it does not warn that prompts or sensitive content will leave the local environment. In an agent skill context, users may provide private, regulated, or proprietary text for speech generation, so omission of a data-transmission/privacy warning can lead to unintended disclosure to a third-party service.

Missing User Warnings

Medium
Confidence: 89%
Finding
The skill sends arbitrary text to Microsoft Edge's online TTS service via the node-edge-tts library without any explicit notice, consent flow, or privacy warning. Users may provide secrets, personal data, or regulated content expecting local processing, so silent transmission to a third-party network service creates a real confidentiality and compliance risk.
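
A notice-and-consent step of the kind this finding asks for could look roughly like the sketch below. It is written in Python for illustration and does not call node-edge-tts. `synthesize` and `confirm` are hypothetical callbacks standing in for the real network call and for whatever confirmation mechanism the host agent provides.

```python
from typing import Callable


class ConsentRequired(Exception):
    """Raised when the user declines to send text to the external service."""


def synthesize_with_consent(
    text: str,
    synthesize: Callable[[str], bytes],  # hypothetical: performs the online TTS call
    confirm: Callable[[str], bool],      # hypothetical: shows a notice, returns the user's answer
) -> bytes:
    """Ask before any text leaves the local environment, then synthesize it unchanged."""
    notice = (
        f"This will send {len(text)} characters to an online "
        "text-to-speech service. Continue?"
    )
    if not confirm(notice):
        # Nothing has been transmitted at this point.
        raise ConsentRequired("User declined external transmission")
    return synthesize(text)
```

The network call is reachable only after `confirm` returns true, so a declined request transmits nothing.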

VirusTotal

65/65 vendors flagged this skill as clean.
