Text to Voice Local

Security checks across malware telemetry and agentic risk

Overview

This is a coherent text-to-MP3 skill, but it presents itself as local while using a TTS backend that may send the user’s text to an external service without clear notice.

Install only if you are comfortable using node-edge-tts and accept that text sent for voice generation may leave the local workspace. Avoid processing secrets, private messages, customer data, or proprietary documents unless the skill is updated to clearly disclose the provider and data flow, or switches to a verifiably offline TTS engine.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Excessive Agency: Unrestricted Tool Access, Autonomous Decision Making, Scope Creep
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (7)

Description-Behavior Mismatch

Severity: Medium · Confidence: 94%
The skill is advertised as providing local text-to-voice generation, but the implementation uses EdgeTTS, which commonly sends text to a remote service for synthesis. This creates a security and privacy mismatch: users may provide sensitive text under the assumption it never leaves the local machine, when in fact it may be transmitted externally.

Context-Inappropriate Capability

Severity: Medium · Confidence: 90%
A network-dependent TTS operation is performed despite the skill's stated local-only purpose, which can violate user expectations, privacy requirements, or offline-environment assumptions. In environments where sensitive workspace content is processed, this can lead to unintended external disclosure of user-provided text.

Vague Triggers

Severity: Medium · Confidence: 79%
The 'use when' description is broad enough to match many ordinary text-to-speech or workflow requests, which can cause the agent to select this skill outside its narrow intended context. Over-broad routing increases the chance of unintended file operations, dependency installation guidance, or overwriting canonical outputs when a simpler or safer response would suffice.

Missing User Warnings

Severity: Medium · Confidence: 95%
The skill defines a single canonical output file and state pointer that are refreshed on use, but does not prominently warn that generating new audio overwrites the previous canonical artifact. This can lead to data loss, confusion, or accidental replacement of a file that another workflow expects to remain stable.
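One way to address this finding is to move the existing canonical file aside and warn the user before new audio is written. The report does not show the skill's actual output filename, state pointer, or script language, so this is a minimal Python sketch under those assumptions: `archive_previous` and the `output.mp3` name are hypothetical, and the timestamped backup naming is one possible convention rather than anything the skill defines.

```python
import shutil
import sys
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_previous(canonical: Path) -> Optional[Path]:
    """Hypothetical helper: move an existing canonical output aside before it is overwritten.

    Returns the backup path, or None if there was no previous file to preserve.
    """
    if not canonical.exists():
        return None
    # Timestamp with microseconds so repeated runs do not collide on the backup name.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    backup = canonical.with_name(f"{canonical.stem}.{stamp}{canonical.suffix}")
    shutil.move(str(canonical), str(backup))
    # Warn explicitly instead of replacing the file silently.
    print(f"warning: previous {canonical.name} moved to {backup.name}", file=sys.stderr)
    return backup


if __name__ == "__main__":
    # Demo in a throwaway directory; "output.mp3" is an assumed canonical name.
    with tempfile.TemporaryDirectory() as d:
        out = Path(d) / "output.mp3"
        out.write_bytes(b"old audio")
        saved = archive_previous(out)
        out.write_bytes(b"new audio")  # stand-in for the real TTS write
        print(saved.read_bytes(), out.read_bytes())
```

A workflow that depends on a stable file can then keep pointing at the backup, while the canonical path always holds the newest audio.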

Missing User Warnings

Severity: Medium · Confidence: 96%
The code invokes TTS generation without any user-facing warning that the input text may be transmitted to an external service. This lack of disclosure is dangerous because users may unknowingly submit secrets, private messages, or workspace data to a third party while believing the workflow is local.

Missing User Warnings

Severity: Medium · Confidence: 96%
The script reads the full contents of the input file and passes that text to a Node-based TTS component, which in common edge-tts workflows relays content to a remote provider. Because there is no user-facing disclosure, consent check, or local-only enforcement, sensitive workspace text could be exfiltrated outside the local environment unexpectedly.

Missing User Warnings

Severity: Medium · Confidence: 94%
The script sends the full text content to a network-based TTS backend via edge_tts.js, but it provides no user-facing disclosure, no consent step, and no indication that text which appears to be processed locally may leave the machine. In a workspace skill advertised as "local text-to-voice," this is especially misleading: users may reasonably assume sensitive text is processed entirely on-device, which creates confidentiality and compliance risks.
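The missing disclosure, consent check, and local-only enforcement described in these findings could be added as a gate that runs before any text reaches the TTS backend. This is a minimal Python sketch under stated assumptions: the function names and the `TTS_LOCAL_ONLY` environment variable are hypothetical, and the hand-off to edge_tts.js is deliberately not shown because its interface is not documented in this report.

```python
import os
import sys
from pathlib import Path
from typing import Callable

# Disclosure text shown before any remote synthesis (wording is illustrative).
DISCLOSURE = (
    "Text-to-speech uses node-edge-tts, which may send the text below "
    "to an external service for synthesis."
)


def confirm_remote_tts(text: str, ask: Callable[[str], str] = input) -> bool:
    """Show the disclosure and a short preview, then require an explicit 'y'/'yes'."""
    preview = text[:80] + ("..." if len(text) > 80 else "")
    print(DISCLOSURE, file=sys.stderr)
    print(f"Preview: {preview!r} ({len(text)} characters)", file=sys.stderr)
    answer = ask("Send this text for synthesis? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


def remote_tts_allowed(text: str, ask: Callable[[str], str] = input) -> bool:
    """Refuse outright when the hypothetical TTS_LOCAL_ONLY=1 is set; otherwise ask."""
    if os.environ.get("TTS_LOCAL_ONLY") == "1":
        print("TTS_LOCAL_ONLY=1: refusing to send text to a remote TTS backend", file=sys.stderr)
        return False
    return confirm_remote_tts(text, ask)


def read_for_tts(path: Path, ask: Callable[[str], str] = input) -> str:
    """Read the input file and return its text only if remote synthesis is approved."""
    text = path.read_text(encoding="utf-8")
    if not remote_tts_allowed(text, ask):
        raise PermissionError("remote TTS not approved; nothing was sent")
    # The caller would pass the returned text to the existing TTS step.
    return text
```

Defaulting to "no" on an empty answer means a non-interactive run sends nothing unless the user explicitly opts in.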

VirusTotal

66/66 vendors reported this skill as clean.