Edge TTS Voice System

Security checks across malware telemetry and agentic risk

Overview

This is a legitimate voice skill, but its privacy and offline claims conflict with its use of hosted Edge TTS, and it includes unsafe local execution patterns, notably a shell command built from an audio filename.

Review carefully before installing. Do not treat this as an offline or fully private voice workflow unless the publisher fixes the documentation or replaces hosted Edge TTS with a local TTS engine. Avoid untrusted audio filenames, check what text may be sent to the TTS provider, and prefer pinned dependencies, safer subprocess handling, explicit install prompts, and cache cleanup controls.

SkillSpector

By NVIDIA
Vulnerability Patterns
  • Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
  • Trigger Abuse: Overly Broad Trigger, Shadow Command Trigger, Keyword Baiting Trigger
  • Behavioral AST: exec() Call, eval() Call, Dynamic Import
  • MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
  • Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Findings (12)

subprocess module call

Medium
Category
Dangerous Code Execution
Content
if audio_file.endswith('.ogg'):
    wav_file = tempfile.mktemp(suffix=".wav")
    cmd = f"ffmpeg -i '{audio_file}' -ar 16000 -ac 1 '{wav_file}' -y 2>/dev/null"
    subprocess.run(cmd, shell=True, check=True)
    audio_file = wav_file

# Transcribe with faster-whisper
Confidence
98% confidence
Finding
subprocess.run(cmd, shell=True, check=True)
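
Remediation sketch
The flagged call interpolates the audio filename into a shell string, so a filename containing a single quote can break out of the quoting and run arbitrary commands. A minimal sketch of a safer conversion step follows, assuming the same ffmpeg options as the flagged snippet. The function names build_ffmpeg_cmd and convert_to_wav are hypothetical and do not come from the skill.

```python
import os
import subprocess
import tempfile
from typing import List


def build_ffmpeg_cmd(audio_file: str, wav_file: str) -> List[str]:
    """Return the ffmpeg argv as a list, so no shell ever parses the filenames."""
    # abspath yields a path that cannot start with "-", so ffmpeg will not
    # mistake a hostile filename for a command-line option.
    return [
        "ffmpeg", "-y",
        "-i", os.path.abspath(audio_file),
        "-ar", "16000", "-ac", "1",
        os.path.abspath(wav_file),
    ]


def convert_to_wav(audio_file: str) -> str:
    """Convert audio_file to 16 kHz mono WAV and return the new path."""
    # mkstemp creates the file atomically, unlike the race-prone tempfile.mktemp.
    fd, wav_file = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        subprocess.run(
            build_ffmpeg_cmd(audio_file, wav_file),
            check=True,  # raise CalledProcessError if ffmpeg fails
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,  # replaces the shell's 2>/dev/null
        )
    except Exception:
        os.unlink(wav_file)  # do not leave a stray temp file behind
        raise
    return wav_file
```

Because the command is passed as a list and shell=True is dropped, quotes, semicolons, and spaces in the filename reach ffmpeg as literal characters in a single argument.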

Tp4

High
Category
MCP Tool Poisoning
Confidence
96% confidence
Finding
The skill advertises a fully local, private, offline voice workflow while the documented behavior indicates outbound TTS uses Edge TTS, which implies network use and contradicts the privacy claims. This can mislead users into sending potentially sensitive voice-derived content to an external service under false assumptions about locality and confidentiality.

Description-Behavior Mismatch

Medium
Confidence
95% confidence
Finding
The README makes a strong privacy/offline claim while the documented installation flow requires network access to fetch packages and model files. This can mislead users into trusting the skill under stricter privacy or air-gapped assumptions than are actually true, increasing supply-chain and privacy risk during setup.

Intent-Code Divergence

Medium
Confidence
96% confidence
Finding
The claim 'Fully offline - No internet required' is contradicted by later instructions to use pip and wget against remote sources. In a security-sensitive voice skill, this is dangerous because users may deploy it expecting no network dependency or external retrieval, when installation actually introduces remote-fetch and trust-boundary exposure.

Description-Behavior Mismatch

High
Confidence
97% confidence
Finding
Claiming the system works entirely offline and requires no internet is materially deceptive when the outbound TTS path is Edge TTS rather than a local synthesis engine. Users may rely on this claim in sensitive environments and unknowingly expose prompts or generated replies over the network.

Intent-Code Divergence

High
Confidence
96% confidence
Finding
The documentation states that all processing happens locally and is privacy-focused, but it also identifies Edge TTS as the outbound reply mechanism, creating a direct contradiction. Such privacy misrepresentation is dangerous because users may process confidential conversations believing no third party is involved.

Intent-Code Divergence

High
Confidence
96% confidence
Finding
The feature list promises fully offline operation with no internet required, yet later sections document Edge TTS as the default outbound voice path. This mismatch can cause unsafe deployment decisions in air-gapped, regulated, or privacy-sensitive environments where network transmission is prohibited.

Missing User Warnings

Medium
Confidence
88% confidence
Finding
The manual installation steps perform system-modifying actions like sudo package installation, Python package installation, external downloads, file copying, and chmod, but the README does not explicitly warn users about these effects. This omission reduces informed consent and can cause users to run privileged or networked commands without understanding the security implications.

Vague Triggers

Medium
Confidence
81% confidence
Finding
The automatic processing description is broad and lacks clear trigger constraints, which can lead to overly permissive handling of incoming voice messages and unintended autonomous behavior. In a messaging or agent environment, ambiguous auto-activation increases the chance of processing untrusted content without adequate user intent, consent, or policy checks.

Missing User Warnings

Medium
Confidence
89% confidence
Finding
The skill describes silent transcription and automatic voice replies without prominently warning users about the privacy and behavioral consequences. This is risky because voice messages may contain sensitive information, and users may not realize audio is being transcribed and potentially sent through a non-local TTS path automatically.

Missing User Warnings

Medium
Confidence
93% confidence
Finding
The installer automatically runs apt-get update/install when pip3 or ffmpeg are missing, causing privileged package installation and system modification without an explicit consent prompt. In an agent-skill context, installers may be invoked with elevated privileges or by users who expect local setup only, so silent system changes increase the risk of unintended package installation, environment drift, and broader host impact.
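
Remediation sketch
One way to address the missing consent prompt is to gate every privileged install behind an explicit yes/no question that defaults to "no". The sketch below is illustrative only: confirm_install is a hypothetical helper, and the skill's actual installer may be written in another language.

```python
from typing import Callable, Sequence


def confirm_install(
    packages: Sequence[str],
    ask: Callable[[str], str] = input,
) -> bool:
    """Ask before a privileged install; only an explicit 'y' or 'yes' proceeds."""
    prompt = (
        "This step would run a privileged system package install for: "
        + ", ".join(packages)
        + ". Proceed? [y/N] "
    )
    answer = ask(prompt).strip().lower()
    # An empty answer or anything unrecognized counts as a refusal.
    return answer in ("y", "yes")
```

An installer could call confirm_install(["ffmpeg", "python3-pip"]) and exit without changes when it returns False. The ask parameter exists so the prompt can be replaced in tests or non-interactive runs.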

Missing User Warnings

Medium
Confidence
97% confidence
Finding
The script installs Python packages from the network and later configures a hosted Edge TTS service, but it does not clearly warn users that installation and runtime may contact external services. For a privacy-focused voice skill, undisclosed network access is especially sensitive because users may assume offline behavior while transcription or synthesis dependencies and hosted TTS can expose metadata or content externally.
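
Remediation sketch
Undisclosed runtime network use could be reduced by making the hosted TTS path opt-in rather than the default. The sketch below assumes a hypothetical environment variable, VOICE_ALLOW_HOSTED_TTS, and a placeholder synthesize_reply function. Neither is part of the skill, and no real TTS call is made.

```python
import os
from typing import Mapping, Optional

# Hypothetical opt-in variable; the skill does not define it.
OPT_IN_VAR = "VOICE_ALLOW_HOSTED_TTS"


def hosted_tts_allowed(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the user has explicitly set the opt-in to '1'."""
    env = os.environ if env is None else env
    return env.get(OPT_IN_VAR, "").strip() == "1"


def synthesize_reply(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Placeholder for the reply path: refuses to go off-host without opt-in."""
    if not hosted_tts_allowed(env):
        return "skipped: hosted TTS not enabled; reply text stays local"
    # A real implementation would call the hosted provider at this point.
    return f"would send {len(text)} characters to the hosted TTS provider"
```

With this default, reply text leaves the machine only after the user sets the variable, which keeps actual behavior aligned with whatever the documentation promises about locality.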

VirusTotal

67/67 vendors flagged this skill as clean.
