Security audit

Local TTS

Security checks across malware telemetry and agentic risk

Overview

This is a coherent local text-to-speech skill, but its realistic voice-cloning instructions lack consent and anti-impersonation safeguards.

Install only if you are comfortable with local ML dependencies and model downloads from external repositories. Use voice cloning only for your own voice or voices you have explicit permission to use, and do not use it for impersonation, fraud, harassment, or deceptive content.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (4)

Description-Behavior Mismatch

Medium

Confidence: 96%
Finding: The script markets itself as local and privacy-first, but `from_pretrained(...)` will fetch models from remote repositories when they are not already cached. This can violate user expectations about offline/privacy-preserving operation and introduces supply-chain and metadata-leak risks such as network access, remote model retrieval, and exposure of usage context.
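
One way to make the network behavior in this finding explicit is to put the Hugging Face libraries into offline mode before any model is loaded. The sketch below assumes the skill loads its models through `huggingface_hub` or `transformers`, which this audit does not confirm. `offline_env` is a hypothetical helper name; `HF_HUB_OFFLINE` and `TRANSFORMERS_OFFLINE` are the environment variables those libraries read for offline operation.

```python
import os


def offline_env(base=None):
    """Return a copy of an environment mapping with offline flags set.

    Assumption: the model loader honors the Hugging Face offline variables.
    """
    env = dict(os.environ if base is None else base)
    env["HF_HUB_OFFLINE"] = "1"        # read by huggingface_hub
    env["TRANSFORMERS_OFFLINE"] = "1"  # read by transformers
    return env


if __name__ == "__main__":
    # Apply to the current process before importing any model-loading library.
    os.environ.update(offline_env())
```

With these flags set, loading a model that is not already cached should fail with an error instead of silently contacting a remote repository, which makes the first-run download a deliberate, visible step.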

Missing User Warnings

Medium

Confidence: 95%
Finding: The skill explicitly documents voice cloning from reference audio and transcript but provides no warning about consent, impersonation, fraud, or legal/ethical restrictions. In a TTS skill, this context makes the omission more serious because the feature directly enables cloning a real person's voice, which can facilitate social engineering, harassment, or deceptive content generation.

Missing User Warnings

Medium

Confidence: 93%
Finding: The document provides a ready-to-use voice cloning example using reference audio and transcript text without any warning about consent, impersonation risk, or safe handling of biometric voice data. In a privacy/security-focused skill, this omission increases the chance that users will treat voice cloning as routine and may misuse another person's voice or process sensitive audio without authorization.
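
A minimal sketch of the kind of safeguard this finding asks for: a consent gate that must pass before reference audio reaches any cloning call. All names here (`ConsentRecord`, `ConsentError`, `check_consent`) are hypothetical and not part of the audited skill. The check only verifies that a record exists and is filled in; it cannot prove that consent was actually given.

```python
from dataclasses import dataclass
from typing import Optional


class ConsentError(Exception):
    """Raised when voice cloning is attempted without recorded consent."""


@dataclass(frozen=True)
class ConsentRecord:
    speaker: str     # whose voice the reference audio contains
    granted_by: str  # who authorized the use (ideally the speaker)
    scope: str       # what the cloned voice may be used for


def check_consent(consent: Optional[ConsentRecord]) -> ConsentRecord:
    """Return the record if it is complete, otherwise raise ConsentError."""
    if consent is None:
        raise ConsentError("No consent record supplied for the reference voice.")
    fields = (consent.speaker, consent.granted_by, consent.scope)
    if any(not value.strip() for value in fields):
        raise ConsentError("Consent record is incomplete.")
    return consent
```

A wrapper around the cloning example could call `check_consent(...)` first and refuse to proceed on `ConsentError`, so that cloning without a stated speaker, grantor, and purpose is an explicit failure, not the default path.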

Missing User Warnings

Low

Confidence: 84%
Finding: The guide repeatedly emphasizes '100% Offline Processing' and 'Zero Data Transmission,' but the model download troubleshooting section omits a clear warning that first-run downloads and manual huggingface-cli downloads contact external services. This can mislead users about network behavior and data exposure during setup, especially in environments that rely on strict offline or air-gapped assumptions.
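
The warning this finding calls for could be driven by a check like the one below, which reports whether a model already exists in the local Hugging Face cache so that a first-run download can be announced before it happens. This is a sketch under assumptions: it relies on the hub's `models--<org>--<name>/snapshots` cache layout, honors only `HF_HUB_CACHE` and `HF_HOME` (not `XDG_CACHE_HOME`), and uses hypothetical helper names `hub_cache_dir` and `is_model_cached`.

```python
import os
from pathlib import Path


def hub_cache_dir(env=None) -> Path:
    """Resolve the hub cache directory from an environment mapping."""
    env = os.environ if env is None else env
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    hf_home = env.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "hub"


def is_model_cached(repo_id: str, env=None) -> bool:
    """Return True if at least one snapshot of repo_id exists locally."""
    folder = "models--" + repo_id.replace("/", "--")
    snapshots = hub_cache_dir(env) / folder / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())
```

When `is_model_cached(...)` returns `False`, setup instructions or the script itself can state plainly that the next step will contact an external service, which matters most in strict offline or air-gapped environments.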

VirusTotal

66/66 vendors flagged this skill as clean.
