Pocket TTS

Security checks across malware telemetry and agentic risk

Overview

This is a coherent local text-to-speech skill with disclosed package/model downloads and voice cloning features, but users should understand the initial network dependency and consent implications for voice samples.

Install only if you trust the upstream Pocket TTS package and Kyutai/Hugging Face model source. Expect internet access during setup or first run despite the offline-runtime claims, use voice samples only with permission, choose output paths deliberately, and start the local server only when you intend to run it.

SkillSpector

By NVIDIA

Vulnerability Patterns

Data Exfiltration: External Transmission, Env Variable Harvesting, File System Enumeration
MCP Tool Poisoning: Hidden Instructions, Unicode Deception, Parameter Description Injection
Prompt Injection: Instruction Override, Hidden Instructions, Exfiltration Commands
Privilege Escalation: Excessive Permissions, Sudo/Root Execution, Credential Access
Supply Chain: Unpinned Dependencies, External Script Fetching, Obfuscated Code

Findings (3)

Intent-Code Divergence

Medium

Confidence: 93%
Finding: The documentation states the skill is fully local and requires no internet connection, but later instructions require accepting a Hugging Face license and note that the first run downloads the model. This can mislead users operating in restricted, privacy-sensitive, or air-gapped environments into enabling network access or using the skill under false assumptions about data flow and supply-chain exposure.

Intent-Code Divergence

Medium

Confidence: 92%
Finding: The feature list claims the tool makes no API calls and runs completely offline, while the notes state that the first run downloads the model. This inconsistency is security-relevant because users may rely on the offline claim when making trust, compliance, or deployment decisions, creating risk of unexpected outbound connections and unreviewed dependency retrieval.

Missing User Warnings

Medium

Confidence: 84%
Finding: The skill advertises voice cloning without any guidance on consent, identity misuse, or handling of biometric voice data. In this context, omission of safeguards increases the likelihood of deceptive impersonation, privacy violations, or unsafe deployment by users who are not warned about the ethical and security implications.

VirusTotal

66/66 vendors reported this skill as clean.
